Category Archives: bhyve

FreeBSD BHYVE using vm-bhyve to create FreeBSD, Linux, OpenBSD, NetBSD, Windows Server 2022 and Home Assistant VMs on NVME drives with UEFI

November 27, 2023bhyve, freebsd, Virtual MachinesSunSaturn

Easily create machines with FreeBSD and vm-bhyve. If you come from a world of using virsh with Linux, going this route will save you some massive headaches. You won’t have to reinstall your OS every new release as FreeBSD provides freebsd-update utility for that, as well as you won’t have to deal with anymore laggy virt-manager remote sessions, and easily be able to connect to console just as with virsh. Yeah let’s leave the Linux redhat/centos/rocky blah war behind us, and use something that will be solid and powerful on bare metal for a decade to come….

For this setup, what I have is 2 10gigabit ports, so I’ve already gone and created my LACP interface in /etc/rc.conf, then I created a bridge on top of that for the VMs. In Linux world that would be your bond0 interface and your br0 interface on top of that.

In FreeBSD its as simple as:

cloned_interfaces="lagg0 bridge0"
ifconfig_br0="addm lagg0 up description services"
ifconfig_bridge0_name="br0"
ifconfig_br0_alias0="inet 192.168.0.1 netmask 255.255.255.0 mtu 9000"
ifconfig_br0_ipv6="inet6 fc00:192:168:1::1/64 accept_rtadv auto_linklocal"
ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1"

ifconfig_ix0="up"
ifconfig_ix1="up"

What if you have a 100 gigabit home lab? I’ll just take a second to drool, then basically say that would be same setup as above but you’d most likely be setting your MTU to 16k instead of 9k in your FreeBSD rc.conf and on your switch ports configured for LACP.

Or without 10 gigabit LACP following should work, use your own device name for ix0:

cloned_interfaces="bridge0"
ifconfig_br0="addm ix0 up description services"
ifconfig_bridge0_name="br0"
ifconfig_br0_alias0="inet 192.168.0.1 netmask 255.255.255.0"
ifconfig_br0_ipv6="inet6 fc00:192:168:1::1/64 accept_rtadv auto_linklocal"
ifconfig_ix0="up"

Now we have a simple br0 interface for our VMs. If you not using LACP just remove the lagg0 stuff and addm your real interface instead to ifconfig_br0 line, the main point is we have our br0 device ready to go, don’t use MTU 9000 if you haven’t upgraded to 10 gigabit in your home lab 🙂 Using samba with this setup I get full 1 GB/s each way off NVME’s, and using SMB server multi channel I’ve gotten to around 1.3 GB/s each way to my Windows 11 desktop. Can’t complain it’s an older Dell r720 using clover to boot 3 NVMEs, if I had a newer server I’d definitely look into RDMA.

Next let’s make sure we have sysctl variables setup to make sure we can work with bhyve, so let’s open /etc/sysctl.conf and add the following:

#BHYVE
net.link.tap.up_on_open=1
#BYHVE + PF nat
net.inet.ip.forwarding=1
net.inet6.ip6.forwarding=1

Now enter: “service sysctl restart” on command line to apply the changes.

Now assuming your using your NVME drives for VMs, personally I used 3 1TB NVME drives in a stripe on zroot. Even though it is a raid0 it doesn’t matter, I have another server I bring online occasionally to back this one up, think of it as a raid 10 just across 2 machines 🙂

Alright let’s install what we need:

pkg install grub2-bhyve bhyve-firmware tmux vm-bhyve
zfs create -o mountpoint=/vm zroot/vm

Now we will want to add VMs to startup automatically when we reboot so let’s add vm-bhyve to /etc/rc.conf:

vm_enable="YES"
vm_dir="zfs:zroot/vm"

#vm_list="vm1 vm2"
#vm_delay="5"

We will leave last 2 lines commented out so we can decide what to start after install 🙂 Now let’s get vm-bhyve setup:

vm init
cp /usr/local/share/examples/vm-bhyve/* /vm/.templates/
vm switch create -t manual -b br0 services
vm set console=tmux

Next thing we want to do is edit our default template for creating new VMs. Personally i use pico which is an alias to “nano -w”. So “cd /vm/.templates; pico default.conf”

loader="uefi"
graphics="yes"
graphics_listen="192.168.0.1"
graphics_port="5900"
#If you want it to wait connecting with VNC uncomment next line
#graphics_wait="yes"
#Valid Options: 1920x1200,1920x1080,1600x1200,1600x900,1280x1024,1280x720,1024x768,800x600,640x480
#graphics_res="1920x1080"
xhci_mouse="yes"

#OPENBSD - uncomment hostbridge line and use .img not .iso from download site
#https://wiki.freebsd.org/bhyve/OpenBSD - vm install openbsd install74.img
#hostbridge="amd"

#conservative 1 cpu socket for windows, they charge apparently for multiple sockets
cpu=4
cpu_sockets=1 
cpu_cores=4 
memory=4G

#Use e1000 for NetBSD or you can't set mtu to 9000
#network0_type="e1000"
network0_type="virtio-net"

#assign tap devices to manual switch we created (vm switch create -t manual -b br0 services)
network0_switch="services"
disk0_type="nvme"
disk0_name="disk0.img"

#windows virtio driver to enable virtio-net network drivers
#https://github.com/virtio-win/virtio-win-pkg-scripts/blob/master/README.md
#vm iso https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso
#NetKVM->2k22->amd64 (for virtio ethernet driver, use device manager to update driver to this directory)
#uncomment for windows
#disk1_type="ahci-cd"
#disk1_dev="custom"
#disk1_name="/vm/.iso/virtio-win.iso"

#windows expects the host to expose localtime by default, not UTC
#uncomment for windows
#utctime="no"

This gives me a generic template for creating VM’s with windows, Linux, FreeBSD, OpenBSD etc on my NVME drives.

So when we now create VM’s it will install this template to /vm/$VM_NAME directory along with 2 lines adding unique mac address and UUID. That’s important because we don’t want crazy things happening in our ARP tables 🙂 Mainly what is nice about that is for DHCPD. If you want to control what IP address your VM gets you can add that mac address to your /usr/local/etc/dhcpd.conf file and know what IP it will get if not hardcoding it. UUID is for dhcpd6.conf where you can try to hardcode IPV6 address for host using that. Since all OS’s don’t really respect it, I normally leave it alone.

May be wondering why I use raw images instead of zvols? Because raw images have better performance read here: https://klarasystems.com/articles/virtualization-showdown-freebsd-bhyve-linux-kvm/

No worries, I’ve Crystal Mark tested a windows VM, and it would definitely blow away Linux KVM on disk speed.

So at this point we are done our setup, now on to creating VM’s.

So let’s take an example, I want to create a VM called asterisk, another FreeBSD 14 machine here we go:

vm iso http://blah.iso
vm create -s 50G asterisk
#add this new VM IP address to /etc/hosts
pico /etc/hosts 
cd /vm/asterisk
cat asterisk.conf
#look for mac address and add to DHCPD
pico /usr/local/etc/dhcpd.conf
service isc-dhcpd restart
pico asterisk.conf

So what we doing here is telling vm-bhyve to fetch iso for us and put it in /vm/.iso. Then we tell it to create a 50G image called disk0.img in /vm/asterisk directory, an asterisk.conf from our /vm/.templates/default.conf file with 2 unique lines added for mac address and UUID. Pretty sweet. Don’t over think it here with disk0.img, all its doing is a simple “truncate -s +50G disk0.img” to create a blank image.

I then go on to add IP address I picked out to /etc/hosts and /usr/local/etc/dhcpd.conf but you can skip this if you like, but it’s a useful habit to get into for installing things like say home assistant VMs.

Now final step is edit asterisk.conf to your liking. Normally only thing I change per VM is graphics port and I may use virtio CDROM for windows. I’ve successfully installed FreeBSD, Ubuntu, Rocky Linux, OpenBSD and windows server 2022 using this template so it’s a pretty good one 🙂 Also since we using NVME makes windows install less painful as we won’t need any windows drivers for hard drive, think you might for network card still, why I generally leave cdrom attached in windows. For graphics_port I normally set that to last digits of VMs IP address, so like say I set asterisk to use 192.168.0.3, then my graphics_port will be 5903. I then add that host to my SecureCRT list, and add the VNC to my mobaxterm list 🙂

Honestly it’s more steps, but at end of day when VM’s are all running, now its one simple click on ssh session list or VNC session list to get to VM, so nice habit to get into.

Now let’s start the install. Normally when you first start trying to install an OS you want CDROM or USB stick in to install it, then remove it as host reboots. vm-bhyve does something amazing here, does it for you 🙂

vm install asterisk FreeBSD-14.0-RELEASE-amd64-bootonly.iso

That’s it! It will take ISO we downloaded to /vm/.iso/blah.ISO , add it to VM for install. At this point what I normally do is either “vm list; vm console asterisk” or just use mobaxterm to connect to graphics port. Tmux is nice addition to serial port of host 🙂 But normally to be able to use that you have to configure each host to use the serial port. Like adding a line at end of grub on Linux or just editing FreeBSD /boot/loader.conf.

If you want to wipe a VM out and start over you have 2 options:

vm destroy asterisk
OR
zfs destroy -r zroot/vm/asterisk
#-r because we'd have to killoff snapshots to if they exist

Your all done. Congratulations.

Now what I normally like to do with a new VM is make sure it shuts down from host when I want it to, so I may issue a “vm stop asterisk” and keep watching VM and “vm list” to see it has ended. When I’m happy, I’ll “vm start asterisk”, then add asterisk to /etc/rc.conf as something to start on boot for example.

What if you screwed up install and say installed asterisk here with 10G less that you wanted? Easy enough, on the host just do the following:

vm stop asterisk
cd /vm/asterisk
truncate -s +10G disk0.img
mdconfig disk0.img
gpart recover md0      #recover disk after adding space
gpart show md0         #find partition number with freebsd-zfs in it, in my case 3 
gpart resize -i3 md0   #expand partition freebsd-zfs partition 3, could also be 4 if you didn't install with GPT(EFI)       
mdconfig -du md0       #unmount md0

But how do you get the guest to see the new space? Do following:

vm start asterisk
#login to guest and:
zpool status (look for device)
zpool online -e zroot /dev/nda0p3 #use your device from command above

That’s it. But what if you screwed up something on guest in /boot/loader.conf or /etc/rc.conf that prevents you from booting? Simple let’s mount it and fix it on host:

vm stop asterisk #make sure guest is stopped or risk data corruption!
cd /vm/asterisk
mdconfig disk0.img
mkdir mnt
zpool import      #you should see md0 zroot disk to import
zpool import -fR /vm/asterisk/mnt -t zroot testing
zfs mount -o mountpoint=/vm/asterisk/mnt testing/ROOT/default
#cd /mnt/etc or /mnt/boot etc and fix stuff....
zpool export testing
rm -rf mnt
mdconfig -du md0  #unmount guest

That’s it, then just “vm start asterisk” and problem solved 🙂 Even simpler for Linux obviously, just look for partition and mount it, but same would apply for a debian zfs root install.

Now I bet someone is wondering what if that was an ubuntu install, and I screwed up grub, how would I recover that? Well simple, “vm poweroff ubuntu”. “vm install ubuntu ubuntu-22.04.3-live-server-amd64.iso” to load the install ISO just so we can use it to drop to a shell. Connect with VNC viewer, go to help -> shell. Now assuming you installed with all default options with LVM and EFI then here is how to fix grub:

fdisk -l
lvscan
mount /dev/ubuntu-vg/ubuntu-lv /mnt
mount /dev/nvme0n1p2 /mnt/boot
mount /dev/nvme0n1p1 /mnt/boot/efi

mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys

chroot /mnt
nano /etc/default/grub  (fix it)
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=GR
update-grub

umount /mnt/dev
umount /mnt/proc
umount /mnt/sys
umount /mnt/boot/efi
umount /mnt/boot    
umount /mnt
sync

reboot

Again all things you can do right from FreeBSD host. And again someone will probably ask I tried to install Kali Linux and after install it won’t boot how do i fix it? Well I thought this guy did a great job of explaining it:

https://record99.blogspot.com/2021/12/bdsdex-failed-to-load-boot0001-uefi-bhyve-sata-disk.html

So let’s just modify that for Kali Linux:

1. Enter exit get out of the UEFI Shell (you will come to BHYVE panel)
2. Choose Boot Maintenance Manager
3. (In the Boot Maintenace Manager ) Choose Boot From File
4. (In File Explorer) NO VOLUME LABEL, [PciRoot(0x0)/ Pci()]  press Enter
5. (Still in the File Explorer) Choose <EFI>
6. (Still in the File Explorer) Choose debian 
7. (Still in the File Explorer) Choose grubx64.efi
8. Then you will get into the system
How can you fix it:
1) Get access inside your Kali Guest VM.
2) as root: cd /boot/efi/EFI
3) mkdir BOOT
4) cp kali/grubx64.efi BOOT/bootx64.efi
5) reboot
Basically you need to have this file in this path:
/boot/efi/EFI/BOOT/bootx64.efi

And now someone is going to ask how did I get windows server 2022 to work. I downloaded the windows_server_2022.iso and virtio-win.iso to my /vm/.iso directory and edited my template as follows:

loader="uefi"
graphics="yes"
graphics_listen="192.168.0.1"
graphics_port="5907"
#enable this briefly on windows installs, install won't do anything till you connect with VNC
#graphics_wait="yes"
#Valid Options: 1920x1200,1920x1080,1600x1200,1600x900,1280x1024,1280x720,1024x768,800x600,640x480
graphics_res="1920x1080"
xhci_mouse="yes"
#conservative 1 cpu socket for windows, they charge apparently for multiple sockets
cpu=4
cpu_sockets=1 
cpu_cores=4 
memory=4G
network0_type="virtio-net"
#assign tap devices to manual switch we created (vm switch create -t manual -b br0 services)
network0_switch="services"
disk0_type="nvme"
disk0_name="disk0.img"

#windows virtio driver to enable virtio-net network drivers
#uncomment for windows
disk1_type="ahci-cd"
disk1_dev="custom"
disk1_name="/vm/.iso/virtio-win.iso"

# windows expects the host to expose localtime by default, not UTC
#uncomment for windows
utctime="no"
uuid="6efefb12-8da5-11ee-906d-e4434bf65f00"
network0_mac="58:9c:fc:0f:b8:a6"

Then I did a simple “vm install windows windows_server_2022.iso” and installed it. When it was installed I went to device manager, selected the ethernet adapter that obviously wouldn’t work, then pointed it to look on the virtio-win.iso drive at “NetKVM->2k22->amd64”. About it. Thought I’d throw this in there as I’m sure there is somebody that likes to setup active directory to authenticate everything. But honestly look at passkeys, it’s the new thing apparently, might as well learn it now, search for webauthn.

Ok we are now experts, so let’s install Home Assistant VM then to make sure 🙂 So home assistant only provides an image and not a ISO, so that’s pretty simple to, all we really need is a raw image, so I see they provide a qcow2 image on their site, so we can just convert that easily after downloading it, and for good measure let’s break root on it to after install 🙂

vm create -s 10G ha
vm iso https://github.com/home-assistant/operating-system/releases/download/11.1/haos_ova-11.1.qcow2.xz
cd /vm/.iso
unxz haos_ova-11.1.qcow2.xz
pkg install qemu-tools gdisk
qemu-img convert -f qcow2 -O raw haos_ova-11.1.qcow2 haos_ova-11.1.img
rm haos_ova-11.1.qcow2
gdisk -l ./haos_ova-11.1.img
cp /vm/.iso/haos_ova-11.1.img /vm/ha/disk0.img

Ok what I’ve done here is download the image, go to ISO/image directory and uncompress it. I install some tools to convert image to raw image type we need for BHYVE. Delete old format to save disk space, then run gdisk on image to see all its partitions to know we have a working OS on this image. Could have used gpart I suppose as well but nice to have a command that works on FreeBSD and Linux.

I then just copy image over the one vm-bhyve added, so I could have even done 1G instead of 10G creation or whatever, it didn’t matter.

Alright, remember I said have good habits with VMs, at this point cd /vm/ha and look at ha.conf for any edits, and set an IP for this home assistant VM in /etc/hosts and toss it’s IP and mac address in /usr/local/etc/dhcpd.conf. “service isc-dhcpd restart”. I assigned it 192.168.0.15, so I’m pretty sure when it boots its going to call on a DHCPD server for an IP.

vm start ha
vm console ha

Well I typed in “root” and hit enter and I have root access, so much for trying to mount all the partitions and break it 🙂 Also went to VNC port and at ha> prompt just typed in “login” and got root to. Doing a cat /etc/shadow it seems root has no password to login. Running a “ps aux” I see dropbear running instead of sshd, like an android system, yet this is a x86_64 image, unreal.

Alright well this is still inconvenient, rather add it to my ssh list to with root access, since they just let you blindly login with root on console like this, they probably have devel docs somewhere how to do this.

YEP, check it out here: https://developers.home-assistant.io/docs/operating-system/debugging/

Just says add your pub ssh key and your good to go with ssh root@homeassistant.local -p 22222. Fair enough let’s go back to our tmux console and add it then:

cd /root/.ssh
vi authorized_keys
chmod 600 authorized_keys

Well they didn’t have nano installed, but most OS’s include vi, so was as easy as just opening a file, making sure its permissions was ok and good to go, can add it to your ssh session list now 🙂

I must admit I now played with it for a bit, and quite like this, added all my smart devices to it, downloaded the app to my phone, guess all I need is a microphone/speaker so can control all my lights etc if internet goes out 🙂

Anyways, this really is how simple using vm-bhyve is, vm list, vm iso/create/stop/start etc. I honestly like it, and will continue to use it, reminds me of using virsh with libvirtools on Linux. Tmux console is great, CTRL-a-s to switch between VM’s quite handy. I was only used to using screen before tmux, so I remapped CTRL-a-d to work on tmux to:

pico /usr/local/etc/tmux.conf
#and added:
# remap prefix to Control + a
set -g prefix C-a
# bind 'C-a C-a' to type 'C-a'
bind C-a send-prefix
unbind C-b

Great works like screen, now I won’t forget 🙂 tmux is pretty good, liking it better these days than screen, but I still keep screen and minicom around because tmux can’t connect ports with baud rates, parity etc.

Anyways I hope this has been a great introduction to working with VMs under FreeBSD BHYVE with vm-bhyve, vm-bhyve adds some powerful features and makes it easy for us. FreeBSD gives us the power of its ports collection, most stable ZFS OS, and powerful networking features like /etc/pf.conf and working with setfibs(its like rt_tables on linux but more customizable) 🙂

Now go setup your zfs snapshots for your VMs in crontab: “crontab -e”

0 3 * * * (zfs destroy -r zroot/vm@`/bin/date +\%A`; zfs snapshot -r zroot/vm@`/bin/date +\%A`) > /dev/null 2>&1
0 3 1 * * (zfs destroy -r zroot/vm@`/bin/date +\%B`; zfs snapshot -r zroot/vm@`/bin/date +\%B`) > /dev/null 2>&1

What this does is first line takes a snapshot daily, Mon-Sunday, so during a 7 day period you can rollback, and it will delete old ones to save disk space. The “-r” just says do it for each VM. I really only had to escape the “%” sign for cron about it. Second line does a monthly snapshot, so you’ll get 12 over course of a year, and they’ll delete each other to save disk space and keep things organized, simple and sweet…

If you need to rollback a VM I suggest you setup an alias to see your snapshots in your .bash_profile or whatever shell you use:

alias snapshot='zfs list -t snapshot'

Then you can just rollback to whatever you see in list:

snapshot
zfs rollback zroot/vm/asterisk@Monday

That’s the power of ZFS on a virtual machine 🙂 Now go add your new VM to your vm list in /etc/rc.conf if you want it to startup when host reboots. Good to go, now can rollback a VM if you screw it up 🙂

And now someone is going to ask how to I get vm console serial access to all OS’s because you’d don’t like using VNC viewer. Well first of all for windows obviously your going to have to use VNC viewer, for the rest, let’s start with FreeBSD, add to your /boot/loader.conf

autoboot_delay="5"
comconsole_speed="115200"
boot_multicons="YES"
boot_serial="YES"
console="efi"

Then just reboot, and use vm console to watch it. If you have resizing issues with tmux just type “resize” and you’ll be fine, if you don’t have the command install xterm on guest. Should be installing xterm and xauth on every guest. Now what about ubuntu/kali? Edit /etc/default/grub as such:

-GRUB_TIMEOUT_STYLE=hidden
+#GRUB_TIMEOUT_STYLE=hidden

# Optional kernel options that you very likely want. Don't affect GRUB itself.
# Remove quiet to show the boot logs on terminal. Otherwise, you see just init onwards.
# Add console=ttyS0, or else kernel does not output anything to terminal.
-GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
-GRUB_CMDLINE_LINUX=""
+GRUB_CMDLINE_LINUX_DEFAULT=""
+GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0,115200"

# Show grub on both serial and on display.
-#GRUB_TERMINAL=console
+GRUB_TERMINAL="console serial"

Then just type “update-grub” and your good to go.

What about centos/redhat/rocky linux, you installed the newest 9.3 as of today? Well since all that is pretty much redhat look at following url:

https://access.redhat.com/articles/3166931#config9

So basically just do:

grubby --update-kernel=ALL --args="console=tty0 console=ttyS0,115200"
grubby --info=ALL|grep -i args

What about OpenBSD?

openbsd:~ # cat /etc/boot.conf 
stty com0 115200
set tty com0
openbsd:~ #

What about NetBSD?

echo "consdev=com0,115200" >> /boot.cfg

Other helpful hints? Just remember the “resize” command when working with vm console VMNAME. Tmux is great at screwing up your terminal. Other than that, I have this added to my .bash_profile to remind me to make sure to setup X11 forwarding and have that resize command work:

if ! command -v xterm >/dev/null 2>&1
then
    echo "WARNING: xterm not installed, resize command may not exist for tmux!"
fi
if ! command -v xauth >/dev/null 2>&1
then
    echo "WARNING: xauth not installed, X11 forwarding will not work, enable it in sshd.conf as well"
fi

Where to go from here? If its just a home lab maybe you should authenticate a user to login to every VM in one spot, maybe active directory, maybe kerberos, or just keep it simple and use NFS and NIS. I’ve tested NFS and NIS across all OS’s except windows obviously, only issue is with Redhat/Rocky/Centos having removed all NIS stuff calling it insecure.

Well that’s true and not true, if someone was sitting on my network with tcpdump or wireshark, for sure it’s possible to eventually sniff out encrypted hash of password. Personally I blame the old /etc/shadow /etc/master.passwd format. If they would just upgrade that to using public and private keys they could sniff all they wanted and take 10 years trying to crack it then. It is what it is, but for a home lab, NIS is just fine, at least you can use yppasswd to update your password across all VMs and your good to go…

Other fun things to try with FreeBSD as host? Get yourself a Starlink dish on sale, install it, and using /etc/pf.conf and setfibs make certain IPs in your home lab go out Starlink and others your default internet connection. After free month trial cancel it. Then if your internet ever goes down you have a layer of redundancy to just enable Starlink on your phone 🙂

Or play with wireguard, then using /etc/pf.conf and setfibs, make certain IPs in your home lab go out VPN, and others your default connection. Maybe start a VPN service afterwards.

Play with samba and NFS, export them to your VLC app on your google TV chromecast in living room. Configure utorrent to download to the samba directory that you have mapped as network drive on windows.

Play with snapshots, take snapshots of everything so you can rollback. You have power of ZFS on main virtualization host, take advantage of it 🙂

Install a rsyncd.conf on every VM, using crontab from host backup important configuration files on each VM once a night.

Install a windows server 2022 VM that you configure to start first in /etc/rc.conf. Watch some youtube videos on active directory, configure some VMs to authenticate off that for fun. Maybe play with passkeys instead.

Write some HOWTO’s on internet how to increase VM performance even more, I’d love to read it!

Setup a real dedicated server online, use same setup here. Install VMs for a webserver, DNS server, mail server etc. Try to run as few services on main host as you can other than DNS , wireguard and ssh. Use one central /etc/pf.conf to firewall for host and every VM(be very careful unless your provider can give you console access to machine through a KVM etc, take it from experience 🙂 ) If not place firewall rules and revert them in 60 seconds something like: “pfctl -F all; pfctl -f /etc/pf.conf; sleep 60; pfctl -F all; pfctl -f /etc/pf.conf.old”. I’d really recommend testing on your home lab first 🙂 Try to install all services you can on a FreeBSD vm, for anything else can use an ubuntu VM for development. Setup a wireguard tunnel to back this server up once a night using rsync or zfs send/receive. For advanced users maybe rotate wireguard keys once in awhile automated. The habit you want to get into is you are rarely logging into main host for anything other than security updates, your mostly using the VM’s to do everything. That way if you screw anything up you can simply do a “vm console VM” on host. Let the worker bees be children, and the parent parenting 🙂

Maybe you’d like some custom FreeBSD kernel tuning, you can recompile the whole world and add features as you like. I’ve done this on numerous occasions to do custom things. I should note do not do this on production servers online unless you know what you are doing, cause it can take you that much longer to apply a security update, I almost feel sorry for NetFlix in that regard as they run all custom FreeBSD current custom servers(but not that sorry, the hardware they run on is very powerful so would take them almost no time 🙂 ) If you decide to do this compile everything on main host, nfs mount /usr/src and /usr/obj on all FreeBSD VMs and do a quick install on those to.

Maybe play with crypto, maybe learn GPU passthrough on vm-bhyve, possibilities are endless, you are now in complete control….

Happy BHYVE’ing…..

SunSaturn

FreeBSD BHYVE why use it?A Bhyve suspend/resume howto + disaster recovery with UFS and ZFS

November 11, 2022bhyve, freebsd, resume, suspendSunSaturn

INTRO:

FreeBSD Bhyve virtualization why? Sure there are different virtualization packages out there, but you have FreeBSD! Why not use ESXI? From experience any closed source software generally has a max lifetime of 10 years(abandonware), and opensource will generally over throw them by then. Also anything you don’t have full control over on root prompt to read the source code will be a nightmare to administer.

What about Linux and KVM? Yes, before Bhyve they were the best, the virt-manager and virsh/virt-viewer commands were unbeatable. Here is the thing with Redhat however, they over the course of just 2 years now have made most of sys admins angry on the internet. First they removed the megaraid drivers from the kernel(yes people still run Dell r710’s etc as fileservers), which caused a lot of people grief, then on upgrade to Redhat 9, they killed off a lot of Dell servers as well, people who upgraded had to reinstall with Rocky Linux 8 as they used a glibc that did not work on a lot of people’s systems. And to top it off they killed off Centos by moving it to Centos-stream, angering rest of internet. So everyone moved to Rocky Linux(original creator of Centos), to keep their hosts “production”.

The worst thing about Linux to begin with as a virtualization host, is every 2 years when a new major release comes out, most people have to take time out of their day to reinstall the OS. A core OS! With FreeBSD you can update to new major releases by way of simple freebsd-update command or rebuilding source code from scratch with “make buildworld” as well. No downtime reinstalling, no host to replace, a sys admin dream.

I’ve had fights with programmers over the years over Linux and FreeBSD, especially when it came to libevent and libev, where people swore by epoll over kqueue, from experience put FreeBSD under load and then a Linux server, your loads will be better served on FreeBSD with less CPU utilization, the network stack is better, more efficient, even Netflix uses it.

And now I know what everyone is going to say, been using Linux/KVM virt-manager for so many years, FreeBSD doesn’t even have suspend/resume stable. I’m not loosing my 20 SSH connections to my guests just because I had to reboot for a kernel update. I can’t disagree with this, I’d hate to be working 4 hours on a script on one of them, reboot for a kernel update, and lost all my work because I forgot to save on a host and host didn’t resume guests properly after reboot.

So why use Linux at all? I think Linux is fine as a guest, just not in the frontline of battle for reasons I mentioned above. There are still lots of things that will only run on Linux, if your a flutter programmer, FreeBSD still has not ported Dart language, so your apps will still need a Linux guest to test builds. If I’m considering working on an app, I would most likely install an ubuntu guest for that, do all your dart/c/java there along with windows 11, and your python Fastapi’s interfacing with Mysql on Linux and/or FreeBSD.

Honestly for your server part of your app, I would use a FreeBSD guest to begin with hosting Fastapi, I will show you how you could pass a second disk through to the guest with a proper 16k blocksize for max Mysql performance for your app. The biggest reason to use FreeBSD upfront, you can be stuck 6 months to a year working on an app, you really need a week of downtime because some new Linux release came out and you have to reinstall? I didn’t think so…let FreeBSD “Shine bright like a diamond” as Rihanna would sing.

So today I am going to take what is the most important and exciting #1 feature release for FreeBSD 14, suspend and resume. I will be installing FreeBSD current to test it. I will also show how FreeBSD is a force to be reckoned with as the core OS with virtualization.

Test case scenario:

Dell r720, Intel Enterprise 480GB SSD as core OS slapped into an icy dock for core OS, along with 2 12TB rust spinning drives for separate ZFS backup pool all on a megaraid JBOB controller. FreeBSD current, with production flags, with experimental suspend and resume features enabled in kernel and userland.

For main OS we will use default ZFS install, for guest tests we will install 2 FreeBSD guests, one with ZFS, and one with UFS. We will suspend them both, then resume them and see if we loose our SSH connections 🙂 I will most likely do a part 2 on installing Linux and Windows guests as I see this article will get pretty lengthy just with these 2 guests…

For third-party packages to interface with suspend and resume, we can’t use any. We will have to do everything manually, I will keep it simple with just bash scripts. Maybe I’ll code an advanced interface with python and asyncio in future. One third party package I did run across I kind of liked was vm-bhyve. To be honest only thing I liked about it was his directory layout, so that’s only thing I will try to keep consistent for our scenario.

Bhyve still needs a good interface to it, I think what would be best for it is python fastapi or C/Rust with kqueue as server end. Front end hands down flutter, will build on Linux, apache/nginx, android, IOS, windows and even our TVs in livingroom for a front end interface. Even on FreeBSD natively once Dart is ported to it. I wouldn’t even bother with Java or Kotlin for these reasons alone. I think if that was built for a year would turn FreeBSD into everyone’s favorite virtualization OS.

Back to sys admin stuff, let’s set this up once really well and we should be good for next 5-10 years, just clone the drive if moving to PCIE 4 or 5 machine this is not Linux 🙂

Setup Bhyve:

Let’s start with something simple, setup our directory structure like vm-bhyve, mod what we need and uninstall his package. I’m going to assume at this point you just have a FreeBSD release like FreeBSD 13.1 installed, so let’s begin….

https://github.com/churchers/vm-bhyve

pkg install vm-bhyve bhyve-firmware
zfs create -o mountpoint=/vm zroot/vm
sysrc vm_enable="YES"
sysrc vm_dir="zfs:pool/vm"
vm init
cp /usr/local/share/examples/vm-bhyve/* /vm/.templates/

OK great now we have a directory structure to work with, he likes to keep all guest related things in /vm/<guestname>, so let’s keep with his directory structure idea and delete the package now.

pkg delete vm-bhyve
nano /etc/rc.conf
(now edit /etc/rc.conf and remove/comment out his sysrc lines)
(while we in here let's add support for 4 guest tap interfaces)

#support for 4 test guests - substitute "ix0" for your own interface
cloned_interfaces="bridge0 tap0 tap1 tap2 tap3"
ifconfig_bridge0_name="br0"
ifconfig_br0="addm ix0 addm tap0 addm tap1 addm tap2 addm tap3"
(exit /etc/rc.conf)
#now let's just create what that does manually on command line for now
ifconfig bridge create
ifconfig tap0 create
ifconfig tap1 create
ifconfig tap2 create
ifconfig tap3 create
ifconfig bridge0 addm ix0 addm tap0 addm tap1 addm tap2 addm tap3
ifconfig bridge0 name br0
ifconfig br0 up
#damn starting to like /etc/rc.conf better already :)
(alright now let's edit /etc/sysctl.conf)
nano /etc/sysctl.conf (add following:)
#BHYVE
net.link.tap.up_on_open=1
#BYHVE + PF nat
net.inet.ip.forwarding=1
net.inet6.ip6.forwarding=1
(exit /etc/sysctl.conf)
nano /boot/loader.conf
(have it look like this:)
autoboot_delay="5"
kernels="kernel kernel.old"
boot_serial="YES"
#stuff added by install
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
#bhyve
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
(exit /boot/loader.conf)

At this point we should be able to do an “ifconfig” and see our bridge setup properly for 4 possible guests, just add more tap interfaces for additional guests. Now let’s reboot and make sure our bridge etc is still there:

shutdown -r now
ifconfig -a
#let's install a few packages to help us
pkg install screen tightvnc git bash

Moving to FreeBSD current with experimental suspend/resume support(MOVIE TIME!):

Ok the time consuming part, hope you have a lot of cores/fast CPUs 🙂

Alright even if you have current installed you still will not have support enabled so we are going to recompile kernel and userland for support. Honestly this is way to upgrade FreeBSD from source anytime, only difference on stable/release is I would checkout a different branch with git and be using GENERIC instead of GENERIC-NODEBUG to copy from. You may be thinking I’ll checkout a release branch and just compile in support, tried that, it won’t work properly, this is your only way to test it before 14 release where it actually works for most part.

The make buildworld, make buildkernel, make installworld commands will take a long time, I recommend going to play your favorite video game or watch a movie after executing those commands, return in a few hours to check in on them time to time. Honestly once you have the buildworld out of the way its not that bad after. I’d recommend running everything in a screen session as well, that way if something happens you can log back in and do a “screen -r”. After your done, return tomorrow and we will continue on with testing suspend and resume 🙂

screen
git clone https://git.FreeBSD.org/src.git /usr/src
cd /usr/src/sys/amd64/conf
(edit MYKERNEL) cp GENERIC-NODEBUG MYKERNEL (add: options         BHYVE_SNAPSHOT)
cd /usr/src
#(find amount of CPUs and adjust -j below - "dmesg|grep SMP")
make -j12 buildworld -DWITH_BHYVE_SNAPSHOT -DWITH_MALLOC_PRODUCTION
make -j12 buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL
shutdown -r now
cd /usr/src; make installworld
shutdown -r now
etcupdate -B
pkg bootstrap -f #if new freebsd version
pkg upgrade -f   #if new freebsd version
shutdown -r now

Preliminary Thoughts on Creating Guests:

We are back and ready to roll! So creating guests is probably something every sys admin sits there and thinks hours upon hours about. What performance improvements could I make, will I be able to mount /etc and /boot directory of an offline guest if anything goes wrong or you screw anything up on guest. How will you do backups for them, how will you recover from disaster if it strikes. These guests once setup can run 5-10 years, no room for error, everything has to be accounted for.

I will give you my take on this from decades of experience. When you are starting out fresh, your main concern is the main virtualization host is doing nothing but virtualization and routing. You’d don’t want apache, email servers or any type of server processes running on it that are better suited for guests that can crash if need be and not affect the main host and other guests running on it. You want the main host rock stable at all times. You can play with the guests and give them the memory and CPUs they require to run what they need.

On the main FreeBSD host – virtualization, DNS, isc-DHCP/isc-DHCPD6, RADVD, PF FIREWALL, and at most backups/ZFS snapshots. Over the years before FreeBSD had Bhyve, I offloaded all backups to ZFS guest for backups there as well, using rsync or zfs send/recv. The choice is yours, but my recommendation is you are running as little as possible on main host as far as internet services.

If you did it all right, you’ll find your barely doing anything on main host and always logged into guests instead, then you know you did everything right. You might edit PF firewall to block something from all guests time to time, or do updates, about it. You have to remember your main host will save you with disaster recovery on guests, create new guests, basically be the blood and soul of your system, and this is where FreeBSD shines.

If you have ability at all to run FreeBSD as main host, you’ll save yourself years of headaches, where every Linux sys admin is reinstalling a new release on main host every 2-4 years, you did a freebsd-update or rebuilt the world and went and watched a movie while other sys admins were pulling their hair out all week wishing they documented their configs better 🙂

What about mission critical? In this situation your going to learn a lot about ZFS and send/recv to clone guests on the fly. For every other situation, a simple rsync once a night of the /etc, /usr/local/etc, /boot, /root, /home directories is all you need, why waste space? I’m not going to clone a 100GB guest byte for byte, if something happens to that guest, I have all its configs, I’m good to go. Install a simple rsyncd.conf on each guest to backup its configs each night. Every host is in charge of all their guests backups to a directory. Then its your decision to do offline backups with rsync or zfs send/recv from that host.

So I know all the worries, been there done that. As I’m moving through this guide, I’m going to show you disaster mitigation techniques on the host as well, so your well prepared if disaster ever strikes on a guest.

MY RESEARCH:

To run FreeBSD effectively as a main host in production a lot had to be accounted for, and I will list them here:

All guests can be resumed quickly after a main host reboots for kernel update after suspending all guests. No ssh connections lost to guests, no accidental forgetting to save work bites you on resumes of guests.
If a guest fails to boot at anytime, mandatory mounting of /etc , /boot etc directories of guest to fix problems on host.
If a guest runs out of space, ability to resize guest on host, and grow the FS on the guest afterwards with as little downtime as possible.
Ability to suspend a guest quickly to fix any problems. With this typically you want to reboot guest properly and fix issues from host after that.
FreeBSD only: ability to ZFS snapshot guests and roll them back to any previous snapshot if needed.

Bugs I found and possible remedies:

In my preliminary research I found what bhyve actually does is run itself if a while loop, if it exists with error code 0, then bhyve is run again, effectively a “shutdown -r now” working properly. If it exits with any other error code then loop is broken, everything can be cleaned up after guest. The problem I found with suspend/resume so far on this issue is once suspended it exits with error code 0 as well. There definitely should be a different error code attached. There is a progress bar written to screen on STDOUT before it does shut down, only way currently to differentiate between the two is to capture that output. Something that would be better left to python asyncio/fastapi server process keeping all guests in a loop and determining difference between a suspend and a clean exit, then you could have a separate command line utility to access API’s on server process would probably be the best solution right now. A coding exercise in python that shouldn’t take more than a week to code. A 6-12 month front end to that with GUI and flutter would probably be best overall supporting the most platforms from one code base.

On further research there are mainly 3 types of ways to pass disks to guests, virtio-blk, virtio-scsi and nvme. On my tests with suspend/resume virtio-blk is stable each time I ran it. On virtio-scsi I had issues, on resume I could do an “ls -al”, see the filesystem properly but after doing anything else like a “df -k” or logging into system, it would hang then eventually crash with error code 139. I reported this is freebsd-current mailing list and hopefully someone gets around to looking at it.

Further research on virtio-blk has shown that it is slowly being phased out for virtio-scsi. Apparently this is the new version of passing disks that will be more common in the future. The belief behind it stems from being to hard to rework the virtio-blk code as well as virtio-scsi has more features and ability to pass way more disks from a host to guests. As far as nvme, I did not have any to test with, regardless of SSD type I believe the move will attempt to include all SSD/nvme to one virtio-scsi configuration in the future so users should embrace virtio-scsi once it is stable with suspend/resume on FreeBSD.

Upon further testing of attempting to mount a ZFS guest to the main host, it was unstable with no plans to fix it. Upon contacting freebsd-current mailing list about this issue, I was informed it causes deadlocks due to lock recursion and is why the sysctl vfs.zfs.vol.recursive was turned off by default. Suggestion was to use scsi instead, which in my testing did mount the ZFS guest without issues, a further suggestion that virtio-scsi is the future.

Upon examining the C source code to the virtio-scsi driver, it creates /dev/cam/* devices that can be used for the guests to passthrough targets with setup luns on the host.

router:/root # ls -al /dev/cam/ctl*
crw------- 1 root operator 0, 164 Nov 10 02:29 /dev/cam/ctl
crw------- 1 root operator 0, 167 Nov 10 08:14 /dev/cam/ctl1.0
crw------- 1 root operator 0, 168 Nov 10 08:14 /dev/cam/ctl2.0
router:/root #

What happens here is a port is created that can be used in the virtio-scsi line of a guests config to pass devices. Upon attaching to targets, the luns create /dev/da* devices.

router:/root # ls -al /dev/da*
crw-r----- 1 root operator 0, 134 Nov 10 02:29 /dev/da0
crw-r----- 1 root operator 0, 135 Nov 10 02:29 /dev/da0s1
crw-r----- 1 root operator 0, 136 Nov 10 02:29 /dev/da0s2
crw-r----- 1 root operator 0, 138 Nov 10 02:29 /dev/da0s2a
crw-r----- 1 root operator 0, 181 Nov 10 08:14 /dev/da1
crw-r----- 1 root operator 0, 183 Nov 10 08:14 /dev/da1p1
crw-r----- 1 root operator 0, 184 Nov 10 08:14 /dev/da1p2
crw-r----- 1 root operator 0, 185 Nov 10 08:14 /dev/da1p3
crw-r----- 1 root operator 0, 182 Nov 10 08:14 /dev/da2
crw-r----- 1 root operator 0, 186 Nov 10 08:14 /dev/da2p1
crw-r----- 1 root operator 0, 187 Nov 10 08:14 /dev/da2p2
crw-r----- 1 root operator 0, 188 Nov 10 08:14 /dev/da2p3
crw-r----- 1 root operator 0, 206 Nov 10 08:14 /dev/da3
crw-r----- 1 root operator 0, 207 Nov 10 08:14 /dev/da3p1
crw-r----- 1 root operator 0, 208 Nov 10 08:14 /dev/da3p2
router:/root #

The first one /dev/da0 is reserved for scsi itself, every other lun created in a target creates a new /dev/da* device, first one beginning at /dev/da1 and so forth.

For my purposes I had the ZFS guest as first lun in my test and was able to manipulate /dev/da1 successfully to mount the ZFS guest. I could also pass a target to a guest with as many disks/cdroms/ISOs(luns) as I wanted just by passing the /dev/cam/ctl1.0 target.

Let’s do a quick illustration of passing zvols in FreeBSD to scsi instead, you could also pass disk images if you wanted:

nano /etc/ctl.conf:
(add following:)
portal-group pg0 {
        discovery-auth-group no-authentication
        listen 127.0.0.1:3260
}
target iqn.2005-02.com.sunsaturn:target0 {
        auth-group no-authentication
        portal-group pg0
        #bhyve virti-iscsi disk - /dev/cam/ctl1.0
        port ioctl/1
        lun 0 {
                path /dev/zvol/zroot/asterisk
                #blocksize 128
                serial 000c2937247001
                device-id "iSCSI Disk 000c2937247001"
                option vendor "FreeBSD"
                option product "iSCSI Disk"
                option revision "0123"
                option insecure_tpc on
        }
}
target iqn.2005-02.com.sunsaturn:target1 {
        auth-group no-authentication
        portal-group pg0
        #bhyve virti-iscsi disk - /dev/cam/ctl2.0
        port ioctl/2

        lun 0 {
                path /dev/zvol/zroot/asterisk2
                #blocksize 128
                serial 000c2937247002
                device-id "iSCSI Disk 000c2937247002"
                option vendor "FreeBSD"
                option product "iSCSI Disk"
                option revision "0123"
                option insecure_tpc on
        }
        lun 1 {
                path /vm/.iso/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso
                #byhve seems to just hang when I set it to an actual CDROM so let it default to type 0
                #device-type 5
                serial 000c2937247003
                device-id "iSCSI CDROM ISO 000c2937247003"
                option vendor "FreeBSD CDROM"
                option product "iSCSI CDROM"
                option revision "0123"
                option insecure_tpc on
        }
}

(close and exit)
nano /etc/iscsi.conf
(add following:)
t0 {
        TargetAddress   = 127.0.0.1:3260
        TargetName      = iqn.2005-02.com.sunsaturn:target0
}
t1 {
        TargetAddress   = 127.0.0.1:3260
        TargetName      = iqn.2005-02.com.sunsaturn:target1
}
(close and exit)
nano /etc/rc.conf
(add following:)
#ISCSI - service ctld start && service iscsid start
#server
ctld_enable="YES"          #load /etc/ctl.conf
iscsid_enable="YES"        #start iscsid process to connect to ctld
#client - service iscsictl start
iscsictl_enable="YES"      #connect to all targets in /etc/iscsi.conf
iscsictl_flags="-Aa"

(close and exit)
(now let's create some zvols to install guests on)
zfs create -V30G -o volmode=dev zroot/asterisk
zfs create -V30G -o volmode=dev zroot/asterisk2
cd /vm/.iso
wget https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/14.0/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso
(let's start scsi manually)
service ctld start && service iscsid start && service iscsictl start

Now we can see all those /dev/da* devices for us to manipulate on host for mounting shut down guests, as well as those /dev/cam/* devices for passing to guests.

Next let’s actually install 2 test guests, asterisk and asterisk2. For asterisk guest I will install UFS FreeBSD, and for asterisk2 I will install ZFS guest as well as pass the FreeBSD ISO to that guest.

To make this easier, let’s use a bash script I quickly coded to test suspend/resume capabilities of each guest, let’s call them asterisk.sh and asterisk2.sh:

cd /root
nano -w asterisk.sh
(add following)
#!/bin/bash
#
# General script to test bhyve suspend/resume features
#
# Requirements: FreeBSD current
# screen
# git clone https://git.FreeBSD.org/src.git /usr/src
# cd /usr/src/sys/amd64/conf (edit MYKERNEL) cp GENERIC MYKERNEL-NODEBUG; (add: options         BHYVE_SNAPSHOT)
# cd /usr/src
# (find amount of CPUs and adjust -j below - "dmesg|grep SMP")
# make -j12 buildworld -DWITH_BHYVE_SNAPSHOT -DWITH_MALLOC_PRODUCTION
# make -j12 buildkernel KERNCONF=MYKERNEL
# make installkernel KERNCONF=MYKERNEL
# shutdown -r now
# cd /usr/src; make installworld
# shutdown -r now
# etcupdate -B
# pkg bootstrap -f #if new freebsd version
# pkg upgrade -f   #if new freebsd version 
#
# Report anomolies to dan@sunsaturn.com

##############EDIT ME#####################


HOST="127.0.0.1"                        # vncviewer 127.0.0.1:5900 - pkg install tightvnc
PORT="5900"
WIDTH="800"
HEIGHT="600"
VMNAME="asterisk"
ISO="/vm/.iso/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso"
DIR="/vm/asterisk"                      # Used to hold files when guest suspended
SERIAL="/dev/nmdm_asteriskA"           # For "screen /dev/nmdm_asteriskB" - pkg install screen
TAP="tap0"
CPU="8"
RAM="8G"

#For testing virtio-scsi
STORAGE="/dev/cam/ctl1.0"               # port from /etc/ctl.conf(port ioctl/1) - core dumping on resume
DEVICE="virtio-scsi"

#for testing virtio-blk                 # Comment out above 2 lines if using these
#DEVICE="virtio-blk"                    
#STORAGE="/dev/zvol/zroot/asterisk"     # Standard zvol
#STORAGE="/dev/da1"                     # Block device created from iscsictl

#########################################

usage() {
   echo "Usage: $1 start    (Start the guest: $VMNAME)"; 
   echo "Usage: $1 stop     (Stop the guest: $VMNAME)"; 
   echo "Usage: $1 resume   (Resume the guest from last suspend: $VMNAME)"; 
   echo "Usage: $1 suspend  (Suspend the guest: $VMNAME)"; 
   echo "Usage: $1 install  (Install new guest: $VMNAME)"; 
   exit
}

if [ ! -d "$DIR" ]; then 
   mkdir -p $DIR
fi

#if [ -z "$2" ]; then
#   usage
#else
#   VMNAME=$2
#fi


if [ "$1" == "install" ]; then
   #Kill it before starting it
   echo "Execute: screen $SERIAL"
   bhyvectl --destroy --vm=$VMNAME
   bhyve -c $CPU -m $RAM -w -H -A \
      -s 0:0,hostbridge \
      -s 3:0,ahci-cd,$ISO \
      -s 4:0,$DEVICE,$STORAGE  \
      -s 5:0,virtio-net,$TAP \
      -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
      -s 30,xhci,tablet \
      -s 31,lpc -l com1,stdio \
      -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
      $VMNAME
   #kill it after 
   bhyvectl --destroy --vm=$VMNAME
elif [ "$1" == "start" ]; then 
   while true
   do
      echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
      #Kill it before starting it
      bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
      bhyve -c $CPU -m $RAM -w -H -A \
         -s 0:0,hostbridge \
         -s 4:0,$DEVICE,$STORAGE  \
         -s 5:0,virtio-net,$TAP \
         -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
         -s 30,xhci,tablet \
         -s 31,lpc -l com1,$SERIAL \
         -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
         $VMNAME
      #DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
      #if [ "$?" != 0 ];
      #then
      #   echo "The exit code was not reboot code 0!: $?"
      #   exit
      #fi
      echo "The exit code was : $?"
      exit
   done
elif [ "$1" == "resume" ]; then 
   while true
   do
      echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
      #Kill it before starting it
      bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
      if [ -f "$DIR/default.ckp" ]; then
         bhyve -c $CPU -m $RAM -w -H -A \
            -s 0:0,hostbridge \
            -s 4:0,$DEVICE,$STORAGE  \
            -s 5:0,virtio-net,$TAP \
            -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
            -s 30,xhci,tablet \
            -s 31,lpc -l com1,$SERIAL \
            -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
            -r $DIR/default.ckp \
            $VMNAME
      else
         echo "Guest was never suspended"
         exit
      fi
      #DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
      #if [ "$?" != 0 ];
      #then
      #   echo "The exit code was not reboot code 0!: $?"
      #   exit
      #fi
      echo "The exit code was : $?"
      exit
   done
elif [ "$1" == "suspend" ];
then 
   bhyvectl --suspend $DIR/default.ckp --vm=$VMNAME

elif [ "$1" == "stop" ]; then 
   bhyvectl --destroy --vm=$VMNAME 
else 
   usage
fi

Let’s also create asterisk2.sh:

#!/bin/bash
#
# General script to test bhyve suspend/resume features
#
# Requirements: FreeBSD current
# screen
# git clone https://git.FreeBSD.org/src.git /usr/src
# cd /usr/src/sys/amd64/conf (edit MYKERNEL) cp GENERIC MYKERNEL-NODEBUG; (add: options         BHYVE_SNAPSHOT)
# cd /usr/src
# (find amount of CPUs and adjust -j below - "dmesg|grep SMP")
# make -j12 buildworld -DWITH_BHYVE_SNAPSHOT -DWITH_MALLOC_PRODUCTION
# make -j12 buildkernel KERNCONF=MYKERNEL
# make installkernel KERNCONF=MYKERNEL
# shutdown -r now
# cd /usr/src; make installworld
# shutdown -r now
# etcupdate -B
# pkg bootstrap -f #if new freebsd version
# pkg upgrade -f   #if new freebsd version 
#
# Report anomolies to dan@sunsaturn.com

##############EDIT ME#####################


HOST="127.0.0.1"                        # vncviewer 127.0.0.1:5900 - pkg install tightvnc
PORT="5901"
WIDTH="800"
HEIGHT="600"
VMNAME="asterisk2"
ISO="/vm/.iso/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso"
DIR="/vm/asterisk2"                      # Used to hold files when guest suspended
SERIAL="/dev/nmdm_asterisk2A"           # For "screen /dev/nmdm_asterisk2B" - pkg install screen
TAP="tap1"
CPU="8"
RAM="8G"

#For testing virtio-scsi
STORAGE="/dev/cam/ctl2.0"               # port from /etc/ctl.conf(port ioctl/1) - core dumping on resume
DEVICE="virtio-scsi"

#for testing virtio-blk                 # Comment out above 2 lines if using these
#DEVICE="virtio-blk"                    
#STORAGE="/dev/zvol/zroot/asterisk2"     # Standard zvol
#STORAGE="/dev/da2"                     # Block device created from iscsictl

#########################################

usage() {
   echo "Usage: $1 start    (Start the guest: $VMNAME)"; 
   echo "Usage: $1 stop     (Stop the guest: $VMNAME)"; 
   echo "Usage: $1 resume   (Resume the guest from last suspend: $VMNAME)"; 
   echo "Usage: $1 suspend  (Suspend the guest: $VMNAME)"; 
   echo "Usage: $1 install  (Install new guest: $VMNAME)"; 
   exit
}

if [ ! -d "$DIR" ]; then 
   mkdir -p $DIR
fi

#if [ -z "$2" ]; then
#   usage
#else
#   VMNAME=$2
#fi


if [ "$1" == "install" ]; then
   #Kill it before starting it
   echo "Execute: screen $SERIAL"
   bhyvectl --destroy --vm=$VMNAME
   bhyve -c $CPU -m $RAM -w -H -A \
      -s 0:0,hostbridge \
      -s 3:0,ahci-cd,$ISO \
      -s 4:0,$DEVICE,$STORAGE  \
      -s 5:0,virtio-net,$TAP \
      -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
      -s 30,xhci,tablet \
      -s 31,lpc -l com1,stdio \
      -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
      $VMNAME
   #kill it after 
   bhyvectl --destroy --vm=$VMNAME
elif [ "$1" == "start" ]; then 
   while true
   do
      echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
      #Kill it before starting it
      bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
      bhyve -c $CPU -m $RAM -w -H -A \
         -s 0:0,hostbridge \
         -s 4:0,$DEVICE,$STORAGE  \
         -s 5:0,virtio-net,$TAP \
         -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
         -s 30,xhci,tablet \
         -s 31,lpc -l com1,$SERIAL \
         -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
         $VMNAME
      #DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
      #if [ "$?" != 0 ];
      #then
      #   echo "The exit code was not reboot code 0!: $?"
      #   exit
      #fi
      echo "The exit code was : $?"
      exit
   done
elif [ "$1" == "resume" ]; then 
   while true
   do
      echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
      #Kill it before starting it
      bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
      if [ -f "$DIR/default.ckp" ]; then
         bhyve -c $CPU -m $RAM -w -H -A \
            -s 0:0,hostbridge \
            -s 4:0,$DEVICE,$STORAGE  \
            -s 5:0,virtio-net,$TAP \
            -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
            -s 30,xhci,tablet \
            -s 31,lpc -l com1,$SERIAL \
            -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
            -r $DIR/default.ckp \
            $VMNAME
      else
         echo "Guest was never suspended"
         exit
      fi
      #DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
      #if [ "$?" != 0 ];
      #then
      #   echo "The exit code was not reboot code 0!: $?"
      #   exit
      #fi
      echo "The exit code was : $?"
      exit
   done
elif [ "$1" == "suspend" ];
then 
   bhyvectl --suspend $DIR/default.ckp --vm=$VMNAME

elif [ "$1" == "stop" ]; then 
   bhyvectl --destroy --vm=$VMNAME 
else 
   usage
fi

Great now let’s install the guests:

chmod 755 *.sh
./asterisk.sh install
(at this point just install FreeBSD with default UFS options, I actually chose to remove swap partition at the end, and the root partition then add them back as swap 2nd and root partition last to avoid headaches having to grow the disk later on, I suggest you do the same)
./asterisk2.sh install (install with GPT+UEFI)
(install FreeBSD again with default ZFS install, there is no headache here partitions are perfect here as defaults on zroot)

(At this point I may edit /etc/fstab to change whatever hardcoded device names
are in there to GPT names from /dev/gpt/* so when we switch between virtio-scsi and virtio-blk devices it won't matter. Ie for swap switch it to:
/dev/gpt/swap0 instead) If you are ever wanting to show what the GPT label names are you can either do "gdisk -l <device>" or "gpart show -l <device>" to figure out what to put into /etc/fstab. If they have no label , give them one. )

#now let's start them both after the install
./asterisk.sh start
#another terminal
./asterisk2.sh start
#another terminal
screen /dev/nmdm_asteriskB
#another terminal
screen /dev/nmdm_asterisk2B
#another terminal
./asterisk.sh suspend
./asterisk.sh resume
#another terminal
./asterisk2.sh suspend
./asterisk2.sh resume

Now you will notice on your screen session, after resuming guest “ls” etc works but as soon as we do anything else, “df -h” it will hang and after about a minute it will core dump.

router:/root # ./asterisk2.sh resume
Starting asterisk2 -s 29,fbuf,tcp=127.0.0.1:5901,w=800,h=600
fbuf frame buffer base: 0x229792600000 [sz 16777216]
Pausing pci devs...
pci_pause: no such name: virtio-blk
pci_pause: no such name: ahci
pci_pause: no such name: ahci-hd
pci_pause: no such name: ahci-cd
Restoring vm mem...
[8192.000MiB / 8192.000MiB] |################################################################################################################################################|
Restoring pci devs...
vm_restore_user_dev: Device size is 0. Assuming virtio-blk is not used
vm_restore_user_dev: Device size is 0. Assuming virtio-rnd is not used
vm_restore_user_dev: Device size is 0. Assuming e1000 is not used
vm_restore_user_dev: Device size is 0. Assuming ahci is not used
vm_restore_user_dev: Device size is 0. Assuming ahci-hd is not used
vm_restore_user_dev: Device size is 0. Assuming ahci-cd is not used
Restoring kernel structs...
Resuming pci devs...
pci_resume: no such name: virtio-blk
pci_resume: no such name: ahci
pci_resume: no such name: ahci-hd
pci_resume: no such name: ahci-cd
./asterisk2.sh: line 145: 10883 Segmentation fault      (core dumped) bhyve -c $CPU -m $RAM -w -H -A -s 0:0,hostbridge -s 4:0,$DEVICE,$STORAGE -s 5:0,virtio-net,$TAP -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT -s 30,xhci,tablet -s 31,lpc -l com1,$SERIAL -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -r $DIR/default.ckp $VMNAME
The exit code was : 139
router:/root #

Great so scsi is unstable with suspend/resume. Now run same tests but let’s modify asterisk2.sh script to use virtio-blk instead with /dev/da2 and run same tests:

#For testing virtio-scsi
#STORAGE="/dev/cam/ctl2.0"              # port from /etc/ctl.conf(port ioctl/1) - core dumping on resume
#DEVICE="virtio-scsi"

#for testing virtio-blk                 # Comment out above 2 lines if using these
DEVICE="virtio-blk"
#STORAGE="/dev/zvol/zroot/asterisk2"    # Standard zvol
STORAGE="/dev/da2"                      # Block device created from iscsictl

You may be wondering how I know which /dev/da* asterisk2 is using:
I simply run:
iscsictl 
#This will give you a list of who is on what, it can be completely random
#always check this list before mounting a guest so you don't mount wrong one
#typically everything on lun0 will be numbered first, then lun1 but this is not #always the case so make sure to run that
#on a non-testing host it may look something like this:
router:/root # iscsictl 
Target name                          Target portal    State
iqn.com.sunsaturn.asterisk:target1   127.0.0.1:3260   Connected: da1 da5 
iqn.com.sunsaturn.rocky:target2      127.0.0.1:3260   Connected: da2 da6 
iqn.com.sunsaturn.ubuntu:target3     127.0.0.1:3260   Connected: da3 da8 
iqn.com.sunsaturn.windows:target4    127.0.0.1:3260   Connected: da4 da7 
router:/root # 

Here is something good to add to your .bash_profile then you don't have to 
think about it ever again:

if [ "$HOSTNAME" == "test.test.com" ]; then
   echo "Checking which devices guests connected to:"
   echo "#######################################################"
   iscsictl
   echo "#######################################################"
fi

I personally have a 2nd root account I set to bash so I don't touch default root
shell, can use toor if you like. I suggest everyone do that, there will come a day where you are like damn I can't remember my root password anymore because been using ssh keys so long. Yes you have to su to root everytime, but I even automate that these days.

Now let’s run it:

./asterisk2.sh start
#another terminal (make sure watching screen terminal running these)
./asterisk2.sh suspend
./asterisk2.sh resume

WORKS PERFECTLY:
Now go uncomment STORAGE line with 
STORAGE="/dev/zvol/zroot/asterisk2"
and comment out:
#STORAGE="/dev/da2"
WORKS PERFECTLY TO

So what can we see from suspend/resume stability, it works on virtio-blk perfectly. No matter if I use the /dev/da* devices from scsi or the zvols directly.

What about importing ZFS pool from asterisk2? Sure let’s do it on host, shut it down first:

router:/root # gpart show /dev/da2
=>      40  62914480  da2  GPT  (30G)
        40    532480    1  efi  (260M)
    532520      2008       - free -  (1.0M)
    534528  16777216    2  freebsd-swap  (8.0G)
  17311744  45600768    3  freebsd-zfs  (22G)
  62912512      2008       - free -  (1.0M)

router:/root # gdisk -l /dev/da2
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/da2: 62914560 sectors, 30.0 GiB
Sector size (logical): 512 bytes
Disk identifier (GUID): 8ACD2112-5EBF-11ED-8F56-00A098E3C14E
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 40, last usable sector is 62914519
Partitions will be aligned on 8-sector boundaries
Total free space is 4016 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              40          532519   260.0 MiB   EF00  efiboot0
   2          534528        17311743   8.0 GiB     A502  swap0
   3        17311744        62912511   21.7 GiB    A504  zfs0
router:/root #

So we can see this is definitely our ZFS guest, and we already know we cannot manipulate /dev/zvol/zroot/asterisk2 directly because of recursion issues, but we can manipulate /dev/da2 through scsi just fine:

#Import ZFS guest under /mnt (shut down the guest)
#First import it under /mnt under a different temporary name that won't save the name when we export it after with "-t" option
zpool import (we should see asterisk2's zroot ready for import)
zpool import -fR /mnt -t zroot testing
zfs mount -o mountpoint=/mnt testing/ROOT/default
(go ahead fix /mnt/etc /mnt/boot problems)
zpool export testing
rm -rf /mnt/* (cleanup left over directories)

#hell let's mount asterisk guest with UFS as well on /mnt
#I did crash it a few times so probably need recovery
router:/root # gpart show /dev/da1
=>       40  104857520  da1  GPT  (50G)
         40         24       - free -  (12K)
         64     532480    1  efi  (260M)
     532544   16777216    2  freebsd-swap  (8.0G)
   17309760   87547776    3  freebsd-ufs  (42G)
  104857536         24       - free -  (12K)

router:/root # mount -t ufs /dev/da1p3 /mnt
mount: /dev/da1p3: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted
router:/root # fsck /dev/da1p3 
** /dev/da1p3
** SU+J Recovering /dev/da1p3

USE JOURNAL? [yn] y

** Reading 182419456 byte journal from inode 4.

RECOVER? [yn] y

** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.

WRITE CHANGES? [yn] y


***** FILE SYSTEM IS CLEAN *****
** 96 journal records in 6656 bytes for 46.15% utilization
** Freed 27 inodes (4 dirs) 0 blocks, and 20 frags.

***** FILE SYSTEM MARKED CLEAN *****
router:/root # mount -t ufs /dev/da1p3 /mnt
router:/root # ls /mnt
bin  boot  COPYRIGHT  dev  entropy  etc  home  lib  libexec  media  mnt  net  proc  rescue  root  sbin  sys  tmp  usr  var
router:/root #

Summary:

Suspend/resume does work, just not passing in virtio-scsi with those /dev/cam/* devices. Seems we can do “ls” just fine, so something else is going on.

For now what you can do is just use the virtio-blk devices, until virtio-scsi works. In fact I would set it up this way since we already know virtio-blk is on its way out, plus it was so much nicer to pass in a CDROM on asterisk2 in just 1 line to guest, or as many disks as we want. We learned we can use /dev/da* devices for direct ZFS importing of guests pools for disaster recovery, for anything that isn’t ZFS we could also use the /dev/zvol/zroot/* devices directly if we wanted, but it is a cool work around from FreeBSD team, and sets you up using virtio-scsi now.

If you do not care about suspend/resume right now you could just continue using virtio-scsi to future proof yourself for virtio-blk phase out, or you could just pass virtio-blk devices temporarily till it is fixed and use suspend/resume all you want. My personal preference till this gets sorted out is leave iscsi processes running till it gets resolved, and use virtio-blk directly against /dev/zvol/root/<guestname> for now, can switch them later, this way you can use suspend/resume all you want as well as mount any ZFS guests through the /dev/da* devices anytime you need. This way your at least future proofed. If you don’t want to use the /dev/da* devices :

zfs set volmode=full zroot/asterisk
(if you want to be able to mount non-ZFS guests directly at:
/dev/zvol/zroot/asterisk for instance, what this will do is separate asterisk into asteriskp1 asteriskp2 asteriskp3 etc for your mounting purposes)
#Just get used to scsi already :) I know old habits die hard and no one wants to #give up NIS either and learn LDAP, hell neither do I, whatever works :)

Hope you enjoyed this article on suspend/resume experimental support for FreeBSD, perhaps in another article I will show you Linux and Windows guests, and how we can disaster recovery them as well like we did for these 2 UFS and ZFS guests today, till then….

Shine bright like a diamond,

Dan.