INTRO:
FreeBSD Bhyve virtualization why? Sure there are different virtualization packages out there, but you have FreeBSD! Why not use ESXI? From experience any closed source software generally has a max lifetime of 10 years(abandonware), and opensource will generally over throw them by then. Also anything you don’t have full control over on root prompt to read the source code will be a nightmare to administer.
What about Linux and KVM? Yes, before Bhyve they were the best, the virt-manager and virsh/virt-viewer commands were unbeatable. Here is the thing with Redhat however, they over the course of just 2 years now have made most of sys admins angry on the internet. First they removed the megaraid drivers from the kernel(yes people still run Dell r710’s etc as fileservers), which caused a lot of people grief, then on upgrade to Redhat 9, they killed off a lot of Dell servers as well, people who upgraded had to reinstall with Rocky Linux 8 as they used a glibc that did not work on a lot of people’s systems. And to top it off they killed off Centos by moving it to Centos-stream, angering rest of internet. So everyone moved to Rocky Linux(original creator of Centos), to keep their hosts “production”.
The worst thing about Linux to begin with as a virtualization host, is every 2 years when a new major release comes out, most people have to take time out of their day to reinstall the OS. A core OS! With FreeBSD you can update to new major releases by way of simple freebsd-update command or rebuilding source code from scratch with “make buildworld” as well. No downtime reinstalling, no host to replace, a sys admin dream.
I’ve had fights with programmers over the years over Linux and FreeBSD, especially when it came to libevent and libev, where people swore by epoll over kqueue, from experience put FreeBSD under load and then a Linux server, your loads will be better served on FreeBSD with less CPU utilization, the network stack is better, more efficient, even Netflix uses it.
And now I know what everyone is going to say, been using Linux/KVM virt-manager for so many years, FreeBSD doesn’t even have suspend/resume stable. I’m not loosing my 20 SSH connections to my guests just because I had to reboot for a kernel update. I can’t disagree with this, I’d hate to be working 4 hours on a script on one of them, reboot for a kernel update, and lost all my work because I forgot to save on a host and host didn’t resume guests properly after reboot.
So why use Linux at all? I think Linux is fine as a guest, just not in the frontline of battle for reasons I mentioned above. There are still lots of things that will only run on Linux, if your a flutter programmer, FreeBSD still has not ported Dart language, so your apps will still need a Linux guest to test builds. If I’m considering working on an app, I would most likely install an ubuntu guest for that, do all your dart/c/java there along with windows 11, and your python Fastapi’s interfacing with Mysql on Linux and/or FreeBSD.
Honestly for your server part of your app, I would use a FreeBSD guest to begin with hosting Fastapi, I will show you how you could pass a second disk through to the guest with a proper 16k blocksize for max Mysql performance for your app. The biggest reason to use FreeBSD upfront, you can be stuck 6 months to a year working on an app, you really need a week of downtime because some new Linux release came out and you have to reinstall? I didn’t think so…let FreeBSD “Shine bright like a diamond” as Rihanna would sing.
So today I am going to take what is the most important and exciting #1 feature release for FreeBSD 14, suspend and resume. I will be installing FreeBSD current to test it. I will also show how FreeBSD is a force to be reckoned with as the core OS with virtualization.
Test case scenario:
Dell r720, Intel Enterprise 480GB SSD as core OS slapped into an icy dock for core OS, along with 2 12TB rust spinning drives for separate ZFS backup pool all on a megaraid JBOB controller. FreeBSD current, with production flags, with experimental suspend and resume features enabled in kernel and userland.
For main OS we will use default ZFS install, for guest tests we will install 2 FreeBSD guests, one with ZFS, and one with UFS. We will suspend them both, then resume them and see if we loose our SSH connections π I will most likely do a part 2 on installing Linux and Windows guests as I see this article will get pretty lengthy just with these 2 guests…
For third-party packages to interface with suspend and resume, we can’t use any. We will have to do everything manually, I will keep it simple with just bash scripts. Maybe I’ll code an advanced interface with python and asyncio in future. One third party package I did run across I kind of liked was vm-bhyve. To be honest only thing I liked about it was his directory layout, so that’s only thing I will try to keep consistent for our scenario.
Bhyve still needs a good interface to it, I think what would be best for it is python fastapi or C/Rust with kqueue as server end. Front end hands down flutter, will build on Linux, apache/nginx, android, IOS, windows and even our TVs in livingroom for a front end interface. Even on FreeBSD natively once Dart is ported to it. I wouldn’t even bother with Java or Kotlin for these reasons alone. I think if that was built for a year would turn FreeBSD into everyone’s favorite virtualization OS.
Back to sys admin stuff, let’s set this up once really well and we should be good for next 5-10 years, just clone the drive if moving to PCIE 4 or 5 machine this is not Linux π
Setup Bhyve:
Let’s start with something simple, setup our directory structure like vm-bhyve, mod what we need and uninstall his package. I’m going to assume at this point you just have a FreeBSD release like FreeBSD 13.1 installed, so let’s begin….
https://github.com/churchers/vm-bhyve
pkg install vm-bhyve bhyve-firmware
zfs create -o mountpoint=/vm zroot/vm
sysrc vm_enable="YES"
sysrc vm_dir="zfs:pool/vm"
vm init
cp /usr/local/share/examples/vm-bhyve/* /vm/.templates/
OK great now we have a directory structure to work with, he likes to keep all guest related things in /vm/<guestname>, so let’s keep with his directory structure idea and delete the package now.
pkg delete vm-bhyve
nano /etc/rc.conf
(now edit /etc/rc.conf and remove/comment out his sysrc lines)
(while we in here let's add support for 4 guest tap interfaces)
#support for 4 test guests - substitute "ix0" for your own interface
cloned_interfaces="bridge0 tap0 tap1 tap2 tap3"
ifconfig_bridge0_name="br0"
ifconfig_br0="addm ix0 addm tap0 addm tap1 addm tap2 addm tap3"
(exit /etc/rc.conf)
#now let's just create what that does manually on command line for now
ifconfig bridge create
ifconfig tap0 create
ifconfig tap1 create
ifconfig tap2 create
ifconfig tap3 create
ifconfig bridge0 addm ix0 addm tap0 addm tap1 addm tap2 addm tap3
ifconfig bridge0 name br0
ifconfig br0 up
#damn starting to like /etc/rc.conf better already :)
(alright now let's edit /etc/sysctl.conf)
nano /etc/sysctl.conf (add following:)
#BHYVE
net.link.tap.up_on_open=1
#BYHVE + PF nat
net.inet.ip.forwarding=1
net.inet6.ip6.forwarding=1
(exit /etc/sysctl.conf)
nano /boot/loader.conf
(have it look like this:)
autoboot_delay="5"
kernels="kernel kernel.old"
boot_serial="YES"
#stuff added by install
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
#bhyve
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
(exit /boot/loader.conf)
At this point we should be able to do an “ifconfig” and see our bridge setup properly for 4 possible guests, just add more tap interfaces for additional guests. Now let’s reboot and make sure our bridge etc is still there:
shutdown -r now
ifconfig -a
#let's install a few packages to help us
pkg install screen tightvnc git bash
Moving to FreeBSD current with experimental suspend/resume support(MOVIE TIME!):
Ok the time consuming part, hope you have a lot of cores/fast CPUs π
Alright even if you have current installed you still will not have support enabled so we are going to recompile kernel and userland for support. Honestly this is way to upgrade FreeBSD from source anytime, only difference on stable/release is I would checkout a different branch with git and be using GENERIC instead of GENERIC-NODEBUG to copy from. You may be thinking I’ll checkout a release branch and just compile in support, tried that, it won’t work properly, this is your only way to test it before 14 release where it actually works for most part.
The make buildworld, make buildkernel, make installworld commands will take a long time, I recommend going to play your favorite video game or watch a movie after executing those commands, return in a few hours to check in on them time to time. Honestly once you have the buildworld out of the way its not that bad after. I’d recommend running everything in a screen session as well, that way if something happens you can log back in and do a “screen -r”. After your done, return tomorrow and we will continue on with testing suspend and resume π
screen
git clone https://git.FreeBSD.org/src.git /usr/src
cd /usr/src/sys/amd64/conf
(edit MYKERNEL) cp GENERIC-NODEBUG MYKERNEL (add: options BHYVE_SNAPSHOT)
cd /usr/src
#(find amount of CPUs and adjust -j below - "dmesg|grep SMP")
make -j12 buildworld -DWITH_BHYVE_SNAPSHOT -DWITH_MALLOC_PRODUCTION
make -j12 buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL
shutdown -r now
cd /usr/src; make installworld
shutdown -r now
etcupdate -B
pkg bootstrap -f #if new freebsd version
pkg upgrade -f #if new freebsd version
shutdown -r now
Preliminary Thoughts on Creating Guests:
We are back and ready to roll! So creating guests is probably something every sys admin sits there and thinks hours upon hours about. What performance improvements could I make, will I be able to mount /etc and /boot directory of an offline guest if anything goes wrong or you screw anything up on guest. How will you do backups for them, how will you recover from disaster if it strikes. These guests once setup can run 5-10 years, no room for error, everything has to be accounted for.
I will give you my take on this from decades of experience. When you are starting out fresh, your main concern is the main virtualization host is doing nothing but virtualization and routing. You’d don’t want apache, email servers or any type of server processes running on it that are better suited for guests that can crash if need be and not affect the main host and other guests running on it. You want the main host rock stable at all times. You can play with the guests and give them the memory and CPUs they require to run what they need.
On the main FreeBSD host – virtualization, DNS, isc-DHCP/isc-DHCPD6, RADVD, PF FIREWALL, and at most backups/ZFS snapshots. Over the years before FreeBSD had Bhyve, I offloaded all backups to ZFS guest for backups there as well, using rsync or zfs send/recv. The choice is yours, but my recommendation is you are running as little as possible on main host as far as internet services.
If you did it all right, you’ll find your barely doing anything on main host and always logged into guests instead, then you know you did everything right. You might edit PF firewall to block something from all guests time to time, or do updates, about it. You have to remember your main host will save you with disaster recovery on guests, create new guests, basically be the blood and soul of your system, and this is where FreeBSD shines.
If you have ability at all to run FreeBSD as main host, you’ll save yourself years of headaches, where every Linux sys admin is reinstalling a new release on main host every 2-4 years, you did a freebsd-update or rebuilt the world and went and watched a movie while other sys admins were pulling their hair out all week wishing they documented their configs better π
What about mission critical? In this situation your going to learn a lot about ZFS and send/recv to clone guests on the fly. For every other situation, a simple rsync once a night of the /etc, /usr/local/etc, /boot, /root, /home directories is all you need, why waste space? I’m not going to clone a 100GB guest byte for byte, if something happens to that guest, I have all its configs, I’m good to go. Install a simple rsyncd.conf on each guest to backup its configs each night. Every host is in charge of all their guests backups to a directory. Then its your decision to do offline backups with rsync or zfs send/recv from that host.
So I know all the worries, been there done that. As I’m moving through this guide, I’m going to show you disaster mitigation techniques on the host as well, so your well prepared if disaster ever strikes on a guest.
MY RESEARCH:
To run FreeBSD effectively as a main host in production a lot had to be accounted for, and I will list them here:
- All guests can be resumed quickly after a main host reboots for kernel update after suspending all guests. No ssh connections lost to guests, no accidental forgetting to save work bites you on resumes of guests.
- If a guest fails to boot at anytime, mandatory mounting of /etc , /boot etc directories of guest to fix problems on host.
- If a guest runs out of space, ability to resize guest on host, and grow the FS on the guest afterwards with as little downtime as possible.
- Ability to suspend a guest quickly to fix any problems. With this typically you want to reboot guest properly and fix issues from host after that.
- FreeBSD only: ability to ZFS snapshot guests and roll them back to any previous snapshot if needed.
Bugs I found and possible remedies:
In my preliminary research I found what bhyve actually does is run itself if a while loop, if it exists with error code 0, then bhyve is run again, effectively a “shutdown -r now” working properly. If it exits with any other error code then loop is broken, everything can be cleaned up after guest. The problem I found with suspend/resume so far on this issue is once suspended it exits with error code 0 as well. There definitely should be a different error code attached. There is a progress bar written to screen on STDOUT before it does shut down, only way currently to differentiate between the two is to capture that output. Something that would be better left to python asyncio/fastapi server process keeping all guests in a loop and determining difference between a suspend and a clean exit, then you could have a separate command line utility to access API’s on server process would probably be the best solution right now. A coding exercise in python that shouldn’t take more than a week to code. A 6-12 month front end to that with GUI and flutter would probably be best overall supporting the most platforms from one code base.
On further research there are mainly 3 types of ways to pass disks to guests, virtio-blk, virtio-scsi and nvme. On my tests with suspend/resume virtio-blk is stable each time I ran it. On virtio-scsi I had issues, on resume I could do an “ls -al”, see the filesystem properly but after doing anything else like a “df -k” or logging into system, it would hang then eventually crash with error code 139. I reported this is freebsd-current mailing list and hopefully someone gets around to looking at it.
Further research on virtio-blk has shown that it is slowly being phased out for virtio-scsi. Apparently this is the new version of passing disks that will be more common in the future. The belief behind it stems from being to hard to rework the virtio-blk code as well as virtio-scsi has more features and ability to pass way more disks from a host to guests. As far as nvme, I did not have any to test with, regardless of SSD type I believe the move will attempt to include all SSD/nvme to one virtio-scsi configuration in the future so users should embrace virtio-scsi once it is stable with suspend/resume on FreeBSD.
Upon further testing of attempting to mount a ZFS guest to the main host, it was unstable with no plans to fix it. Upon contacting freebsd-current mailing list about this issue, I was informed it causes deadlocks due to lock recursion and is why the sysctl vfs.zfs.vol.recursive was turned off by default. Suggestion was to use scsi instead, which in my testing did mount the ZFS guest without issues, a further suggestion that virtio-scsi is the future.
Upon examining the C source code to the virtio-scsi driver, it creates /dev/cam/* devices that can be used for the guests to passthrough targets with setup luns on the host.
router:/root # ls -al /dev/cam/ctl*
crw------- 1 root operator 0, 164 Nov 10 02:29 /dev/cam/ctl
crw------- 1 root operator 0, 167 Nov 10 08:14 /dev/cam/ctl1.0
crw------- 1 root operator 0, 168 Nov 10 08:14 /dev/cam/ctl2.0
router:/root #
What happens here is a port is created that can be used in the virtio-scsi line of a guests config to pass devices. Upon attaching to targets, the luns create /dev/da* devices.
router:/root # ls -al /dev/da*
crw-r----- 1 root operator 0, 134 Nov 10 02:29 /dev/da0
crw-r----- 1 root operator 0, 135 Nov 10 02:29 /dev/da0s1
crw-r----- 1 root operator 0, 136 Nov 10 02:29 /dev/da0s2
crw-r----- 1 root operator 0, 138 Nov 10 02:29 /dev/da0s2a
crw-r----- 1 root operator 0, 181 Nov 10 08:14 /dev/da1
crw-r----- 1 root operator 0, 183 Nov 10 08:14 /dev/da1p1
crw-r----- 1 root operator 0, 184 Nov 10 08:14 /dev/da1p2
crw-r----- 1 root operator 0, 185 Nov 10 08:14 /dev/da1p3
crw-r----- 1 root operator 0, 182 Nov 10 08:14 /dev/da2
crw-r----- 1 root operator 0, 186 Nov 10 08:14 /dev/da2p1
crw-r----- 1 root operator 0, 187 Nov 10 08:14 /dev/da2p2
crw-r----- 1 root operator 0, 188 Nov 10 08:14 /dev/da2p3
crw-r----- 1 root operator 0, 206 Nov 10 08:14 /dev/da3
crw-r----- 1 root operator 0, 207 Nov 10 08:14 /dev/da3p1
crw-r----- 1 root operator 0, 208 Nov 10 08:14 /dev/da3p2
router:/root #
The first one /dev/da0 is reserved for scsi itself, every other lun created in a target creates a new /dev/da* device, first one beginning at /dev/da1 and so forth.
For my purposes I had the ZFS guest as first lun in my test and was able to manipulate /dev/da1 successfully to mount the ZFS guest. I could also pass a target to a guest with as many disks/cdroms/ISOs(luns) as I wanted just by passing the /dev/cam/ctl1.0 target.
Let’s do a quick illustration of passing zvols in FreeBSD to scsi instead, you could also pass disk images if you wanted:
nano /etc/ctl.conf:
(add following:)
portal-group pg0 {
discovery-auth-group no-authentication
listen 127.0.0.1:3260
}
target iqn.2005-02.com.sunsaturn:target0 {
auth-group no-authentication
portal-group pg0
#bhyve virti-iscsi disk - /dev/cam/ctl1.0
port ioctl/1
lun 0 {
path /dev/zvol/zroot/asterisk
#blocksize 128
serial 000c2937247001
device-id "iSCSI Disk 000c2937247001"
option vendor "FreeBSD"
option product "iSCSI Disk"
option revision "0123"
option insecure_tpc on
}
}
target iqn.2005-02.com.sunsaturn:target1 {
auth-group no-authentication
portal-group pg0
#bhyve virti-iscsi disk - /dev/cam/ctl2.0
port ioctl/2
lun 0 {
path /dev/zvol/zroot/asterisk2
#blocksize 128
serial 000c2937247002
device-id "iSCSI Disk 000c2937247002"
option vendor "FreeBSD"
option product "iSCSI Disk"
option revision "0123"
option insecure_tpc on
}
lun 1 {
path /vm/.iso/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso
#byhve seems to just hang when I set it to an actual CDROM so let it default to type 0
#device-type 5
serial 000c2937247003
device-id "iSCSI CDROM ISO 000c2937247003"
option vendor "FreeBSD CDROM"
option product "iSCSI CDROM"
option revision "0123"
option insecure_tpc on
}
}
(close and exit)
nano /etc/iscsi.conf
(add following:)
t0 {
TargetAddress = 127.0.0.1:3260
TargetName = iqn.2005-02.com.sunsaturn:target0
}
t1 {
TargetAddress = 127.0.0.1:3260
TargetName = iqn.2005-02.com.sunsaturn:target1
}
(close and exit)
nano /etc/rc.conf
(add following:)
#ISCSI - service ctld start && service iscsid start
#server
ctld_enable="YES" #load /etc/ctl.conf
iscsid_enable="YES" #start iscsid process to connect to ctld
#client - service iscsictl start
iscsictl_enable="YES" #connect to all targets in /etc/iscsi.conf
iscsictl_flags="-Aa"
(close and exit)
(now let's create some zvols to install guests on)
zfs create -V30G -o volmode=dev zroot/asterisk
zfs create -V30G -o volmode=dev zroot/asterisk2
cd /vm/.iso
wget https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/14.0/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso
(let's start scsi manually)
service ctld start && service iscsid start && service iscsictl start
Now we can see all those /dev/da* devices for us to manipulate on host for mounting shut down guests, as well as those /dev/cam/* devices for passing to guests.
Next let’s actually install 2 test guests, asterisk and asterisk2. For asterisk guest I will install UFS FreeBSD, and for asterisk2 I will install ZFS guest as well as pass the FreeBSD ISO to that guest.
To make this easier, let’s use a bash script I quickly coded to test suspend/resume capabilities of each guest, let’s call them asterisk.sh and asterisk2.sh:
cd /root
nano -w asterisk.sh
(add following)
#!/bin/bash
#
# General script to test bhyve suspend/resume features
#
# Requirements: FreeBSD current
# screen
# git clone https://git.FreeBSD.org/src.git /usr/src
# cd /usr/src/sys/amd64/conf (edit MYKERNEL) cp GENERIC MYKERNEL-NODEBUG; (add: options BHYVE_SNAPSHOT)
# cd /usr/src
# (find amount of CPUs and adjust -j below - "dmesg|grep SMP")
# make -j12 buildworld -DWITH_BHYVE_SNAPSHOT -DWITH_MALLOC_PRODUCTION
# make -j12 buildkernel KERNCONF=MYKERNEL
# make installkernel KERNCONF=MYKERNEL
# shutdown -r now
# cd /usr/src; make installworld
# shutdown -r now
# etcupdate -B
# pkg bootstrap -f #if new freebsd version
# pkg upgrade -f #if new freebsd version
#
# Report anomolies to dan@sunsaturn.com
##############EDIT ME#####################
HOST="127.0.0.1" # vncviewer 127.0.0.1:5900 - pkg install tightvnc
PORT="5900"
WIDTH="800"
HEIGHT="600"
VMNAME="asterisk"
ISO="/vm/.iso/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso"
DIR="/vm/asterisk" # Used to hold files when guest suspended
SERIAL="/dev/nmdm_asteriskA" # For "screen /dev/nmdm_asteriskB" - pkg install screen
TAP="tap0"
CPU="8"
RAM="8G"
#For testing virtio-scsi
STORAGE="/dev/cam/ctl1.0" # port from /etc/ctl.conf(port ioctl/1) - core dumping on resume
DEVICE="virtio-scsi"
#for testing virtio-blk # Comment out above 2 lines if using these
#DEVICE="virtio-blk"
#STORAGE="/dev/zvol/zroot/asterisk" # Standard zvol
#STORAGE="/dev/da1" # Block device created from iscsictl
#########################################
usage() {
echo "Usage: $1 start (Start the guest: $VMNAME)";
echo "Usage: $1 stop (Stop the guest: $VMNAME)";
echo "Usage: $1 resume (Resume the guest from last suspend: $VMNAME)";
echo "Usage: $1 suspend (Suspend the guest: $VMNAME)";
echo "Usage: $1 install (Install new guest: $VMNAME)";
exit
}
if [ ! -d "$DIR" ]; then
mkdir -p $DIR
fi
#if [ -z "$2" ]; then
# usage
#else
# VMNAME=$2
#fi
if [ "$1" == "install" ]; then
#Kill it before starting it
echo "Execute: screen $SERIAL"
bhyvectl --destroy --vm=$VMNAME
bhyve -c $CPU -m $RAM -w -H -A \
-s 0:0,hostbridge \
-s 3:0,ahci-cd,$ISO \
-s 4:0,$DEVICE,$STORAGE \
-s 5:0,virtio-net,$TAP \
-s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
-s 30,xhci,tablet \
-s 31,lpc -l com1,stdio \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
$VMNAME
#kill it after
bhyvectl --destroy --vm=$VMNAME
elif [ "$1" == "start" ]; then
while true
do
echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
#Kill it before starting it
bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
bhyve -c $CPU -m $RAM -w -H -A \
-s 0:0,hostbridge \
-s 4:0,$DEVICE,$STORAGE \
-s 5:0,virtio-net,$TAP \
-s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
-s 30,xhci,tablet \
-s 31,lpc -l com1,$SERIAL \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
$VMNAME
#DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
#if [ "$?" != 0 ];
#then
# echo "The exit code was not reboot code 0!: $?"
# exit
#fi
echo "The exit code was : $?"
exit
done
elif [ "$1" == "resume" ]; then
while true
do
echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
#Kill it before starting it
bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
if [ -f "$DIR/default.ckp" ]; then
bhyve -c $CPU -m $RAM -w -H -A \
-s 0:0,hostbridge \
-s 4:0,$DEVICE,$STORAGE \
-s 5:0,virtio-net,$TAP \
-s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
-s 30,xhci,tablet \
-s 31,lpc -l com1,$SERIAL \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
-r $DIR/default.ckp \
$VMNAME
else
echo "Guest was never suspended"
exit
fi
#DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
#if [ "$?" != 0 ];
#then
# echo "The exit code was not reboot code 0!: $?"
# exit
#fi
echo "The exit code was : $?"
exit
done
elif [ "$1" == "suspend" ];
then
bhyvectl --suspend $DIR/default.ckp --vm=$VMNAME
elif [ "$1" == "stop" ]; then
bhyvectl --destroy --vm=$VMNAME
else
usage
fi
Let’s also create asterisk2.sh:
#!/bin/bash
#
# General script to test bhyve suspend/resume features
#
# Requirements: FreeBSD current
# screen
# git clone https://git.FreeBSD.org/src.git /usr/src
# cd /usr/src/sys/amd64/conf (edit MYKERNEL) cp GENERIC MYKERNEL-NODEBUG; (add: options BHYVE_SNAPSHOT)
# cd /usr/src
# (find amount of CPUs and adjust -j below - "dmesg|grep SMP")
# make -j12 buildworld -DWITH_BHYVE_SNAPSHOT -DWITH_MALLOC_PRODUCTION
# make -j12 buildkernel KERNCONF=MYKERNEL
# make installkernel KERNCONF=MYKERNEL
# shutdown -r now
# cd /usr/src; make installworld
# shutdown -r now
# etcupdate -B
# pkg bootstrap -f #if new freebsd version
# pkg upgrade -f #if new freebsd version
#
# Report anomolies to dan@sunsaturn.com
##############EDIT ME#####################
HOST="127.0.0.1" # vncviewer 127.0.0.1:5900 - pkg install tightvnc
PORT="5901"
WIDTH="800"
HEIGHT="600"
VMNAME="asterisk2"
ISO="/vm/.iso/FreeBSD-14.0-CURRENT-amd64-20221103-5cc5c9254da-259005-disc1.iso"
DIR="/vm/asterisk2" # Used to hold files when guest suspended
SERIAL="/dev/nmdm_asterisk2A" # For "screen /dev/nmdm_asterisk2B" - pkg install screen
TAP="tap1"
CPU="8"
RAM="8G"
#For testing virtio-scsi
STORAGE="/dev/cam/ctl2.0" # port from /etc/ctl.conf(port ioctl/1) - core dumping on resume
DEVICE="virtio-scsi"
#for testing virtio-blk # Comment out above 2 lines if using these
#DEVICE="virtio-blk"
#STORAGE="/dev/zvol/zroot/asterisk2" # Standard zvol
#STORAGE="/dev/da2" # Block device created from iscsictl
#########################################
usage() {
echo "Usage: $1 start (Start the guest: $VMNAME)";
echo "Usage: $1 stop (Stop the guest: $VMNAME)";
echo "Usage: $1 resume (Resume the guest from last suspend: $VMNAME)";
echo "Usage: $1 suspend (Suspend the guest: $VMNAME)";
echo "Usage: $1 install (Install new guest: $VMNAME)";
exit
}
if [ ! -d "$DIR" ]; then
mkdir -p $DIR
fi
#if [ -z "$2" ]; then
# usage
#else
# VMNAME=$2
#fi
if [ "$1" == "install" ]; then
#Kill it before starting it
echo "Execute: screen $SERIAL"
bhyvectl --destroy --vm=$VMNAME
bhyve -c $CPU -m $RAM -w -H -A \
-s 0:0,hostbridge \
-s 3:0,ahci-cd,$ISO \
-s 4:0,$DEVICE,$STORAGE \
-s 5:0,virtio-net,$TAP \
-s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
-s 30,xhci,tablet \
-s 31,lpc -l com1,stdio \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
$VMNAME
#kill it after
bhyvectl --destroy --vm=$VMNAME
elif [ "$1" == "start" ]; then
while true
do
echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
#Kill it before starting it
bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
bhyve -c $CPU -m $RAM -w -H -A \
-s 0:0,hostbridge \
-s 4:0,$DEVICE,$STORAGE \
-s 5:0,virtio-net,$TAP \
-s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
-s 30,xhci,tablet \
-s 31,lpc -l com1,$SERIAL \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
$VMNAME
#DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
#if [ "$?" != 0 ];
#then
# echo "The exit code was not reboot code 0!: $?"
# exit
#fi
echo "The exit code was : $?"
exit
done
elif [ "$1" == "resume" ]; then
while true
do
echo "Starting $VMNAME -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT"
#Kill it before starting it
bhyvectl --destroy --vm=$VMNAME > /dev/null 2>&1
if [ -f "$DIR/default.ckp" ]; then
bhyve -c $CPU -m $RAM -w -H -A \
-s 0:0,hostbridge \
-s 4:0,$DEVICE,$STORAGE \
-s 5:0,virtio-net,$TAP \
-s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT \
-s 30,xhci,tablet \
-s 31,lpc -l com1,$SERIAL \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
-r $DIR/default.ckp \
$VMNAME
else
echo "Guest was never suspended"
exit
fi
#DISABLING REBOOT LOOP AS SUSPEND RETURNS ERROR CODE 0 AS WELL
#if [ "$?" != 0 ];
#then
# echo "The exit code was not reboot code 0!: $?"
# exit
#fi
echo "The exit code was : $?"
exit
done
elif [ "$1" == "suspend" ];
then
bhyvectl --suspend $DIR/default.ckp --vm=$VMNAME
elif [ "$1" == "stop" ]; then
bhyvectl --destroy --vm=$VMNAME
else
usage
fi
Great now let’s install the guests:
chmod 755 *.sh
./asterisk.sh install
(at this point just install FreeBSD with default UFS options, I actually chose to remove swap partition at the end, and the root partition then add them back as swap 2nd and root partition last to avoid headaches having to grow the disk later on, I suggest you do the same)
./asterisk2.sh install (install with GPT+UEFI)
(install FreeBSD again with default ZFS install, there is no headache here partitions are perfect here as defaults on zroot)
(At this point I may edit /etc/fstab to change whatever hardcoded device names
are in there to GPT names from /dev/gpt/* so when we switch between virtio-scsi and virtio-blk devices it won't matter. Ie for swap switch it to:
/dev/gpt/swap0 instead) If you are ever wanting to show what the GPT label names are you can either do "gdisk -l <device>" or "gpart show -l <device>" to figure out what to put into /etc/fstab. If they have no label , give them one. )
#now let's start them both after the install
./asterisk.sh start
#another terminal
./asterisk2.sh start
#another terminal
screen /dev/nmdm_asteriskB
#another terminal
screen /dev/nmdm_asterisk2B
#another terminal
./asterisk.sh suspend
./asterisk.sh resume
#another terminal
./asterisk2.sh suspend
./asterisk2.sh resume
Now you will notice on your screen session, after resuming guest “ls” etc works but as soon as we do anything else, “df -h” it will hang and after about a minute it will core dump.
router:/root # ./asterisk2.sh resume
Starting asterisk2 -s 29,fbuf,tcp=127.0.0.1:5901,w=800,h=600
fbuf frame buffer base: 0x229792600000 [sz 16777216]
Pausing pci devs...
pci_pause: no such name: virtio-blk
pci_pause: no such name: ahci
pci_pause: no such name: ahci-hd
pci_pause: no such name: ahci-cd
Restoring vm mem...
[8192.000MiB / 8192.000MiB] |################################################################################################################################################|
Restoring pci devs...
vm_restore_user_dev: Device size is 0. Assuming virtio-blk is not used
vm_restore_user_dev: Device size is 0. Assuming virtio-rnd is not used
vm_restore_user_dev: Device size is 0. Assuming e1000 is not used
vm_restore_user_dev: Device size is 0. Assuming ahci is not used
vm_restore_user_dev: Device size is 0. Assuming ahci-hd is not used
vm_restore_user_dev: Device size is 0. Assuming ahci-cd is not used
Restoring kernel structs...
Resuming pci devs...
pci_resume: no such name: virtio-blk
pci_resume: no such name: ahci
pci_resume: no such name: ahci-hd
pci_resume: no such name: ahci-cd
./asterisk2.sh: line 145: 10883 Segmentation fault (core dumped) bhyve -c $CPU -m $RAM -w -H -A -s 0:0,hostbridge -s 4:0,$DEVICE,$STORAGE -s 5:0,virtio-net,$TAP -s 29,fbuf,tcp=$HOST:$PORT,w=$WIDTH,h=$HEIGHT -s 30,xhci,tablet -s 31,lpc -l com1,$SERIAL -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -r $DIR/default.ckp $VMNAME
The exit code was : 139
router:/root #
Great so scsi is unstable with suspend/resume. Now run same tests but let’s modify asterisk2.sh script to use virtio-blk instead with /dev/da2 and run same tests:
#For testing virtio-scsi
#STORAGE="/dev/cam/ctl2.0" # port from /etc/ctl.conf(port ioctl/1) - core dumping on resume
#DEVICE="virtio-scsi"
#for testing virtio-blk # Comment out above 2 lines if using these
DEVICE="virtio-blk"
#STORAGE="/dev/zvol/zroot/asterisk2" # Standard zvol
STORAGE="/dev/da2" # Block device created from iscsictl
You may be wondering how I know which /dev/da* asterisk2 is using:
I simply run:
iscsictl
#This will give you a list of who is on what, it can be completely random
#always check this list before mounting a guest so you don't mount wrong one
#typically everything on lun0 will be numbered first, then lun1 but this is not #always the case so make sure to run that
#on a non-testing host it may look something like this:
router:/root # iscsictl
Target name Target portal State
iqn.com.sunsaturn.asterisk:target1 127.0.0.1:3260 Connected: da1 da5
iqn.com.sunsaturn.rocky:target2 127.0.0.1:3260 Connected: da2 da6
iqn.com.sunsaturn.ubuntu:target3 127.0.0.1:3260 Connected: da3 da8
iqn.com.sunsaturn.windows:target4 127.0.0.1:3260 Connected: da4 da7
router:/root #
Here is something good to add to your .bash_profile then you don't have to
think about it ever again:
if [ "$HOSTNAME" == "test.test.com" ]; then
echo "Checking which devices guests connected to:"
echo "#######################################################"
iscsictl
echo "#######################################################"
fi
I personally have a 2nd root account I set to bash so I don't touch default root
shell, can use toor if you like. I suggest everyone do that, there will come a day where you are like damn I can't remember my root password anymore because been using ssh keys so long. Yes you have to su to root everytime, but I even automate that these days.
Now let’s run it:
./asterisk2.sh start
#another terminal (make sure watching screen terminal running these)
./asterisk2.sh suspend
./asterisk2.sh resume
WORKS PERFECTLY:
Now go uncomment STORAGE line with
STORAGE="/dev/zvol/zroot/asterisk2"
and comment out:
#STORAGE="/dev/da2"
WORKS PERFECTLY TO
So what can we see from suspend/resume stability, it works on virtio-blk perfectly. No matter if I use the /dev/da* devices from scsi or the zvols directly.
What about importing ZFS pool from asterisk2? Sure let’s do it on host, shut it down first:
router:/root # gpart show /dev/da2
=> 40 62914480 da2 GPT (30G)
40 532480 1 efi (260M)
532520 2008 - free - (1.0M)
534528 16777216 2 freebsd-swap (8.0G)
17311744 45600768 3 freebsd-zfs (22G)
62912512 2008 - free - (1.0M)
router:/root # gdisk -l /dev/da2
GPT fdisk (gdisk) version 1.0.9
Partition table scan:
MBR: protective
BSD: not present
APM: not present
GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/da2: 62914560 sectors, 30.0 GiB
Sector size (logical): 512 bytes
Disk identifier (GUID): 8ACD2112-5EBF-11ED-8F56-00A098E3C14E
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 40, last usable sector is 62914519
Partitions will be aligned on 8-sector boundaries
Total free space is 4016 sectors (2.0 MiB)
Number Start (sector) End (sector) Size Code Name
1 40 532519 260.0 MiB EF00 efiboot0
2 534528 17311743 8.0 GiB A502 swap0
3 17311744 62912511 21.7 GiB A504 zfs0
router:/root #
So we can see this is definitely our ZFS guest, and we already know we cannot manipulate /dev/zvol/zroot/asterisk2 directly because of recursion issues, but we can manipulate /dev/da2 through scsi just fine:
#Import ZFS guest under /mnt (shut down the guest)
#First import it under /mnt under a different temporary name that won't save the name when we export it after with "-t" option
zpool import (we should see asterisk2's zroot ready for import)
zpool import -fR /mnt -t zroot testing
zfs mount -o mountpoint=/mnt testing/ROOT/default
(go ahead fix /mnt/etc /mnt/boot problems)
zpool export testing
rm -rf /mnt/* (cleanup left over directories)
#hell let's mount asterisk guest with UFS as well on /mnt
#I did crash it a few times so probably need recovery
router:/root # gpart show /dev/da1
=> 40 104857520 da1 GPT (50G)
40 24 - free - (12K)
64 532480 1 efi (260M)
532544 16777216 2 freebsd-swap (8.0G)
17309760 87547776 3 freebsd-ufs (42G)
104857536 24 - free - (12K)
router:/root # mount -t ufs /dev/da1p3 /mnt
mount: /dev/da1p3: R/W mount of / denied. Filesystem is not clean - run fsck. Forced mount will invalidate journal contents: Operation not permitted
router:/root # fsck /dev/da1p3
** /dev/da1p3
** SU+J Recovering /dev/da1p3
USE JOURNAL? [yn] y
** Reading 182419456 byte journal from inode 4.
RECOVER? [yn] y
** Building recovery table.
** Resolving unreferenced inode list.
** Processing journal entries.
WRITE CHANGES? [yn] y
***** FILE SYSTEM IS CLEAN *****
** 96 journal records in 6656 bytes for 46.15% utilization
** Freed 27 inodes (4 dirs) 0 blocks, and 20 frags.
***** FILE SYSTEM MARKED CLEAN *****
router:/root # mount -t ufs /dev/da1p3 /mnt
router:/root # ls /mnt
bin boot COPYRIGHT dev entropy etc home lib libexec media mnt net proc rescue root sbin sys tmp usr var
router:/root #
Summary:
Suspend/resume does work, just not passing in virtio-scsi with those /dev/cam/* devices. Seems we can do “ls” just fine, so something else is going on.
For now what you can do is just use the virtio-blk devices, until virtio-scsi works. In fact I would set it up this way since we already know virtio-blk is on its way out, plus it was so much nicer to pass in a CDROM on asterisk2 in just 1 line to guest, or as many disks as we want. We learned we can use /dev/da* devices for direct ZFS importing of guests pools for disaster recovery, for anything that isn’t ZFS we could also use the /dev/zvol/zroot/* devices directly if we wanted, but it is a cool work around from FreeBSD team, and sets you up using virtio-scsi now.
If you do not care about suspend/resume right now you could just continue using virtio-scsi to future proof yourself for virtio-blk phase out, or you could just pass virtio-blk devices temporarily till it is fixed and use suspend/resume all you want. My personal preference till this gets sorted out is leave iscsi processes running till it gets resolved, and use virtio-blk directly against /dev/zvol/root/<guestname> for now, can switch them later, this way you can use suspend/resume all you want as well as mount any ZFS guests through the /dev/da* devices anytime you need. This way your at least future proofed. If you don’t want to use the /dev/da* devices :
zfs set volmode=full zroot/asterisk
(if you want to be able to mount non-ZFS guests directly at:
/dev/zvol/zroot/asterisk for instance, what this will do is separate asterisk into asteriskp1 asteriskp2 asteriskp3 etc for your mounting purposes)
#Just get used to scsi already :) I know old habits die hard and no one wants to #give up NIS either and learn LDAP, hell neither do I, whatever works :)
Hope you enjoyed this article on suspend/resume experimental support for FreeBSD, perhaps in another article I will show you Linux and Windows guests, and how we can disaster recovery them as well like we did for these 2 UFS and ZFS guests today, till then….
Shine bright like a diamond,
Dan.