VMware image backup with Bareos – More free backup

Bareos (Bacula, if you like) does a great job of backing up files. In the event of a total meltdown, though, I would much rather be able to restore an entire VM than rebuild the machine and reinstall agents before I can restore anything. Let’s see if I can make this work.

Brainstorming:

In the grand scheme, the client being backed up will be localhost (the Bareos server itself). The files will live on an NFS volume accessible to both the VMware host’s VMkernel and localhost.

We will take a snapshot of the running VM, then copy the VMDK out to that NFS location using a run-before script. That puts it in a location predictable to Bareos, so an appropriate fileset definition can go out and grab that set of files for each job/VM. We will then use a run-after script to delete the snapshot and the copied files from the NFS volume.

To test whether this is realistic at all, I’m going to use a “junk” VM: copy a snapshotted VMDK and its associated vmx file and see if I can get that portion up and running.

To create the snapshot from the ESXi (BusyBox) console:

vim-cmd vmsvc/snapshot.create 17 "bareos_backup" "Temporary snapshot for Backup system. This should not exist if a backup isn't currently running."

The 17 in that command is the VM’s vmid, which will have to be looked up by parsing the output of:

vim-cmd vmsvc/getallvms

I’ll deal with that when I script it out.
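A minimal Python sketch of that parsing step, assuming `getallvms` prints a header row followed by one row per VM in the columns Vmid, Name, File (`[datastore] dir/name.vmx`), Guest OS, Version and Annotation. Because VM names can contain spaces, it anchors on the bracketed datastore token instead of splitting on whitespace. `parse_getallvms` and `snapshot_create_args` are hypothetical helper names, not part of ESXi or Bareos.

```python
import re

# Assumed row layout: "<vmid>  <name>  [<datastore>] <dir>/<name>.vmx  <guest os> ..."
# The "[datastore]" token marks where the (possibly multi-word) name ends.
_ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+\[([^\]]+)\]\s+(\S.*?\.vmx)\b")


def parse_getallvms(output: str) -> dict[str, dict[str, str]]:
    """Map each VM's display name to its vmid, datastore and vmx path."""
    vms: dict[str, dict[str, str]] = {}
    for line in output.splitlines():
        match = _ROW.match(line)
        if match is None:  # header row, blank line, or an unexpected format
            continue
        vmid, name, datastore, vmx = match.groups()
        vms[name] = {"vmid": vmid, "datastore": datastore, "vmx": vmx}
    return vms


def snapshot_create_args(vmid: str, name: str = "bareos_backup") -> list[str]:
    """Argument list for the snapshot.create command shown above.

    The optional trailing includeMemory/quiesced arguments are omitted,
    so vim-cmd uses its defaults for them.
    """
    return [
        "vim-cmd", "vmsvc/snapshot.create", vmid, name,
        "Temporary snapshot for Backup system. This should not exist "
        "if a backup isn't currently running.",
    ]
```

Returning an argument list keeps the quoting of the snapshot description explicit; how the list actually reaches the ESXi shell (run on the host itself, or over SSH from the Bareos server) is left open here.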

I started the copy of my 40GB vmdk at 1:29PM…

off for coffee…

Done by 1:54PM, possibly sooner but I wasn’t looking. Now I’ll copy the vmx file and see if I can mangle it enough to make the thing boot.

— next morning —

The bad news is that I couldn’t easily get the copied disk to work. A bit of research taught me that I should have used vmkfstools to copy the snapshotted disk, so I tried again that way. Here was my command:

vmkfstools -i source.vmdk /vmfs/volumes/dst_datastore/restoretest/restoretest.vmdk -d thin

After running that command and copying the vmx file, I imported the vmx in the new location, removed the existing disk, and attached the newly relocated VMDK as a new disk. It booted. Another bonus of using vmkfstools instead of cp is that I could tell it to create a thin-provisioned disk at the destination. This cut the copy time down to about 4 minutes 32 seconds, and I have a smaller file to back up. Now that I know the whole process is feasible, I’ll write the pre- and post-job scripts in Python.
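The same command can be assembled programmatically. This sketch only builds the argument list, and the helper names are hypothetical. It assumes `source_vmdk` is the VM's base disk, which should stop receiving writes once the snapshot exists (new writes go to the snapshot's delta file), and that the destination sits on the shared NFS datastore.

```python
import shlex


def vmkfstools_clone_args(source_vmdk: str, dest_vmdk: str,
                          disk_format: str = "thin") -> list[str]:
    """Argument list for `vmkfstools -i <source> <dest> -d <format>`.

    -i clones the source virtual disk to dest; -d picks the destination
    disk format, and "thin" allocates space only as blocks are written.
    """
    if not dest_vmdk.endswith(".vmdk"):
        raise ValueError(f"destination should be a .vmdk file: {dest_vmdk!r}")
    return ["vmkfstools", "-i", source_vmdk, dest_vmdk, "-d", disk_format]


def as_shell_line(args: list[str]) -> str:
    """Render an argument list as a correctly quoted shell command line."""
    return shlex.join(args)
```

`as_shell_line` relies on `shlex.join` (Python 3.8+), so paths containing spaces come out quoted and the result can be pasted into a shell for a manual test run.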

— next evening —

I spent the entire day writing the before-backup script and am now running my first end-to-end trial. The Bareos definitions look like this:

JobDefs {
  Name = "VM"
  Type = Backup
  Level = Full
  FileSet = "VM Image Backup NFS Folder"
  Storage = File
  Messages = Standard
  Priority = 10
  Pool = VMImage
}

Job {
  Name = "vmguest1-FullImage"
  JobDefs = "VM"
  # The client is the Bareos server itself; it reads the staged copy from NFS.
  Client = bacula-srv-fd
  Schedule = "Monthly-VMImage-vmguest1"
  # Snapshot the guest and copy its disk into /mnt/vmbackup before the job reads it.
  RunBeforeJob = "/usr/lib/bareos/scripts/vmprep.py -v vmguest1.gsellc.local"
}

FileSet {
  Name = "VM Image Backup NFS Folder"
  Include {
    Options {
      signature = MD5
    }
    # NFS staging directory shared with the ESXi hosts.
    File = "/mnt/vmbackup"
  }
}

/mnt/vmbackup is an NFS-mounted directory that both my ESXi hosts and my Bareos director can access. It’s the handoff point: ESXi copies the VMDKs there, then Bareos picks them up and writes them to backup media. The before-backup script identifies the VM we want to back up, takes a snapshot, then copies the disk to the staging location.
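The before-backup flow boils down to two commands run in order, with the script's exit code reporting success. A minimal sketch, assuming the vmid and both disk paths are already known: the real `vmprep.py` takes `-v <hostname>` and works those out itself, and its internals aren't shown in this post, so the positional interface here is hypothetical. The runner is injectable so the sequence can be exercised without an ESXi host; the default `run_checked` executes locally, which only works on the host itself. Bareos cancels a job when its `RunBeforeJob` command exits non-zero, which is why `main` returns non-zero on any failure.

```python
import subprocess
import sys
from typing import Callable, Sequence

# A runner executes one command and raises if it fails.
Runner = Callable[[Sequence[str]], None]

SNAPSHOT_NAME = "bareos_backup"
SNAPSHOT_NOTE = ("Temporary snapshot for Backup system. This should not exist "
                 "if a backup isn't currently running.")


def run_checked(args: Sequence[str]) -> None:
    """Default runner: execute locally; raises CalledProcessError on a non-zero exit."""
    subprocess.run(list(args), check=True)


def prepare_vm(vmid: str, source_vmdk: str, staged_vmdk: str,
               run: Runner = run_checked) -> None:
    """Snapshot the guest, then thin-clone its base disk into the staging area."""
    run(["vim-cmd", "vmsvc/snapshot.create", vmid, SNAPSHOT_NAME, SNAPSHOT_NOTE])
    run(["vmkfstools", "-i", source_vmdk, staged_vmdk, "-d", "thin"])


def main(argv: Sequence[str], run: Runner = run_checked) -> int:
    """Hypothetical CLI: main(["<vmid>", "<source.vmdk>", "<staged.vmdk>"]).

    Returns 0 on success, 1 if a command fails and 2 on bad usage, so Bareos
    cancels the job instead of backing up a stale or half-written copy.
    """
    if len(argv) != 3:
        print("usage: vmprep <vmid> <source.vmdk> <staged.vmdk>", file=sys.stderr)
        return 2
    vmid, source_vmdk, staged_vmdk = argv
    try:
        prepare_vm(vmid, source_vmdk, staged_vmdk, run)
    except (subprocess.CalledProcessError, OSError) as exc:
        print(f"vm prep failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

As a script, the entry point would be `sys.exit(main(sys.argv[1:]))`. One gap worth closing: if the clone fails after the snapshot was taken, this sketch leaves the snapshot in place, which is exactly the situation the snapshot's own description warns about.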

Unfortunately, it seems Bareos backs up sparse files at their full apparent size rather than just the allocated disk blocks. This means that while my test VM uses about 35 GB on disk, Bareos is transferring 160 GB (compressed) to tape, so the backup takes a while. In the end it takes the same amount of space on tape; it just lengthens the backup window.
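A quick way to see the gap between what a file occupies on disk and what a backup tool has to read is to compare `st_size` (the apparent length, with holes reading back as zeros) against `st_blocks` (allocated 512-byte units). This is a minimal Unix-only sketch (`st_blocks` isn't available on Windows), and the helper names are hypothetical. Separately, the Bareos FileSet `Options` block accepts `sparse = yes`, which checks for all-zero blocks and skips storing them; the file daemon still has to read the whole file, and its effect on this particular backup window wasn't measured here.

```python
import os


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent size, allocated bytes) for a file (Unix only).

    st_size is the logical length a reader sees, with holes reading back as
    zeros; st_blocks counts the 512-byte units actually allocated on disk.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def describe(path: str) -> str:
    """One-line summary of apparent versus allocated size in GiB."""
    apparent, allocated = apparent_and_allocated(path)
    gib = 1024 ** 3
    return f"{path}: {apparent / gib:.1f} GiB apparent, {allocated / gib:.1f} GiB allocated"
```

On a thin-provisioned VMDK staged on NFS, the first number would typically be the provisioned size and the second roughly what the guest has actually written.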

I have yet to write the cleanup job that will delete the files; it is an important component and will be what I do next. As it stands, I have something that kind of works, which I’ll polish into something fully usable. The other big to-do is leaving traces of what the backup contains inside the backup itself. I want to add a backup log file that can be used at restore time to see what the guest’s name was, which ESX host it lived on, where it kept its VMDKs, and so on. All of that information is already gathered by the before-job script; it just needs to be written to a readable file and left in the staging directory. I’m also considering adding an option for quiescing, but that is low on my priority list.
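Both halves of that to-do list can be sketched briefly. `write_manifest` drops a JSON record of the guest name, ESX host, vmx path and source VMDK paths into the staging directory so it travels with the backup; `cleanup_stage` is the file-deletion half of a run-after job, removing one VM's staged directory under `/mnt/vmbackup` while refusing any path that resolves outside the staging root. The function names, the manifest file name, its fields and the one-directory-per-VM layout are all assumptions, and removing the snapshot on the ESXi side is not covered.

```python
import json
import os
import shutil
from datetime import datetime, timezone

MANIFEST_NAME = "backup-manifest.json"  # hypothetical file name


def write_manifest(stage_dir: str, guest: str, esx_host: str,
                   vmdks: list[str], vmx: str) -> str:
    """Leave a readable record of what was backed up next to the staged disks."""
    record = {
        "guest": guest,
        "esx_host": esx_host,
        "vmx": vmx,
        "source_vmdks": vmdks,
        "staged_at": datetime.now(timezone.utc).isoformat(),
    }
    path = os.path.join(stage_dir, MANIFEST_NAME)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)
    return path


def cleanup_stage(staging_root: str, vm_dir: str) -> None:
    """Delete one VM's staged copy; refuse the root itself or anything outside it."""
    root = os.path.realpath(staging_root)
    target = os.path.realpath(os.path.join(root, vm_dir))
    if target == root or os.path.commonpath([root, target]) != root:
        raise ValueError(f"refusing to delete {target!r}")
    shutil.rmtree(target)
```

Writing the manifest before Bareos runs means the restore-time information lands on tape alongside the VMDKs, and the cleanup then removes it together with them.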

My first backup of my 160 GB test machine took a little over 2 hours. In my environment, backups look like they will take about 45-50 seconds per GB of ALLOCATED disk. I can tolerate this, since I only plan on backing up whole VM images about once a month, maybe once a week for VERY dynamic machines or machines that are less about data and more about application. I will not be relying on this as a substitute for traditional agent-based backups.
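The per-GB rate falls straight out of the first run: 2 hours is 7,200 s, and 7,200 s / 160 GB = 45 s/GB, a bit more once the overrun is counted. A tiny helper (hypothetical names) turns a measured rate into a window estimate for other guests.

```python
def backup_window_seconds(allocated_gb: float, seconds_per_gb: float) -> float:
    """Rough full-image backup time: allocated size times a measured per-GB rate."""
    return allocated_gb * seconds_per_gb


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    whole = round(seconds)
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

At 45 s/GB, 160 GB comes out to 2:00:00; at 50 s/GB it is 2:13:20.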

I think that’s enough of a knowledge dump on this topic for one post. More to come.
