Copying sparse disk image files to LVM thin volumes

Recently I had to migrate some raw disk images to LVM volumes. These disk images belonged to KVM virtual machines, which didn’t really need the whole disk space they were assigned upon creation. The images were sparse files, which allowed me to keep the actual disk space used to the bare minimum. I had to use LVM in the target infrastructure, but didn’t want to statically assign disk space to each VM as a lot of disk space would be wasted. LVM thin volumes were a perfect solution for my problem.

With LVM thin volumes, you create a single “parent” volume and then “child” volumes inside of it that will only take up actual space in the “parent” volume once they are written to. So you can create a 100G “parent” volume and twenty 10G “child” volumes, as long as the total amount of data written to all “child” volumes doesn’t exceed 100G (a little less, actually, due to some space being reserved for metadata in the “parent” volume).
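To make the arithmetic in that example concrete, here is a small Python sketch. The sizes come from the paragraph above; the names (`POOL_SIZE_G`, `pool_fits`, and so on) are made up for illustration, and the metadata overhead is deliberately ignored.

```python
# Sizes in gigabytes, taken from the example in the text.
POOL_SIZE_G = 100    # the "parent" volume
CHILD_SIZE_G = 10    # virtual size of each "child" volume
CHILD_COUNT = 20


def pool_fits(written_per_child_g, pool_size_g=POOL_SIZE_G):
    """Return True if the data actually written still fits in the parent.

    Simplification: space reserved for metadata is not accounted for.
    """
    return sum(written_per_child_g) <= pool_size_g


# Total space promised to the children versus what really exists.
virtual_total_g = CHILD_SIZE_G * CHILD_COUNT
overcommit_ratio = virtual_total_g / POOL_SIZE_G
```

With these numbers the children are promised twice the space the parent has, so `pool_fits([4] * 20)` holds while `pool_fits([6] * 20)` does not.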

The problem was: how do I copy only the actual contents of a sparse file (non-zero bytes, or at least non-zero blocks) to a block device? cp --sparse=always raw.img /dev/lvm/volume won’t work, as the GNU cp documentation clearly states that it only works on regular files, not block devices. dd if=raw.img of=/dev/lvm/volume would simply copy all data from the source, including the holes, which are read back as zero bytes.
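To see how much of an image is really occupied, you can compare the apparent size of the file with the space allocated to it. The sketch below does that in Python. The function names are hypothetical, and it assumes Linux, where `st_blocks` is counted in 512-byte units.

```python
import os


def sparse_usage(path):
    """Return (apparent_size, allocated_size) in bytes for a regular file."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as it is on Linux.
    return st.st_size, st.st_blocks * 512


def make_sparse_demo(path, size=16 * 1024 * 1024):
    """Create a demo file of `size` bytes with data only in the last 4 KiB."""
    with open(path, "wb") as f:
        # Seeking past the end and writing leaves a hole before the data,
        # provided the filesystem supports sparse files.
        f.seek(size - 4096)
        f.write(b"\xff" * 4096)
```

On a filesystem that supports sparse files, the allocated size of such a demo file should come out far smaller than its apparent size, which is exactly the gap a plain dd copy would fill with zeros on the target.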

As this is quite a simple problem all in all, I was surprised I wasn’t able to find a quick solution for it. I ended up writing a small C program which processes the source image in blocks, checks whether each block contains anything other than zeros and, if it does, copies that block to the destination. It’s definitely not a masterpiece of software engineering and has its limitations (the raw image size must be a multiple of the “zero block” size), but it worked fine for me, so maybe it’ll help you as well. The program is WTFPL-licensed.
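For illustration, the same idea can be sketched in a few lines of Python. This is not the C program mentioned above, just a minimal rendition of the approach it describes. The function name, the 4096-byte block size and the return value are my assumptions.

```python
import os

BLOCK_SIZE = 4096  # assumed "zero block" size, not taken from the C program


def copy_nonzero_blocks(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst block by block, skipping blocks that are all zeros.

    The destination must already exist (a block device, or a file of at
    least the same size). Returns (blocks_written, blocks_skipped).
    """
    zero_block = bytes(block_size)
    written = skipped = 0
    # "r+b" opens the destination for writing without truncating it.
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        size = os.fstat(src.fileno()).st_size
        # Same limitation as described in the text.
        if size % block_size != 0:
            raise ValueError("image size must be a multiple of the block size")
        for offset in range(0, size, block_size):
            block = src.read(block_size)
            if len(block) != block_size:
                raise IOError("short read at offset %d" % offset)
            if block == zero_block:
                skipped += 1  # leave this region of the destination untouched
            else:
                dst.seek(offset)
                dst.write(block)
                written += 1
    return written, skipped
```

With the paths used earlier, a call would look like `copy_nonzero_blocks("raw.img", "/dev/lvm/volume")`. Note that skipping zero blocks only yields a faithful copy if the untouched regions of the destination already read as zeros, which should be the case for a freshly created thin volume but not for one that has been written to before.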

DISCLAIMER: USE AT YOUR OWN RISK. This program may bring ancient curses on you, fry your computer, kill kittens and/or cause space-time discontinuities. It worked for me. YMMV.
