In this ZFS tutorial series, we now look at some of the newer features of ZFS and how they are used in a real production environment. Oracle is continuously developing ZFS and has brought some nice features into Solaris 11 ZFS. Here we focus mainly on ZFS deduplication and ZFS encryption. ZFS deduplication is the process of eliminating duplicate copies of data within ZFS datasets (aka filesystems). For example, if you copy a 1GB file twice into a ZFS dataset with deduplication enabled, the pool consumes only about 1GB, not 2GB, for those two files. That is how ZFS deduplication works. The ZFS encryption option is used to encrypt the data on the zpool for security purposes.
Here are the details of the zpool that will be used to test the ZFS deduplication facility.
root@Unixarena-SOL11:~# zpool list unixarena
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
unixarena 1.98G 34K 1.97G 1% 1.00x ONLINE -
root@Unixarena-SOL11:~#
Let me verify the deduplication property on that zpool.
root@Unixarena-SOL11:~# zfs get dedup unixarena
NAME PROPERTY VALUE SOURCE
unixarena dedup off local
root@Unixarena-SOL11:~#
As the output above shows, the ZFS deduplication property is currently set to off.
Check the mount point of the zpool here.
root@Unixarena-SOL11:~# df -h /unixarena
Filesystem Size Used Available Capacity Mounted on
unixarena 2.0G 34K 2.0G 1% /unixarena
root@Unixarena-SOL11:~#
Let me copy some files to this zpool and see how it grows.
root@Unixarena-SOL11:/# cp /root/VRTSpkgs.p5p /unixarena/
root@Unixarena-SOL11:~# df -h /unixarena/
Filesystem Size Used Available Capacity Mounted on
unixarena 2.0G 346M 1.6G 18% /unixarena
root@Unixarena-SOL11:~# cp /root/VRTSpkgs.p5p /unixarena/VRTSpkgs.p5p.dedup
root@Unixarena-SOL11:~# df -h /unixarena/
Filesystem Size Used Available Capacity Mounted on
unixarena 2.0G 692M 1.6G 18% /unixarena
root@Unixarena-SOL11:/unixarena# zpool list unixarena
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
unixarena 1.98G 692M 1.34G 33% 1.00x ONLINE -
This is what happens when the deduplication property is not set on the zpool. I copied VRTSpkgs.p5p twice under different names, and it took 2 x 346MB = 692MB of zpool space.
Let's see what happens if we set the deduplication property on the zpool.
Here I set the deduplication property on the zpool “unixarena”.
root@Unixarena-SOL11:/unixarena# zfs set dedup=on unixarena
root@Unixarena-SOL11:/unixarena# zfs get dedup unixarena
NAME PROPERTY VALUE SOURCE
unixarena dedup on local
root@Unixarena-SOL11:/unixarena#
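Let me copy the same file again. As a reminder, the deduplication property is now set. Deduplication applies only to data written after the property is enabled, and the earlier test copies were evidently removed before this step, since the listing below shows a single file.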
root@Unixarena-SOL11:/# cp /root/VRTSpkgs.p5p /unixarena/
root@Unixarena-SOL11:~# df -h /unixarena/
Filesystem Size Used Available Capacity Mounted on
unixarena 2.0G 346M 1.6G 18% /unixarena
root@Unixarena-SOL11:~#
root@Unixarena-SOL11:/unixarena# zpool list unixarena
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
unixarena 1.98G 347M 1.65G 17% 1.00x ONLINE -
root@Unixarena-SOL11:/unixarena#
root@Unixarena-SOL11:/unixarena# ls -lrt
total 708883
-rw-r--r-- 1 root root 362639360 Jul 22 03:18 VRTSpkgs.p5p
root@Unixarena-SOL11:/unixarena# cp VRTSpkgs.p5p VRTSpkgs.p5p.test.dedup
root@Unixarena-SOL11:/unixarena# ls -lrt
total 1417766
-rw-r--r-- 1 root root 362639360 Jul 22 03:18 VRTSpkgs.p5p
-rw-r--r-- 1 root root 362639360 Jul 22 03:25 VRTSpkgs.p5p.test.dedup
root@Unixarena-SOL11:/unixarena# df -h .
Filesystem Size Used Available Capacity Mounted on
unixarena 2.3G 692M 1.6G 30% /unixarena
root@Unixarena-SOL11:/unixarena# zpool list unixarena
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
unixarena 1.98G 348M 1.64G 17% 2.00x ONLINE -
root@Unixarena-SOL11:/unixarena#
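In the results above, “df -h” reports the logical size of the data (692M for the two files), while “zpool list” shows that only about 348M is actually allocated. The zpool stored the duplicate data only once and shows 2.00x under the “DEDUP” column.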
Let me try to copy the same file again and again.
root@Unixarena-SOL11:/unixarena# cp VRTSpkgs.p5p VRTSpkgs.p5p.test.dedup.3
root@Unixarena-SOL11:/unixarena# cp VRTSpkgs.p5p VRTSpkgs.p5p.test.dedup.4
root@Unixarena-SOL11:/unixarena# cp VRTSpkgs.p5p VRTSpkgs.p5p.test.dedup.5
root@Unixarena-SOL11:/unixarena# ls -lrt
total 4962181
-rw-r--r-- 1 root root 362639360 Jul 22 03:18 VRTSpkgs.p5p
-rw-r--r-- 1 root root 362639360 Jul 22 03:25 VRTSpkgs.p5p.test.dedup
-rw-r--r-- 1 root root 362639360 Jul 22 03:28 VRTSpkgs.p5p.test.dedup.1
-rw-r--r-- 1 root root 362639360 Jul 22 03:29 VRTSpkgs.p5p.test.dedup.2
-rw-r--r-- 1 root root 362639360 Jul 22 03:31 VRTSpkgs.p5p.test.dedup.3
-rw-r--r-- 1 root root 362639360 Jul 22 03:31 VRTSpkgs.p5p.test.dedup.4
-rw-r--r-- 1 root root 362639360 Jul 22 03:32 VRTSpkgs.p5p.test.dedup.5
root@Unixarena-SOL11:/unixarena# df -h .
Filesystem Size Used Available Capacity Mounted on
unixarena 3.9G 2.4G 1.6G 60% /unixarena
root@Unixarena-SOL11:/unixarena# zpool list unixarena
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
unixarena 1.98G 349M 1.64G 17% 7.00x ONLINE -
root@Unixarena-SOL11:/unixarena#
The actual zpool size is 2GB, but “df -h” shows a size of 3.9G because it counts every deduplicated copy at its full size.
You can see that the deduplication ratio went up to 7.00x, matching the seven identical copies in the pool.
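The figures above can be cross-checked with a little arithmetic. The Python sketch below uses only values visible in the output (seven files of 362639360 bytes each and 349M allocated in the pool). Treating the “M” in “zpool list” as MiB, and approximating the ratio as logical bytes divided by allocated bytes, are assumptions made for illustration; this is not the exact accounting that ZFS performs internally.

```python
# Cross-check of the figures shown above (values copied from the transcript).
FILE_SIZE_BYTES = 362639360   # size of VRTSpkgs.p5p from `ls -lrt`
FILE_COUNT = 7                # identical copies under /unixarena
ALLOC_MIB = 349               # ALLOC column of `zpool list` (assumed to be MiB)

MIB = 1024 * 1024
GIB = 1024 * MIB


def logical_bytes(file_size, count):
    """Total bytes as the filesystem sees them (what `df` counts as Used)."""
    return file_size * count


def approx_dedup_ratio(logical, allocated):
    """Rough ratio of logical data to physically allocated data."""
    return logical / allocated


logical = logical_bytes(FILE_SIZE_BYTES, FILE_COUNT)
ratio = approx_dedup_ratio(logical, ALLOC_MIB * MIB)

print(f"logical data : {logical / GIB:.2f} GiB")
print(f"allocated    : {ALLOC_MIB} MiB")
print(f"approx ratio : {ratio:.2f}x")
```

The logical total agrees with the 2.4G that “df -h” reports as used, and the approximate ratio comes out close to the 7.00x shown by “zpool list”. The small difference is expected, because the allocated figure also includes some pool metadata.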
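What actually happens in the background is this: when a block being written matches one that is already stored, the duplicate is not stored again, so only one copy of the data is kept on disk. References to all of the data are still retained, so every file remains fully readable. Since only the unique data is stored in the zpool, this feature can greatly reduce storage consumption when the data contains many duplicates.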
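The idea can be sketched in a few lines of Python. This is a toy model, not how ZFS is implemented: the class name DedupStore, the fixed 128 KiB block size and the in-memory dictionaries are all assumptions chosen for illustration, and overwriting or deleting files is not handled.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # hypothetical fixed block size for this sketch


class DedupStore:
    """Toy block store: each unique block is kept once, keyed by its SHA-256."""

    def __init__(self):
        self.blocks = {}    # checksum -> block bytes (the single stored copy)
        self.refcount = {}  # checksum -> number of references to that block
        self.files = {}     # file name -> ordered list of block checksums

    def write(self, name, data):
        checksums = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            key = hashlib.sha256(block).hexdigest()
            if key not in self.blocks:
                self.blocks[key] = block  # first occurrence: store the block
            self.refcount[key] = self.refcount.get(key, 0) + 1
            checksums.append(key)
        self.files[name] = checksums

    def read(self, name):
        # Rebuild the file from the shared blocks it references.
        return b"".join(self.blocks[key] for key in self.files[name])

    def logical_size(self):
        # Size as seen per file, counting every copy in full.
        return sum(len(self.blocks[k]) for ks in self.files.values() for k in ks)

    def physical_size(self):
        # Size of the unique blocks that are actually stored.
        return sum(len(b) for b in self.blocks.values())


# 1 MiB of sample data made of eight distinct blocks.
payload = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))

store = DedupStore()
for i in range(7):
    store.write(f"copy.{i}", payload)

print("logical :", store.logical_size())
print("physical:", store.physical_size())
print("ratio   :", store.logical_size() / store.physical_size())
```

Seven copies of the same data are written, but the unique blocks are stored only once, which is the same pattern as the 7.00x ratio seen in the test above.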
ZFS deduplication relies on a secure hash such as SHA256: two blocks with the same checksum are assumed to be identical. By default it does not do a full byte-for-byte comparison when new data is written to the zpool. The chance of two different blocks producing the same hash is extremely small, but for data such as money-related transactions you may want a full comparison of the data blocks to make sure nothing is lost.
To perform the full data comparison, just enable deduplication with the “verify” value.
root@Unixarena-SOL11:/unixarena# zfs set dedup=verify unixarena
root@Unixarena-SOL11:/unixarena# zfs get dedup unixarena
NAME PROPERTY VALUE SOURCE
unixarena dedup verify local
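The difference between trusting the checksum and verifying the data can be illustrated with a small Python sketch. The function store_block, its parameters and the deliberately weak hash used to force a collision are all hypothetical; how ZFS handles a mismatch internally is not covered here, and the sketch simply stores the new block separately.

```python
import hashlib


def store_block(table, block, verify=False, hash_fn=None):
    """Store `block` in `table` (checksum -> list of distinct blocks).

    Returns True if the block was treated as a duplicate of an existing one.
    """
    if hash_fn is None:
        hash_fn = lambda b: hashlib.sha256(b).hexdigest()
    key = hash_fn(block)
    candidates = table.setdefault(key, [])
    if candidates:
        if not verify:
            return True   # trust the checksum alone
        if any(existing == block for existing in candidates):
            return True   # checksum and bytes both match
    candidates.append(block)  # new data, or a collision caught by verify
    return False


# A deliberately weak "hash" (block length) so two different blocks collide.
weak_hash = lambda b: str(len(b))

trusting = {}
store_block(trusting, b"AAAA", hash_fn=weak_hash)
trusted_result = store_block(trusting, b"BBBB", hash_fn=weak_hash)

checking = {}
store_block(checking, b"AAAA", verify=True, hash_fn=weak_hash)
verified_result = store_block(checking, b"BBBB", verify=True, hash_fn=weak_hash)

print("checksum only:", trusted_result)   # different data treated as duplicate
print("with verify  :", verified_result)  # mismatch detected, block stored
```

With a real hash such as SHA256 a collision like this is extremely unlikely, which is why the plain “on” setting is usually considered sufficient. The verify step trades extra reads for certainty.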
You can go back to the normal hash-based deduplication at any time by simply setting the property to “on”.
root@Unixarena-SOL11:/unixarena# zfs get dedup unixarena
NAME PROPERTY VALUE SOURCE
unixarena dedup verify local
root@Unixarena-SOL11:/unixarena# zfs set dedup=on unixarena
root@Unixarena-SOL11:/unixarena# zfs get dedup unixarena
NAME PROPERTY VALUE SOURCE
unixarena dedup on local
root@Unixarena-SOL11:/unixarena#
To learn more about ZFS deduplication, check out here.
To enable encryption on a ZFS dataset, use the command below when you create the dataset.
# zfs create -o encryption=on dataset_name
Thank you for reading this article.