Windows Server 2012, Data Deduplication savings for TB sized drives (#WinServ, #HyperV)

One of the features I was recently excited about with the release of Windows Server 2012 is Data Deduplication. I decided to kick the tires in my lab environment on my Windows Server 2012 system, which is both a Hyper-V server and my Virtual Machine Manager library server. The screenshot below shows my three-drive configuration: a C drive for the OS, a V drive for virtuals, and a Y drive for software media and templates. The goal is to enable data deduplication on the V and Y drives of this server.

image

Adding the Data Deduplication functionality in Windows Server 2012

Let’s add the data deduplication feature (see http://blog.powerbiz.net.au/features/data-deduplication-in-windows-server-2012/). It is part of the File and Storage Services server role:

image

Installation documentation is available at http://technet.microsoft.com/en-us/library/hh831700.aspx, http://technet.microsoft.com/en-us/library/hh831434.aspx, and http://blogs.technet.com/b/filecab/archive/2012/05/21/introduction-to-data-deduplication-in-windows-server-2012.aspx.

Determining the benefit of deduplicating a drive:

This feature provides a program called DDPEVAL.EXE (in C:\Windows\System32) which estimates the disk space savings from data deduplication. The example below shows the results of running it against my 3.63 TB Y drive, shown at the top of this blog post. On a large volume like this one the evaluation can take a significant amount of time; in my lab environment, with relatively slow drives, it took a couple of hours for the 3.63 TB volume. The results are shown below (a 45% estimated space savings with no compression is impressive!):

image
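To give a feel for what an evaluation like DDPEVAL.EXE is measuring, here is a minimal Python sketch that splits files into chunks, hashes each chunk, and reports how much space would be saved if every duplicate chunk were stored only once. This is an illustration under simplifying assumptions, not how DDPEVAL.EXE works internally: Windows Server 2012 deduplication uses variable-size chunks and also compresses them, while this sketch uses fixed-size chunks with no compression. The `estimate_savings` function and the 64 KB chunk size are hypothetical choices for the example.

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # hypothetical fixed chunk size; Windows uses variable-size chunks


def estimate_savings(root, chunk_size=CHUNK_SIZE):
    """Estimate dedup savings for all files under root.

    Returns (logical_bytes, unique_bytes, savings_percent). Every distinct
    chunk digest is kept in memory, so this is only practical for small trees.
    """
    seen = set()        # SHA-256 digests of chunks already counted once
    logical_bytes = 0   # bytes as currently stored (every copy counted)
    unique_bytes = 0    # bytes left if each distinct chunk were stored once
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # iter() with a b"" sentinel stops at end of file
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        logical_bytes += len(chunk)
                        digest = hashlib.sha256(chunk).digest()
                        if digest not in seen:
                            seen.add(digest)
                            unique_bytes += len(chunk)
            except OSError:
                continue  # skip files that cannot be read (locked, no permission)
    if logical_bytes == 0:
        return 0, 0, 0.0
    savings = 100.0 * (logical_bytes - unique_bytes) / logical_bytes
    return logical_bytes, unique_bytes, savings
```

Calling `estimate_savings("Y:\\")` would walk the whole volume, but because every digest is held in memory it is better pointed at a small folder of test data. The real tool's variable-size chunking tends to find more duplicates than fixed-size chunking when data is shifted by insertions, which is one reason its estimates can differ from a toy like this.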

Activating Data Deduplication on a volume:

To configure data deduplication on a volume, open Server Manager and go to File and Storage Services, Volumes.

image

Right-click the volume you want to configure and choose the “Configure Data Deduplication” option.

image

Enable data deduplication and configure how old files must be before they can be deduplicated.

image

File extensions can be excluded, and a schedule can be configured for when to deduplicate this volume, as shown below.

image
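The minimum file age and the excluded extensions together act as a filter on which files an optimization run will consider. As a rough illustration, here is a Python sketch of such a filter. The `is_eligible` and `eligible_files` helpers are hypothetical, not the deduplication engine's logic, and the sketch assumes file age is measured from the last-modified time.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def is_eligible(path, min_age_days, excluded_extensions, now=None):
    """Hypothetical check: file is old enough and its extension is not excluded.

    Assumes age is measured from the last-modified time (st_mtime).
    """
    now = time.time() if now is None else now
    excluded = {e.lower().lstrip(".") for e in excluded_extensions}
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    if ext in excluded:
        return False
    age_days = (now - os.path.getmtime(path)) / SECONDS_PER_DAY
    return age_days >= min_age_days  # at least min_age_days old


def eligible_files(root, min_age_days, excluded_extensions=()):
    """Yield paths under root that pass both the age and extension filters."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_eligible(path, min_age_days, excluded_extensions):
                yield path
```

For example, `eligible_files("V:\\", min_age_days=5, excluded_extensions=[".vhd"])` would skip recently changed files and anything with a `.vhd` extension; the value 5 is only an example, not a recommendation.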

Note: Per http://technet.microsoft.com/en-us/library/hh831700.aspx, if you enable throughput optimization the optimization job will use up to 50% of the system’s memory (probably not a good idea on a Hyper-V server whose memory is already heavily utilized, for example).

Once data deduplication has been activated for the volume, the deduplication rate and deduplication savings fields are populated, as shown below:

image

If a volume is not supported for data deduplication, the option is grayed out, as shown below for the C drive (a system or boot volume, which is not allowed):

image

What does it run?

Using Resource Monitor we can see the ddpeval program and which files it accesses while it assesses the benefit of deduplicating a volume.

image

Where can’t data deduplication be used?

Volume requirements for dedup (a subset from http://technet.microsoft.com/en-us/library/hh831700.aspx):

  • Must not be a system or boot volume. Deduplication is not supported on operating system volumes.
  • Can be partitioned as a master boot record (MBR) or a GUID Partition Table (GPT), and must be formatted using the NTFS file system.
  • Can reside on shared storage, such as storage that uses a Fibre Channel or SAS array, or an iSCSI SAN; Windows Failover Clustering is fully supported.
  • Do not rely on Cluster Shared Volumes (CSVs). You can access data if a deduplication-enabled volume is converted to a CSV, but you cannot continue to process files for deduplication.
  • Do not rely on the Microsoft Resilient File System (ReFS).
  • Must be exposed to the operating system as non-removable drives. Remotely mapped drives are not supported.
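The requirements in this list can be read as a simple checklist. The Python sketch below encodes them against a hypothetical `Volume` record; the field names are illustrative only and do not correspond to any Windows API, and the checks cover just the subset of rules listed here.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical description of a volume; field names are illustrative only."""
    letter: str
    file_system: str          # e.g. "NTFS" or "ReFS"
    is_system_or_boot: bool
    is_csv: bool              # Cluster Shared Volume
    is_removable: bool
    is_remote_mapped: bool


def dedup_blockers(vol):
    """Return the reasons (from the list above) a volume can't be deduplicated."""
    reasons = []
    if vol.is_system_or_boot:
        reasons.append("system or boot volume")
    if vol.file_system.upper() != "NTFS":  # also rules out ReFS
        reasons.append(f"file system is {vol.file_system}, not NTFS")
    if vol.is_csv:
        reasons.append("Cluster Shared Volume")
    if vol.is_removable or vol.is_remote_mapped:
        reasons.append("removable or remotely mapped drive")
    return reasons
```

With the lab server described above, the C drive would return `["system or boot volume"]`, matching the grayed-out option, while the V and Y drives (local NTFS data volumes) would return an empty list.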

What are the expected disk space savings?

The following is a great sample of the disk space savings we should expect, from http://blogs.technet.com/b/filecab/archive/2012/05/21/introduction-to-data-deduplication-in-windows-server-2012.aspx.

image

So what about the results in my lab environment?

Let’s see what it says will be gained on my Y drive, which stores my software media.

Results one day later:

After one day (and re-opening Server Manager), the results were as shown below:

image

Testing my V drive – 932 GB with 555 GB free – now has 626 GB free (71 GB additional space).

Testing my Y drive – 3.63 TB with 831 GB free – now has 919 GB free (88 GB additional space).

image

Results four days later:

After four days (and re-opening Server Manager), the results were as shown below:

image

Testing my V drive – 932 GB with 555 GB free – now has 634 GB free (93.3 GB additional space).

Testing my Y drive – 3.63 TB with 831 GB free – now has 1.08 TB free (285 GB additional space).

image

Results one week later:

After one week (and re-opening Server Manager), the results were as shown below:

image

Testing my V drive – 932 GB with 555 GB free – now has 680 GB free (143 GB additional space).

Testing my Y drive – 3.63 TB with 831 GB free – now has 2.42 TB free (1.63 TB additional space).

image
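The additional space at each checkpoint can be cross-checked as the current free space minus the free space before deduplication (555 GB on V, 831 GB on Y). The Python sketch below does that arithmetic, assuming Server Manager's TB figures are binary units (1 TB = 1024 GB). For the one-day readings it reproduces the quoted 71 GB and 88 GB. For the later readings the quoted gains are larger than the plain free-space difference (for example, 634 GB - 555 GB = 79 GB against the 93.3 GB quoted for V after four days), so those figures may reflect a different measure, such as the savings value Server Manager reports; the sketch only checks the free-space arithmetic.

```python
GB_PER_TB = 1024  # assumption: Server Manager reports binary units (1 TB = 1024 GB)


def to_gb(value, unit):
    """Convert a Server Manager reading in GB or TB to GB."""
    if unit == "GB":
        return value
    if unit == "TB":
        return value * GB_PER_TB
    raise ValueError(f"unexpected unit: {unit}")


def added_free_space(before_gb, after_value, after_unit):
    """Free space gained since the pre-deduplication reading, in GB."""
    return to_gb(after_value, after_unit) - before_gb


# Free space before deduplication: V = 555 GB, Y = 831 GB (from the readings above)
readings = [
    ("1 day", (626, "GB"), (919, "GB")),
    ("4 days", (634, "GB"), (1.08, "TB")),
    ("1 week", (680, "GB"), (2.42, "TB")),
]

for label, v_after, y_after in readings:
    v_gain = added_free_space(555, *v_after)
    y_gain = added_free_space(831, *y_after)
    print(f"{label}: V +{v_gain:.0f} GB, Y +{y_gain:.0f} GB")
```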

Results after leaving the system online with deduplication configured:

Several months later I returned to this blog post and found the following results:

image

Testing my V drive – showed that it had a 43% deduplication rate and had gained 417 GB of additional space through deduplication.

Testing my Y drive – showed that it had a 69% deduplication rate and had gained 1.69 TB of additional space through deduplication!
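The deduplication rate and savings figures are related. Assuming the rate is the saved space divided by the size the data would occupy without deduplication (savings plus the space still used on disk), the logical data size and the on-disk usage can be backed out from the two reported numbers. The Python sketch below does that back-of-the-envelope calculation; the formula and the binary TB-to-GB conversion are assumptions, not taken from Server Manager's documentation, and `implied_sizes` is a hypothetical helper.

```python
def implied_sizes(rate_percent, saved):
    """Back out (logical_size, used_on_disk) from a dedup rate and savings.

    Assumes rate = saved / (saved + used), i.e. savings relative to the
    data's original (pre-deduplication) size. Units follow `saved`.
    """
    if not 0 < rate_percent < 100:
        raise ValueError("rate_percent must be between 0 and 100 (exclusive)")
    logical = saved / (rate_percent / 100.0)
    used = logical - saved
    return logical, used


# Several-months readings: V = 43% / 417 GB, Y = 69% / 1.69 TB (1 TB assumed = 1024 GB)
for drive, rate, saved_gb in [("V", 43, 417), ("Y", 69, 1.69 * 1024)]:
    logical, used = implied_sizes(rate, saved_gb)
    print(f"{drive}: ~{logical:.0f} GB of data stored in ~{used:.0f} GB")
```

Under this assumption, a 43% rate means the data on V occupies 57% of its original size on disk.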

Before and after screenshots are shown below:

Before:

image

After:

image

Summary:

  • I am extremely impressed with the disk savings seen on the Hyper-V servers in my lab. I have since activated this on all of my Hyper-V servers for the drives where I store my virtuals.

  • There are several types of volumes (including the system or boot volume) on which dedup cannot be used.

  • It can take a significant amount of time to complete the full deduplication process, and there needs to be sufficient disk space and memory available on the host to deduplicate the volume effectively.

