A while back I wrote a post about data deduplication in Windows Server 2012. It is generally a very good feature, but as that post described, I had a collection of .iso files and compressed data where dedup not only saved me nothing, it actually used more space in the dedup folder than the original data size (which I found a little odd).
Today I got around to doing something about this, and here is what I found:
- Disabling data deduplication (via the GUI or PowerShell) only stops further deduplication from occurring. Data that has already been deduplicated remains deduplicated.
- To move (rehydrate) the data back into the original files and out of the deduplication store, use the PowerShell command Start-DedupJob -Volume <VolumeLetter> -Type Unoptimization
- You can check how far along the job is with Get-DedupJob. I also like using TreeSize, which shows the size on disk of specific files, including the deduplication chunks.
- At this stage I noticed the original files getting bigger, but the dedup store (and the chunks within it) had not decreased at all. “Maybe there's another command for this?” I thought.
- There were two additional job types available, GarbageCollection and Scrubbing. Unfortunately, neither the PowerShell help nor the TechNet documentation actually states what either of these does! After a bit of searching, I found this page http://www.infotechguyz.com/WindowsServer2012/DedupandWindowsServer2012.html which specifies that GarbageCollection finds and removes unreferenced chunks and Scrubbing performs an integrity check. So with this knowledge I then ran
- Start-DedupJob -Volume <VolumeLetter> -Type GarbageCollection, only to find that this command can only be run when dedup is enabled!
- To get around this, I re-enabled dedup but excluded all folders on the drive, and I also removed all the schedules and background optimisation settings. Then I re-ran the command.
- Initially the size of the dedup folder increased by approx. 100 MB (keep in mind the dedup folder at this stage was 2.2 TB), but soon the Get-DedupJob status seemed to stall at 50% and the size of the dedup folder started coming down quite quickly, in 1 GB chunks (the chunks seem to be a maximum of 1 GB each).
- Once this completed (it took a while), I disabled dedup again and all was good.
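For anyone who wants the whole sequence in one place, here is a rough sketch of the steps above as PowerShell. It is not a tested script: the volume letter "E:" and the excluded folder paths are placeholders I made up, the exact path format -ExcludeFolder expects is worth checking with Get-Help Set-DedupVolume, and each job has to finish (watch Get-DedupJob) before you move on to the next step.

```powershell
# Sketch only: assumes Windows Server 2012 with the Data Deduplication role
# service installed, run from an elevated PowerShell session.
Import-Module Deduplication

$volume = "E:"   # placeholder, use your own volume letter

# 1. Stop further deduplication (already-deduplicated data stays in the store).
Disable-DedupVolume -Volume $volume

# 2. Rehydrate the deduplicated files back into the original files.
Start-DedupJob -Volume $volume -Type Unoptimization

# Check progress of queued and running jobs; wait until this one completes.
Get-DedupJob

# 3. GarbageCollection would not run for me with dedup disabled, so re-enable
#    it but exclude every folder so nothing gets optimized again.
Enable-DedupVolume -Volume $volume
Set-DedupVolume -Volume $volume -ExcludeFolder "E:\ISOs", "E:\Archives"   # hypothetical folders

# Disable the dedup schedules, including background optimization.
# As far as I know these are server-wide, so other dedup volumes on the
# same server are affected too.
Get-DedupSchedule | Set-DedupSchedule -Enabled $false

# 4. Remove the now-unreferenced chunks from the dedup store.
Start-DedupJob -Volume $volume -Type GarbageCollection
Get-DedupJob

# 5. Only after garbage collection has finished: disable dedup again.
Disable-DedupVolume -Volume $volume
```

If you use dedup on other volumes of the same server, remember to turn the schedules back on afterwards (Get-DedupSchedule | Set-DedupSchedule -Enabled $true).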
Just to be clear, 2012 deduplication is still a good technology, and I use it elsewhere with great results. Every now and again, though, you will run into a dataset which it just does not agree with, and disabling it completely just isn't intuitive. (And yes, all this probably could have been avoided by running the dedup estimator tool, but then I wouldn't have learnt stuff, so there's no fun in that!) That is why I thought I would write the above. Hope it helps someone.