Data Deduplication for Documents

Eliminating extra copies of data saves money not only on direct disk hardware costs, but also on related costs such as electricity, cooling, maintenance, and floor space. How deduplication is deployed also matters: if the technology is included in a backup appliance or storage solution, the implementation process will be much different than for standalone deduplication software.
By Doug Lowe

Beginning with Windows Server, Microsoft has included an innovative technology called data deduplication, which can dramatically reduce the amount of actual disk space required to store your data.
Depending on the type of data, you can expect to save anywhere from 20 percent to more than 80 percent. At 20 percent savings, 10TB of data consumes only 8TB of disk storage. At 80 percent savings, 10TB consumes just 2TB. Data deduplication works by finding portions of files that are identical and storing just a single copy of the duplicated data on the disk. The technology required to find and isolate duplicated portions of files on a large disk is pretty complicated.
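The arithmetic behind those figures is simple enough to sketch. The helper below is purely illustrative (its name and signature are not part of any Windows API):

```python
def stored_size(logical_tb: float, savings_pct: float) -> float:
    """Physical disk space consumed after deduplication, given the
    logical data size and the percentage saved by deduplication."""
    return logical_tb * (100 - savings_pct) / 100

print(stored_size(10, 20))  # 10TB at 20% savings -> 8.0 TB on disk
print(stored_size(10, 80))  # 10TB at 80% savings -> 2.0 TB on disk
```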
Microsoft uses an algorithm called chunking, which scans data on the disk and breaks it into chunks whose average size is 64KB. These chunks are stored on disk in a hidden folder called the chunk store. Then, the actual files on the disk contain pointers to individual chunks in the chunk store. If two or more files contain identical chunks, only a single copy of the chunk is placed in the chunk store and the files that share the chunk all point to the same chunk.
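The chunk-store idea can be sketched in a few lines. This is a simplified model with fixed-size chunks (Windows Server's actual algorithm produces variable-size chunks averaging 64KB, and stores them in a hidden on-disk folder rather than a dictionary); the `chunk_store` and `file_table` names are invented for illustration:

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # the text cites an average chunk size of 64KB

chunk_store: dict[str, bytes] = {}     # hash -> unique chunk data (the "chunk store")
file_table: dict[str, list[str]] = {}  # file name -> ordered list of chunk pointers

def dedup_write(name: str, data: bytes) -> None:
    """Split a file into chunks; store each unique chunk once, keep pointers."""
    pointers = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(h, chunk)  # identical chunks are stored only once
        pointers.append(h)
    file_table[name] = pointers

def dedup_read(name: str) -> bytes:
    """Reassemble a file by following its chunk pointers into the chunk store."""
    return b"".join(chunk_store[h] for h in file_table[name])
```

Writing two files with identical contents leaves only one copy of each chunk in the store, yet both files read back in full, which is the behavior the paragraph above describes.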
Microsoft has tuned the chunking algorithm sufficiently that in most cases, users will have no idea that their data has been deduplicated. Access to the data is as fast as if the data were not deduplicated. For performance reasons, data is not automatically deduplicated as it is written. Instead, regularly scheduled deduplication jobs scan the disk, applying the chunking algorithm to find chunks that can be deduplicated.
To use data deduplication, you must first enable the data deduplication feature in Server Manager. To configure data deduplication, open Server Manager, choose File and Storage Services, click Volumes, right-click the volume that you want to deduplicate, and then choose Configure Data Deduplication. The Deduplication Settings page appears. From this page, you can enable data deduplication, exclude certain file types, and set a schedule for the deduplication jobs to run.
Once deduplication is set up, give the deduplication job time to run.
Benefits

Storage-based data deduplication reduces the amount of storage needed for a given set of files. It is most effective in applications where many copies of very similar or even identical data are stored on a single disk, a surprisingly common scenario. In the case of data backups, which routinely are performed to protect against data loss, most data in a given backup remain unchanged from the previous backup. Backup systems try to exploit this by hard-linking files that have not changed, or by storing only the differences between versions of a file. Neither approach captures all redundancies, however. Hard-linking does not help with large files that have changed only in small ways, such as an email database; differences only find redundancies in adjacent versions of a single file (consider a section that was deleted and later added in again, or a logo image included in many documents). In-line network data deduplication is used to reduce the number of bytes that must be transferred between endpoints, which can reduce the amount of bandwidth required. See WAN optimization for more information.
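The gap between hard-linking and chunk-level deduplication can be shown with a toy count of distinct chunks. This sketch uses fixed-size chunks and an invented `unique_chunks` helper; real systems use content-defined chunking precisely because fixed-size chunks miss redundancy after insertions near the start of a file:

```python
import hashlib

CHUNK = 4096  # small fixed chunk size, for illustration only

def unique_chunks(files: list[bytes]) -> int:
    """Number of distinct chunks across all files (what a chunk store would hold)."""
    seen = set()
    for data in files:
        for i in range(0, len(data), CHUNK):
            seen.add(hashlib.sha256(data[i:i + CHUNK]).hexdigest())
    return len(seen)

v1 = b"A" * CHUNK + b"B" * CHUNK  # original file: 2 chunks
v2 = v1 + b"C" * CHUNK            # new version with a section appended: 3 chunks

# Hard-linking sees two different files and must store both in full: 5 chunks.
# Chunk-level dedup stores each distinct chunk once: only 3 chunks.
print(unique_chunks([v1, v2]))  # 3
```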
In-line deduplication for a smaller data footprint
Dedupe and compression are must-haves for primary storage, too. Deduplication is a compression technique that minimizes data by identifying repetitive patterns, removing the duplicates, and leaving a pointer to the stored copy in their place. The pointer is derived from a hash of a data pattern of a given size. Not all dedupe is created equal, however, and IT professionals should weigh their options before adding it to their primary storage environment.

When to dedupe

Dedupe data when it is first created.
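Deduplicating at creation time means the write path itself checks each incoming block against the store. A minimal sketch of that in-line approach, with an invented `inline_write` function standing in for a storage system's write handler:

```python
import hashlib

BLOCK = 4096
store: dict[str, bytes] = {}  # hash -> block data; each pattern stored once

def inline_write(block: bytes) -> str:
    """Dedupe a block the moment it is written: hash it, store it only if
    it is new, and return the pointer (the hash) for the file's block map."""
    h = hashlib.sha256(block).hexdigest()
    if h not in store:  # a repeated pattern adds nothing to the store
        store[h] = block
    return h
```

Writing the same block twice returns the same pointer, and the store holds a single copy; the duplicate never touches the disk, which is the appeal of in-line over post-process deduplication.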