Data Deduplication Downsizing

A data deduplication algorithm is designed to scan all of the information stored on a file-sharing network, locate identical copies of the same data, eliminate that redundancy, and keep a single copy. End users never see the process at work, but it is a major answer to disk-space problems. The system is scanned for redundancy, and any duplicates found are removed and replaced with references to the single retained copy. This reduces the amount of storage the network consumes, which in turn can improve performance and speed.
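A minimal Python sketch of the scan-and-replace step described above, assuming whole files are identified by their SHA-256 digest. The names `deduplicate` and `read` are hypothetical, chosen for illustration only; real products may work differently.

```python
import hashlib


def deduplicate(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Keep one copy of each distinct content and map every name to a reference.

    Returns (store, refs): store maps a SHA-256 digest to the single retained
    copy, and refs maps each original file name to the digest of its content.
    """
    store: dict[str, bytes] = {}
    refs: dict[str, str] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        # Only the first copy with a given digest is stored; later copies
        # become references to it.
        store.setdefault(digest, data)
        refs[name] = digest
    return store, refs


def read(store: dict[str, bytes], refs: dict[str, str], name: str) -> bytes:
    """Resolve a file name's reference back to the retained copy."""
    return store[refs[name]]


if __name__ == "__main__":
    files = {
        "reports/q1.txt": b"quarterly figures",
        "backup/q1-copy.txt": b"quarterly figures",
        "notes.txt": b"meeting notes",
    }
    store, refs = deduplicate(files)
    print(len(files), "files,", len(store), "stored copies")  # prints: 3 files, 2 stored copies
    print(read(store, refs, "backup/q1-copy.txt"))  # prints: b'quarterly figures'
```

Comparing whole files keeps the sketch short; a scheme that splits data into smaller chunks before hashing would also catch duplicated content inside files that are not identical overall.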

Consider an internal email network. When one person sends an email to many colleagues, the network is likely to end up holding many copies of it, since the message may be stored in each recipient's individual inbox. If the entire network had to be backed up, the backup would contain every one of those copies. However little space a single email takes up, the total can become very large across thousands of emails. Data deduplication helps here by eliminating the extra copies and replacing them with references to a single copy kept in the backup. A well-targeted data deduplication algorithm can save considerable time and money on backups.

Perhaps the simplest approach is to let users create new data as usual and scan the system for duplicates after a while. The advantage is that creating a new document stays quick and simple. The drawback is that, between scans, the amount of space such data actually uses can never be measured precisely. Only by running the algorithm daily can you be sure the figures stay regularly up to date.
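To put rough numbers on the email example, the hypothetical post-process scan below walks every mailbox after the fact and compares what a naive backup would copy with what a deduplicated backup would keep. The function name, the mailbox contents, and the message sizes are all made up for illustration, and references are counted as costing zero bytes for simplicity.

```python
import hashlib


def scan_mailboxes(mailboxes: dict[str, list[bytes]]) -> tuple[int, int]:
    """Post-process scan: return (naive_bytes, deduplicated_bytes).

    naive_bytes counts every stored message; deduplicated_bytes counts each
    distinct message once, treating references as costing nothing.
    """
    naive = 0
    unique_sizes: dict[str, int] = {}  # digest -> size of one retained copy
    for messages in mailboxes.values():
        for message in messages:
            naive += len(message)
            unique_sizes[hashlib.sha256(message).hexdigest()] = len(message)
    return naive, sum(unique_sizes.values())


if __name__ == "__main__":
    # Hypothetical data: a 1,000-byte announcement sent to 500 inboxes,
    # plus one 200-byte private message in a single inbox.
    announcement = b"A" * 1000
    mailboxes = {f"user{i}": [announcement] for i in range(500)}
    mailboxes["user0"].append(b"B" * 200)
    naive, deduplicated = scan_mailboxes(mailboxes)
    print(naive, deduplicated)  # prints: 500200 1200
```

Because the scan runs after the data already exists, the saving is only known once the scan finishes, which matches the drawback described above.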

Checking the system for duplicates every time someone enters new data is another option, but it can be very time-consuming. Data deduplication systems perform a hash calculation on the data to determine whether a matching copy already exists, and because hash calculations themselves take significant time to run, both creating the data and checking the system take longer. Both approaches are acceptable, although computer professionals hold differing views on which one is better.
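The write-time check can be sketched as a small hypothetical store that hashes every incoming piece of data before deciding whether to keep it. The class name and its `hash_calls` counter are illustrative assumptions; the point is that each write pays for one SHA-256 computation whether or not the data turns out to be a duplicate.

```python
import hashlib


class InlineDedupStore:
    """Hypothetical store that deduplicates at write time.

    Every write computes a SHA-256 digest before anything is stored, so the
    hashing cost is paid on every write, duplicate or not.
    """

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # digest -> single retained copy
        self.names: dict[str, str] = {}     # file name -> digest (reference)
        self.hash_calls = 0                 # number of hashes computed so far

    def write(self, name: str, data: bytes) -> bool:
        """Store data under name; return True only if it was a new, unique copy."""
        digest = hashlib.sha256(data).hexdigest()
        self.hash_calls += 1
        self.names[name] = digest
        if digest in self.blocks:
            return False  # duplicate: record only the reference
        self.blocks[digest] = data
        return True

    def read(self, name: str) -> bytes:
        """Follow a name's reference to the retained copy."""
        return self.blocks[self.names[name]]


if __name__ == "__main__":
    store = InlineDedupStore()
    print(store.write("a.txt", b"hello"))       # prints: True
    print(store.write("b.txt", b"hello"))       # prints: False
    print(store.hash_calls, len(store.blocks))  # prints: 2 1
```

For brevity this sketch never frees a stored copy once no name refers to it; a real system would need reference counting or a cleanup pass for that.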

With the popularity of this algorithm have come criticisms, and the hash calculation process draws some of them. Although the probability that two different pieces of data produce the same hash is very small, it is not absolutely zero. Because a deduplication system relies on the hash instead of comparing the data bit by bit, two different pieces of data that happen to produce the same hash would be treated as duplicates, and data corruption could result. Deduplication is also unsuitable for networks that deliberately rely on redundancy. Very good examples are the internet and multi-level networks such as those of corporations and government agencies.
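One common way to address the collision concern, offered here as an illustration rather than as something this article prescribes, is to confirm every hash match with a full byte-by-byte comparison before treating two pieces of data as duplicates. The sketch assumes a pluggable hash function so that a deliberately weak hash can force a collision in the usage example; `VerifyingDedupStore` and `sha256_hex` are hypothetical names.

```python
import hashlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Default fingerprint: hex-encoded SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


class VerifyingDedupStore:
    """Hypothetical store that confirms every hash match byte by byte.

    Blocks sharing a digest live in the same bucket. A new block counts as a
    duplicate only if it is byte-for-byte equal to a stored block; a block
    that merely shares the digest (a collision) is kept as a separate copy.
    """

    def __init__(self, hash_fn: Callable[[bytes], str] = sha256_hex) -> None:
        self.hash_fn = hash_fn
        self.buckets: dict[str, list[bytes]] = {}  # digest -> distinct blocks

    def put(self, data: bytes) -> tuple[str, int]:
        """Store data and return a reference: (digest, index in bucket)."""
        digest = self.hash_fn(data)
        bucket = self.buckets.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == data:  # full comparison, not just the hash
                return digest, index
        bucket.append(data)  # new content, or a collision with different bytes
        return digest, len(bucket) - 1

    def get(self, ref: tuple[str, int]) -> bytes:
        """Resolve a reference returned by put()."""
        digest, index = ref
        return self.buckets[digest][index]


if __name__ == "__main__":
    # A deliberately weak hash (length only) so that a collision is easy to force.
    store = VerifyingDedupStore(hash_fn=lambda d: str(len(d)))
    a = store.put(b"abc")
    b = store.put(b"xyz")   # same "hash" as b"abc", different bytes
    c = store.put(b"abc")   # a true duplicate
    print(a, b, c)          # prints: ('3', 0) ('3', 1) ('3', 0)
    print(store.get(b))     # prints: b'xyz'
```

The extra comparison costs a read of the stored block whenever hashes match, which is the trade-off some systems accept in exchange for never merging different data.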

When there are multiple users within a network, there is a greater chance of producing redundant copies of the same data, and this can quickly grow out of control. Many IT experts deploy a program to remove such duplication and keep only a single copy of the necessary data. This is called data deduplication, and it is a must, especially for large networks.

By: Joe Hammerstien

Article Directory: http://www.articledashboard.com

Discover which Data Deduplication service has the "Must Have" features. Visit www.druva.com and find out what features are needed before you purchase any backup program. Try Druva inSync for "Free" today!

© 2005-2011 Article Dashboard