We know that the amount of data we store is exploding. Not
only are we collecting more detailed information about specific things (e.g.
servers, digital images, etc.), but we are also collecting information about
more things (e.g. refrigerators, automobiles, etc.). We are quite literally
experiencing the “Internet of Things,” where uniquely identifiable objects are
connected through modern networking technology to provide us with an abundance
of information.
Wouldn’t it be nice to have an easy way to condense and to manage this wealth of information? Enter stage right the technology known as data deduplication. Quite simply, this is a technology which reduces duplicate information into a set of unique data patterns. Please see my other post on deduplication for more details.
In an oversimplified view of the deduplication process,
every new data pattern read by a file system can be fingerprinted with a unique
hash and that fingerprint can be compared with an index of previously recorded
data patterns and their associated fingerprints. This process of reading data
patterns, fingerprinting them, comparing them with existing patterns, and then
storing unique patterns or creating references for non-unique data patterns
requires computational resources. This is also true of the reverse process when
data is reassembled for use. These computational resources may not be trivial.
In the case of source deduplication, the client system can see processor
and/or memory load increase by up to 20%. This can be significant in a virtual
environment where several clients share host resources, especially if each
client sees performance degradation at the same time. Additionally, there may
be a slight delay in data read/write times due to the added processing. This
implies that deduplication may be better suited to large collections of data
that do not change often and do not require rapid access. It also implies that
deduplication may be better implemented at the destination of read/write
operations rather than at the source.
Another issue to consider is that deduplication relies on duplicate data
patterns. Technologies like encryption, which works to remove recognizable
patterns from a dataset, may reduce the effectiveness of deduplication or even
be incompatible with it. Understanding how data deduplication interacts with
data security is paramount to storing computer data effectively.
Article From: http://www.storagecraft.com/