If you work in IT and are responsible for backing up or
transferring large amounts of data, you’ve probably heard the term data
deduplication. Here’s a clear definition of what “data deduplication” means, and
why it is a fundamental requirement for moving data to the cloud.
First, the basics
At its simplest, data deduplication is a technique for eliminating redundant
data in a data set. In the process of deduplication, extra copies of the same
data are deleted, leaving only one copy to be stored. Data is analyzed to
identify duplicate byte patterns and confirm that the retained copy is truly
the only stored instance. Duplicates are then replaced with a reference that
points to the stored chunk.
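To make the mechanics concrete, here is a minimal sketch of the idea in Python, assuming fixed-size chunks and SHA-256 fingerprints (real products use more sophisticated chunking and indexing): each unique chunk is stored once, and duplicates become references to it.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks, store each unique chunk once
    (keyed by its SHA-256 digest), and keep an ordered list of references."""
    store = {}        # digest -> chunk bytes, stored only once
    references = []   # ordered digests that reconstruct the original stream
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # a duplicate chunk is not stored again
        references.append(digest)
    return store, references

def reconstruct(store: dict, references: list) -> bytes:
    """Rebuild the original byte stream by following the references."""
    return b"".join(store[d] for d in references)
```

Because the references are tiny compared with the chunks themselves, a data set full of repeated byte patterns shrinks dramatically once only the unique chunks are kept.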
Given that the same byte pattern may occur dozens, hundreds,
or even thousands of times (think of how often you make only small changes to a
PowerPoint file or re-share another important business asset), the amount of
duplicate data can be significant. In some companies, 80% of corporate data is
duplicated across the organization. Reducing the amount of data that has to be
stored and sent across the network can save significant money on storage and
shorten backup windows, with savings of up to 90% in some cases.
A real-world example
Consider an email server that contains 100 instances of the
same 1 MB file attachment, say, a sales presentation with graphics that was
sent to everyone on the global sales staff. Without data deduplication, if
everyone backs up their email inbox, all 100 instances of the presentation are
saved, requiring 100 MB of storage space. With data deduplication, only one
instance of the attachment is actually stored; each subsequent instance simply
references the one saved copy, reducing the storage and bandwidth demand to
only 1 MB.
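A hedged sketch of that arithmetic (the 1 MB attachment and mailbox layout below are purely illustrative): storing the 100 identical attachments by fingerprint leaves roughly 1 MB in the store instead of 100 MB.

```python
import hashlib

attachment = b"\x00" * (1024 * 1024)                         # illustrative 1 MB file
mailboxes = {f"user{i}": [attachment] for i in range(100)}   # 100 copies, one per user

store = {}                                                   # digest -> attachment bytes
for inbox in mailboxes.values():
    for blob in inbox:
        store.setdefault(hashlib.sha256(blob).hexdigest(), blob)

naive_mb = sum(len(b) for inbox in mailboxes.values() for b in inbox) / 2**20
dedup_mb = sum(len(b) for b in store.values()) / 2**20
print(f"Without deduplication: {naive_mb:.0f} MB; with deduplication: {dedup_mb:.0f} MB")
```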
Data deduplication evolves to meet the need for speed
While data deduplication is a common concept, not all
deduplication techniques are the same. Early breakthroughs in data
deduplication were designed for the challenge of the time: reducing the storage
capacity required and making data backup to servers and tape more reliable.
One example is Quantum’s use of file-based and fixed-block deduplication, which
focused on reducing storage costs. Appliance vendors like Data Domain further
improved storage savings with target-based, variable-block techniques that back
up only changed data segments rather than all segments, providing yet another
layer of efficiency to maximize storage savings.
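The difference between fixed-block and variable-block approaches is easiest to see in code. Below is a toy illustration, not any vendor's actual algorithm: fixed blocks break at byte offsets, so an insertion early in a file shifts every later boundary, while content-defined (variable) blocks break wherever the content itself matches a boundary rule, so most chunks survive the edit unchanged.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 4096):
    """Fixed-block chunking: boundaries at fixed offsets. Inserting one byte
    near the start shifts every later chunk and defeats duplicate matching."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, mask: int = 0x3FF, window: int = 16):
    """Content-defined (variable-block) chunking, toy version: declare a
    boundary wherever a hash of the trailing window hits a bit pattern,
    so boundaries follow the content and survive earlier insertions."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        fingerprint = int.from_bytes(
            hashlib.sha256(data[i - window:i]).digest()[:4], "big")
        if fingerprint & mask == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks
```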
As data deduplication efficiency improved, new challenges
arose. How do you back up more and more data across the network without
impacting overall network performance? Avamar addressed this challenge with
variable-block, source-based deduplication, compressing data before it ever
left the server and thereby reducing network traffic, the amount of data stored
on disk, and the time it took to back up. With this step forward, deduplication
became about more than simple storage savings; it addressed overall performance
across networks, ensuring that even in environments with limited bandwidth,
data could be backed up in a reasonable time.
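Here is a hedged sketch of the source-side idea under simplified assumptions (the two-call exchange and class names are illustrative, not Avamar's actual protocol): the client fingerprints and compresses its chunks locally, asks the target which fingerprints it is missing, and ships only those.

```python
import hashlib
import zlib

class BackupTarget:
    """Stands in for the backup server; keeps one compressed copy per unique chunk."""
    def __init__(self):
        self.chunks = {}                      # digest -> compressed chunk

    def missing(self, digests):
        """Report which fingerprints the target has never seen before."""
        return [d for d in digests if d not in self.chunks]

    def receive(self, digest, compressed_chunk):
        self.chunks[digest] = compressed_chunk

def source_side_backup(target: BackupTarget, chunks):
    """Client side: hash locally, send only the chunks the target lacks."""
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    wanted = set(target.missing(digests))     # one round trip of tiny fingerprints
    for digest, chunk in zip(digests, chunks):
        if digest in wanted:
            target.receive(digest, zlib.compress(chunk))   # compressed before leaving
            wanted.discard(digest)             # don't resend a duplicate in this batch
    return digests                             # the manifest needed to restore later
```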
Another step-function improvement to data deduplication was
achieved by Druva when it addressed data redundancies at the object level
(versus the file level) and solved for deduplication across distributed users
at a global scale.
Advances in data deduplication to manage massive volumes of data
By the early 2000s, business data was going global,
real-time, and mobile. IT teams were challenged to back up and protect massive
volumes of corporate data across a range of endpoints and locations with
increased efficiency and scale. To address this challenge, Druva pioneered the
revolutionary concept of “app-aware” deduplication, which analyzes data at the
file-object level to identify duplicates in attachments, emails, or even down
to the folder from which they originate. The approach added significant gains
in accuracy and performance for data backups, lowering the barrier for
companies to efficiently manage and protect large volumes of data.
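As a rough illustration of the “app-aware” idea, the sketch below uses Python's standard email parser to open each message and fingerprint attachments individually, so the same presentation deduplicates even when the surrounding messages differ. The MIME-parsing approach here is an assumption for illustration only; Druva's own handling of formats such as Outlook files via MAPI differs in detail.

```python
import hashlib
from email import message_from_bytes

def attachment_digests(raw_message: bytes) -> dict:
    """Parse one email message and fingerprint each attachment on its own,
    so identical attachments dedupe even when headers and bodies differ."""
    msg = message_from_bytes(raw_message)
    digests = {}
    for part in msg.walk():
        if part.get_filename():                         # treat named parts as attachments
            payload = part.get_payload(decode=True) or b""
            digests[part.get_filename()] = hashlib.sha256(payload).hexdigest()
    return digests
```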
Data deduplication offers a new foundation for data governance
Today, as cloud adoption reaches a tipping point and
companies have begun moving their data storage to a virtual cloud environment,
data deduplication plays a more strategic role than simply saving on storage
costs. In combination with cloud-based object storage architecture, efficient
data deduplication is opening up new opportunities to do more with stored data.
One example is data governance. With global deduplication
techniques, massive volumes of data can be backed up and stored in the cloud,
and made available to IT (and the C-Suite) to address compliance, data
regulation, and real-time business insights. This is done by creating a
time-indexed file system that uses metadata to store only the unique data
required. The time-indexed view of data means that you now have historical
context for information, and data is always indexed and ready for forensics
teams. This is a radical departure from the traditional “backup to the
graveyard” approach, in which backups are written as a serial stream of
incremental or full copies. Additionally, being able to analyze the data a set
of users has in common helps IT understand data usage patterns and further
optimize data redundancies across users in distributed environments.
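One way to picture such a time-indexed, metadata-driven store, as a hedged sketch rather than Druva's actual design: each backup run records only a timestamped manifest of chunk fingerprints, so any point in time can be reconstructed directly from the single-instance chunk store instead of replaying a serial chain of backups.

```python
import bisect
import hashlib
import time

class TimeIndexedCatalog:
    """Single-instance chunk store plus timestamped manifests per file path."""
    def __init__(self):
        self.chunks = {}        # digest -> chunk bytes, stored once
        self.versions = {}      # path -> list of (timestamp, [digests]), in time order

    def backup(self, path: str, data: bytes, chunk_size: int = 4096):
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)        # dedupe against all history
            digests.append(digest)
        self.versions.setdefault(path, []).append((time.time(), digests))

    def as_of(self, path: str, timestamp: float) -> bytes:
        """Point-in-time view: the newest version at or before the timestamp."""
        history = self.versions.get(path, [])
        idx = bisect.bisect_right([t for t, _ in history], timestamp) - 1
        if idx < 0:
            raise KeyError(f"no version of {path} at or before {timestamp}")
        return b"".join(self.chunks[d] for d in history[idx][1])
```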
Today, advanced data deduplication is helping address two
competing forces that threaten to impede fast-growing enterprise businesses:
managing the massive increase in corporate data created outside the traditional
firewall, and solving for the growing need to govern data across its lifecycle
by time zone, user, device, and file type.
Why Druva leads in its approach to data deduplication
Druva’s patented global data deduplication approach has four
unique attributes:
- It is performed on the client (versus the server), thereby reducing the amount of data needed to be shipped over the network.
- The analysis is done at the sub-file or block-level to find duplicate data within a file.
- It is aware of the applications from which data is generated. That is, Druva inSync looks inside files, such as an Outlook email file via MAPI, to find duplicate data in email attachments.
- Druva’s deduplication scales beyond a single user to find duplicate data (say, an email sent to an entire organization) across multiple users and devices, as sketched below.
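As a simplified, assumption-laden sketch of that last point (the class and names are illustrative, not Druva's implementation), a single fingerprint index shared across every user and device means an attachment mailed to the whole organization is stored once, and every later backup of it only adds a reference.

```python
import hashlib
from collections import defaultdict

class GlobalDedupIndex:
    """One fingerprint table shared by all users and devices in a tenant."""
    def __init__(self):
        self.chunks = {}                   # digest -> chunk, stored once globally
        self.owners = defaultdict(set)     # digest -> users referencing the chunk

    def ingest(self, user: str, data: bytes, chunk_size: int = 4096) -> int:
        """Back up one user's data; return the bytes actually stored (and sent)."""
        new_bytes = 0
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only the first user pays the storage cost
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            self.owners[digest].add(user)  # later users just add a reference
        return new_bytes
```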
Article From: www.druva.com