When collecting raw ESI from multiple individuals, there are
bound to be tremendous amounts of duplicative documents. In company-wide e-mail
chains, for example, a message is sent to multiple recipients and stored within
each individual’s mailbox. Depending on your organization’s data retention
policies, copies of the same file might also be found on the employee’s hard
drive, file server, or company backup tape.
For the attorney tasked with identifying, collecting and
reviewing ESI, an exhaustive review of a document set rife with duplicates
threatens the timeliness, cost effectiveness and efficiency of a project. The
risks intensify during review, where duplicate documents increase the potential
for inconsistent privilege and responsiveness decisions on identical documents.
To mitigate these concerns, many practitioners turn to
de-duplication technologies, where duplicate documents are identified and
managed during e-discovery processing to minimize redundant review. Effectively,
de-duplication can reduce the number of documents to be reviewed by as much as
90 percent and, on average, by 30 to 40 percent.
With de-duplication, an electronic “fingerprint” is created
for each document at the bit level by leveraging a hashing algorithm. The
resultant fingerprints are measured against one another to determine which
documents are exact duplicates. Fingerprints change with nearly any type of
modification to the file, such as an extra space or a formatting change, and
stand out when measured against the existing document universe.
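As a minimal illustration of the idea (not any vendor's actual processing engine), a bit-level fingerprint can be computed with a standard cryptographic hash such as SHA-256; even a one-character change to the file produces an entirely different digest:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a hex digest that serves as the file's 'fingerprint'."""
    return hashlib.sha256(data).hexdigest()

original = b"Please review the attached contract."
modified = b"Please review the attached contract. "  # one extra trailing space

# Identical bytes always yield identical fingerprints...
assert fingerprint(original) == fingerprint(original)
# ...while a single extra space yields a completely different one.
assert fingerprint(original) != fingerprint(modified)
```

Two files are treated as exact duplicates only when their digests match, which is why formatting tweaks or metadata changes defeat exact-match de-duplication and require near-duplicate techniques instead.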
However, identifying duplicates is only the first step.
Simply removing all duplicate documents robs the reviewing attorney of potentially
important contextual information—such as who maintained or had access to an
important e-mail or document. Sophisticated e-discovery technologies have
evolved to allow several options for discovery teams to examine these
associated details.
With the Kroll Ontrack e-discovery processing engine, case
teams have several de-duplication options. When choosing a de-duplication
method, careful consideration of case needs should be measured in relation to
the following options:
- No de-duplication: All duplicate documents are provided for review and categorization, producing the largest number of documents for review. This method is strongly discouraged for cases involving voluminous amounts of data from backup tapes or collected over various occasions.
- Global or horizontal de-duplication: As each file is uploaded, it is compared to the entire data set for the e-discovery project. Only the first instance of each unique document is provided for review and categorization, resulting in the fewest number of documents for review. However, care should be taken when employing this method of de-duplication, as only one document will remain without any consideration of its relevance to the case over other duplicates.
- Per custodian or vertical de-duplication: Each file is uploaded and compared to a limited set of documents from the same document custodian, time period, or other data slice. Only the first instance of each unique document per custodian or data slice will be provided for review. However, the same document may exist in other custodians or data slices and may then be provided for independent review. This type of de-duplication is particularly useful when processing multiple tapes for the same custodians over time, or when discerning the context of a specific document in relation to its custodian.

The de-duplication options above are applied to documents as they are processed. Additionally, as documents are reviewed, they can be assessed for relative similarity, called near-duplicate identification, which finds similar documents that differ only by simple formatting, document type, or other semantic differences. These documents are identified and grouped around one document, the “core” of the group, and all related near-duplicates are compared to this core document. Near-duplicate identification can help the reviewer better understand the relationship between documents, allowing for mass actions on groups with similarities.

Regardless of the method chosen, de-duplication can yield tremendous savings when properly leveraged to meet the needs of a project. However, it can also be fraught with complexity and pitfalls if improperly applied. To avoid these risks and increase your efficiencies, contact your Kroll Ontrack Case Manager.
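The difference between global (horizontal) and per-custodian (vertical) de-duplication comes down to the scope of the comparison. A minimal sketch, assuming documents arrive as simple (custodian, bytes) pairs rather than any vendor's actual data model:

```python
import hashlib

def dedupe(items, scope_global=True):
    """Keep the first instance of each unique fingerprint.

    items: list of (custodian, content_bytes) pairs.
    scope_global=True  -> global/horizontal: one copy survives project-wide.
    scope_global=False -> per-custodian/vertical: one copy survives per custodian.
    """
    seen = set()
    kept = []
    for custodian, content in items:
        digest = hashlib.sha256(content).hexdigest()
        # Global scope keys on the digest alone; vertical scope keys on
        # (custodian, digest), so each custodian retains their own copy.
        key = digest if scope_global else (custodian, digest)
        if key not in seen:
            seen.add(key)
            kept.append((custodian, content))
    return kept

docs = [
    ("alice", b"quarterly report"),
    ("bob",   b"quarterly report"),  # same file held by a second custodian
    ("bob",   b"quarterly report"),  # duplicate within the same mailbox
]

print(len(dedupe(docs, scope_global=True)))   # 1: one copy project-wide
print(len(dedupe(docs, scope_global=False)))  # 2: one copy per custodian
```

The sketch shows why vertical de-duplication preserves custodial context: Bob's copy survives, so the reviewer still sees that Bob held the document, at the cost of reviewing it twice.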