Openlaw Document Handling Software
from Oxford Law and Computing
Equivio Text Compare
Equivio (www.equivio.com) claim that "Equivio offers breakthrough technology to detect near-duplicate files. Near-duplicates represent 30-50% of email and document repositories. By grouping the near-duplicates, Equivio helps optimize the document review process."
"Equivio's patent-pending algorithm addresses the technological challenge of detecting near-duplicate files." Unlike other algorithms, which generate very different signatures for similar but not identical files, "Equivio's innovation and contribution is the ability to generate similar signatures for similar files."
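The contrast between the two kinds of signature can be illustrated with a small sketch. Equivio's algorithm is proprietary and is not described here, so the code below does not reproduce it: it uses word shingles and Jaccard similarity, a generic textbook technique, purely to show what "similar signatures for similar files" means. The function names and sample sentences are invented for the illustration.

```python
import hashlib

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap / size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two hypothetical documents that differ in a single word.
doc_a = "The parties agree that the lease shall commence on 1 March 2008"
doc_b = "The parties agree that the lease shall commence on 1 April 2008"

# A cryptographic hash changes completely when one word changes,
# so it can only detect exact duplicates.
hash_a = hashlib.sha256(doc_a.encode("utf-8")).hexdigest()
hash_b = hashlib.sha256(doc_b.encode("utf-8")).hexdigest()

# The shingle sets stay mostly the same, so their overlap reflects similarity.
similarity = jaccard(shingles(doc_a), shingles(doc_b))

print(hash_a == hash_b)       # the hashes do not match
print(round(similarity, 2))   # fraction of shared shingles
```

Here the two hashes share nothing useful, while 8 of the 12 distinct shingles are common to both documents.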
They say:
"Equivio detects and groups near-duplicate files. This reduces the time and effort required to review a collection of documents:
• Stage 1: We start out with an unstructured collection of documents that we encounter at the outset of a discovery process. Typically, 30-50% of the documents are near-duplicates.
• Stage 2: Equivio identifies the near-duplicates and arranges them into sets.
• Stage 3: The attorney (or paralegal) is presented with a set of near-duplicates and can deal with them together, in a coherent systematic manner.
• Stage 4: Equivio identifies the pivot document, which is the most representative document of the near-duplicate set. The attorney can choose to review just the pivot document. In many cases, after reading the pivot document, the attorney will decide that the rest of the documents in the near-duplicate set can be skipped.
• Stage 5: If, however, the set is interesting, the attorney can zoom in to review the remaining documents in the set. Using a compare utility, such as DeltaView, the attorney can simply review the differences of each document vis-à-vis the pivot document. This is a lot faster than reading each document from beginning to end. It's also a lot more effective because there is no chance of critical differences being missed.
• Stage 6: Equivio ensures that near-duplicates can be treated consistently - for example, when coding documents as privileged, responsive and so on."
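The Stage 5 idea of reading only the differences from the pivot can be sketched as follows. This is a rough illustration using Python's standard difflib module as a stand-in for a compare utility such as DeltaView; the function name and the sample letters are assumptions made for the example.

```python
import difflib

def differences_from_pivot(pivot_text, other_text):
    """Return only the line ranges where another document departs from the pivot.

    Each entry is (kind, pivot_lines, other_lines), where kind is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    pivot_lines = pivot_text.splitlines()
    other_lines = other_text.splitlines()
    matcher = difflib.SequenceMatcher(None, pivot_lines, other_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical passages need not be re-read
        changes.append((tag, pivot_lines[i1:i2], other_lines[j1:j2]))
    return changes

# Hypothetical pivot document and one near-duplicate.
pivot = "Dear Sir\nRent is 500 per month\nYours faithfully"
other = "Dear Sir\nRent is 550 per month\nYours faithfully"

changes = differences_from_pivot(pivot, other)
for kind, old, new in changes:
    print(kind, old, new)
```

Only the changed line is reported, so the reviewer reads one line instead of the whole letter.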
Litigation support bureaux can process documents and generate the Equivio information about each document.
Similar documents are grouped into Equisets. Each Equiset has a 'Pivot' document and the degree of similarity of other documents in the set is measured from this pivot. Identical or 'duplicate' documents have 100% 'equivalence' or similarity. Similar documents have lower equivalences.
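As a minimal sketch of this structure, an Equiset can be modelled as a pivot plus a similarity percentage for each other member. The class, its field names and the document identifiers below are hypothetical; they are not Openlaw's or Equivio's own data format.

```python
from dataclasses import dataclass, field

@dataclass
class Equiset:
    """A group of similar documents, measured against one pivot document."""
    equiset_id: str
    pivot: str
    members: dict = field(default_factory=dict)  # document id -> similarity %

    def add(self, doc_id, similarity):
        """Record a member and its similarity to the pivot (0 to 100)."""
        if not 0 <= similarity <= 100:
            raise ValueError("similarity must be a percentage between 0 and 100")
        self.members[doc_id] = similarity

    def duplicates(self):
        """Members that are identical to the pivot (100% equivalence)."""
        return sorted(d for d, s in self.members.items() if s == 100)

    def near_duplicates(self):
        """Members that are similar to the pivot but not identical."""
        return sorted(d for d, s in self.members.items() if s < 100)

# Hypothetical document identifiers.
es = Equiset("ES-004", pivot="DOC-0001")
es.add("DOC-0002", 100)
es.add("DOC-0003", 87)

print(es.duplicates())       # documents identical to the pivot
print(es.near_duplicates())  # documents with lower equivalence
```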
The table below shows typical output data from the process of generating Equiset information about documents:

This data shows details of two pivot documents, forming Equisets ES-004 and ES-005, with various data about each. In Openlaw, membership of an Equiset is treated as an Item Link: the pivot is the parent document and the other documents in the Equiset are children. The other detailed information from the bureau is added to the Item Link Notes as shown below. The following abbreviations may be used:
ES - Equiset
Similarity - Doc Similarity
Words - Word Count
ESS - Subset by Equivalence
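The parent and child arrangement can be sketched in code. This is an illustration only: the row keys, the layout of the notes string and the sample values are assumptions, not the actual bureau file format or Openlaw's internal representation of an Item Link.

```python
def build_item_links(rows):
    """Turn bureau rows into parent/child links, one per non-pivot document.

    Each row is a dict with the hypothetical keys: doc_id, equiset,
    is_pivot, similarity, words and subset.
    """
    # Find the pivot (parent) document of each Equiset first.
    pivots = {row["equiset"]: row["doc_id"] for row in rows if row["is_pivot"]}
    links = []
    for row in rows:
        if row["is_pivot"]:
            continue  # the pivot is the parent, not a child of itself
        # Assumed notes layout, using the abbreviations listed above.
        note = ("ES: {equiset}; Similarity: {similarity}%; "
                "Words: {words}; ESS: {subset}").format(**row)
        links.append({
            "parent": pivots[row["equiset"]],
            "child": row["doc_id"],
            "notes": note,
        })
    return links

# Hypothetical sample rows.
rows = [
    {"doc_id": "DOC-0001", "equiset": "ES-004", "is_pivot": True,
     "similarity": 100, "words": 1250, "subset": 1},
    {"doc_id": "DOC-0003", "equiset": "ES-004", "is_pivot": False,
     "similarity": 87, "words": 1250, "subset": 1},
]

links = build_item_links(rows)
print(links[0]["notes"])
```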