Table of Contents
- How do I deduplicate, sample, group, or remove my search hits?
- Viewing your search and adjusting search settings in the results table
- Common use cases
How do I deduplicate, sample, group, or remove my search hits?
On Everlaw, you can deduplicate, sample, group, and remove documents within a search. All of these settings can be applied via the More Options tab.
The settings in More Options, alone or in combination, let you build specific searches in a few simple steps while also supporting sophisticated search workflows (see the “Common use cases” section for some examples).
The More Options tab is in the bottom right corner of each logical container. Any setting can be applied to any logical container (inner or outer), with the exception of deduplication, which can only be applied to the outermost container. You can learn more about applying search settings on inner and outer search containers in this section of the article.
Once you’re in the More Options dialog, you can select any combination of settings. The effect of each setting on your results is shown below each section as a positive or negative number. If a setting has no effect on your search, the section will say “No Change.” You can also click “Show walkthrough of your search settings” and click through each step to understand how your settings affect your search, which is particularly useful when you’re deduplicating.
You can learn more about applying multiple search settings at once via some complex use cases at the end of this article.
Once you’re happy with your settings, click Save. All settings applied will be represented on the search container.
Deduplicate among search hits
Because Everlaw can identify documents as duplicates, it also lets you manage duplicates as you go through review. This is called deduplication: the process of removing exact duplicates from the action you’re taking. For a conceptual overview of deduplication on Everlaw, visit this help article.
When you deduplicate your search, only one copy of each document is returned in your search results. If your search returns two or more documents that belong to the same duplicate group, search deduplication keeps only one of them: the copy in the search that was uploaded to Everlaw first.
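As a rough illustration of the rule above, the following Python sketch models search deduplication. It is a conceptual model only, not Everlaw's implementation; the `Doc` fields (`duplicate_group`, `upload_order`) and the `deduplicate` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    duplicate_group: str  # hypothetical key shared by exact duplicates
    upload_order: int     # hypothetical: lower means uploaded earlier


def deduplicate(hits):
    """Keep one copy per duplicate group: the earliest-uploaded copy among the hits."""
    kept = {}
    for doc in hits:
        current = kept.get(doc.duplicate_group)
        if current is None or doc.upload_order < current.upload_order:
            kept[doc.duplicate_group] = doc
    return sorted(kept.values(), key=lambda d: d.upload_order)


hits = [
    Doc("A-2", "memo", 5),
    Doc("A-1", "memo", 2),
    Doc("B-1", "invoice", 3),
]
print([d.doc_id for d in deduplicate(hits)])  # ['A-1', 'B-1']
```

Note that the comparison is made only among the documents in the search: an earlier-uploaded copy that does not match the search is never considered.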
For example, let’s say we want all documents in a binder, with duplicates removed. Add the binder search term to your query builder, click More Options, then select “Deduplicate within search hits.” A count of how many documents are removed from your search as a result of this setting will be shown. Click “Save,” and your binder is now deduplicated, indicated by the deduplicate tab on your search container.
Your project may have a search setting enabled that hides all project duplicates by default. In this case, the More Options tab will say “Hide All.”
You will also see a third option, selected by default, in the dialog that says “Hide all project duplicates.” This setting is intended for clients receiving productions with many duplicates who would like to mimic upload deduplication, while retaining all Bates numbered documents. It is rarely used in any other circumstances.
This option differs from deduplicating within search hits in both scope and timing. Search deduplication is a search-wide setting. It runs after all documents matching your search criteria have been identified, then removes duplicate copies, leaving you with a single copy of each document that matches your search criteria.
Hiding project duplicates is a project-wide setting. It runs before your search criteria are evaluated. With this option enabled, any document flagged as a “project duplicate” is excluded from results, regardless of search criteria. In particular, you may not get even a single copy of a document that matches your search if the only copies matching your search are considered project duplicates. This workflow is recommended only when you will be running a series of searches and want to ensure that two different copies of the same document are never returned across the union of your searches.
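The difference in timing can be sketched in Python. This is a simplified model under stated assumptions, not Everlaw's implementation: the `Doc` fields, the binder-based search criterion, and both function names are hypothetical, and list order stands in for upload order.

```python
from typing import NamedTuple


class Doc(NamedTuple):
    doc_id: str
    duplicate_group: str
    project_duplicate: bool  # hypothetical flag set at the project level
    in_binder: bool          # hypothetical search criterion


docs = [
    Doc("D1", "g1", False, False),  # original copy, not in the binder
    Doc("D2", "g1", True, True),    # project duplicate, in the binder
    Doc("D3", "g2", False, True),
    Doc("D4", "g2", True, True),
]


def matches(doc):
    return doc.in_binder


def search_with_dedup(docs):
    """Evaluate the search first, then drop duplicate copies among the hits."""
    seen, kept = set(), []
    for d in (d for d in docs if matches(d)):
        if d.duplicate_group not in seen:
            seen.add(d.duplicate_group)
            kept.append(d.doc_id)
    return kept


def search_hiding_project_duplicates(docs):
    """Drop project duplicates first, then evaluate the search on what is left."""
    visible = [d for d in docs if not d.project_duplicate]
    return [d.doc_id for d in visible if matches(d)]


print(search_with_dedup(docs))                 # ['D2', 'D3']
print(search_hiding_project_duplicates(docs))  # ['D3']
```

In this toy data set, duplicate group `g1` drops out entirely when project duplicates are hidden, because the only copy in the binder is the one flagged as a project duplicate.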
In the sampling section of More Options, you can choose a randomly sampled subset of your search results for any given search. Sampling is helpful for triaging review, where you may receive thousands of documents from a particular custodian and want to review a sample to decide how to triage the entire set. It’s also a useful setting for training predictive coding models. Training a prediction model with randomly sampled subsets of documents may help improve the precision and recall of the generated predictions.
Document sampling will always be applied after deduplication, and before grouping or filtering decisions. In other words, if you choose to sample your documents and also group them by email thread, your documents will be sampled before they are grouped into threads. This prevents partial email threads from appearing in your results table. You can use the document counts below each setting, as well as the search walkthrough at the bottom of the dialog, to better understand how these settings are impacting your final results.
Sampling probabilities apply to each document, rather than to the set as a whole. For example, if you apply 10% sampling, each document in your results has a 10% chance of being returned, rather than exactly 10% of documents being selected from the entire results set. This may affect the total number of documents returned when you sample from relatively small results sets. Your total number of documents may also differ from the expected percentage if you have restricted document access.
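To see why the returned count can drift from the nominal percentage, here is a minimal Python sketch of per-document sampling. It assumes each document is kept independently with probability `rate`; the `sample_hits` function and the document IDs are hypothetical, and Everlaw's actual random number generation is not described in this article.

```python
import random


def sample_hits(hits, rate, seed=None):
    """Keep each document independently with probability `rate`."""
    rng = random.Random(seed)
    return [doc for doc in hits if rng.random() < rate]


hits = [f"DOC-{i:04d}" for i in range(50)]

# Run the same 10% sample with several seeds: the sizes vary from run to run.
sizes = [len(sample_hits(hits, 0.10, seed=s)) for s in range(5)]
print(sizes)
```

Under this model, a set of n documents sampled at rate p returns n × p documents on average, with a standard deviation of √(n × p × (1 − p)). For 50 documents at 10%, that is 5 documents on average, give or take about 2, so small results sets show the largest relative swings.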
Grouping allows you to organize your search hits by context: attachments, email threads, exact duplicates, or document versions. Each is defined as follows:
- Attachments: Documents in an attachment family. Includes the parent document, often an email, and its attachments.
- Email Threads: Emails that comprise an email thread, including replies, reply all, and forwarded emails. Grouping by email thread will also include attachments and duplicate emails.
- Exact Duplicates: Duplicate copies of the document. A complete definition of duplicates is in this article.
- Versions: Versions of the same document (produced and pre-produced, translated and untranslated, etc.)
When you add grouping to your search, you pull associated documents into the search, even if those documents do not meet your search criteria. For example, if you’re searching for documents with the word “fraud” and you group by email thread, the search will include documents in the same email thread, even if they don’t contain the word “fraud.”
Note that grouping will always be applied after deduplication and sampling, but before removal. This implies that deduplicated documents may get reintroduced if they are part of the context that you group by. You can use the document counts below each setting, as well as the search walkthrough at the bottom of the dialog, to better understand how these settings are impacting your final results.
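The order of operations described above, and the way grouping can reintroduce deduplicated documents, can be modeled with a short Python sketch. This is an illustrative model only; `run_search`, the dictionary keys, and the sample documents are hypothetical, and list order stands in for upload order.

```python
def run_search(docs, matches, dedupe=False, group_by=None):
    """Apply settings in the documented order: deduplicate, (sample), then group."""
    hits = [d for d in docs if matches(d)]

    if dedupe:
        seen, unique = set(), []
        for d in hits:
            if d["dup_group"] not in seen:
                seen.add(d["dup_group"])
                unique.append(d)
        hits = unique

    # Sampling would run here, after deduplication and before grouping.

    results = list(hits)
    if group_by is not None:
        wanted = {d[group_by] for d in hits}
        hit_ids = {d["id"] for d in hits}
        # Pull in every document sharing a context with a hit, hit or not.
        results += [d for d in docs if d[group_by] in wanted and d["id"] not in hit_ids]
    return [d["id"] for d in results]


docs = [
    {"id": "E1", "thread": "t1", "dup_group": "g1", "text": "possible fraud"},
    {"id": "E2", "thread": "t1", "dup_group": "g1", "text": "possible fraud"},
    {"id": "E3", "thread": "t1", "dup_group": "g3", "text": "lunch plans"},
    {"id": "E4", "thread": "t2", "dup_group": "g4", "text": "weekly status"},
]


def has_fraud(doc):
    return "fraud" in doc["text"]


print(run_search(docs, has_fraud, dedupe=True))                     # ['E1']
print(run_search(docs, has_fraud, dedupe=True, group_by="thread"))  # ['E1', 'E2', 'E3']
```

In the second call, `E2` was removed by deduplication but returns through grouping because it belongs to the same thread as `E1`. `E3` is pulled in as well even though it does not contain the search term.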
Once you group your search hits, you have the option to remove certain contexts from that grouping: parents, children, search hits, grouped non-hits, and non-inclusive emails (complete projects only). Each is defined as follows:
- Parents: The topmost member in a document grouping, such as the primary email to which other documents are attached.
- Child documents: Any document that is not the parent in a group, such as email attachments or project duplicates.
- Search hits: Any document that would be returned by your search, after search deduplication is applied.
- Grouped non-hits: All documents that are not designated as search hits, but were introduced via grouping.
- Non-inclusive emails (only available for complete projects): You can only select this removal option when grouping by email threads. Inclusive emails are the minimum set of emails that creates the most “complete” email content in the thread. It might be one email that is inclusive of all the thread’s content, but it might be multiple emails that create the set. It is often the last email in the branch, and all previous emails should appear in the body of the document. Everlaw considers text, recipients, and attachments to determine inclusiveness.
Removal is always applied after the other settings in More Options. You can use the document counts below each setting, as well as the search walkthrough at the bottom of the dialog, to better understand how these settings are impacting your final results.
Note: You cannot remove parents when grouping by email thread. This is so that attachments are not displayed without their associated email parents in the results table, which you can learn more about in the next section of this article.
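The removal options defined in this section can be pictured as filters over the grouped results. The Python sketch below is a simplified model, not Everlaw's implementation: `apply_removal`, the option names, and the `is_parent` and `is_hit` flags are hypothetical, and the example groups by attachments.

```python
def apply_removal(results, remove):
    """Drop grouped results that fall into any of the selected removal categories."""
    def dropped(r):
        return (
            ("parents" in remove and r["is_parent"])
            or ("children" in remove and not r["is_parent"])
            or ("search_hits" in remove and r["is_hit"])
            or ("grouped_non_hits" in remove and not r["is_hit"])
        )
    return [r["id"] for r in results if not dropped(r)]


# One attachment family: an email that matched the search and two attachments,
# only one of which matched.
grouped = [
    {"id": "M1", "is_parent": True, "is_hit": True},
    {"id": "A1", "is_parent": False, "is_hit": False},
    {"id": "A2", "is_parent": False, "is_hit": True},
]

print(apply_removal(grouped, {"grouped_non_hits"}))  # ['M1', 'A2']
print(apply_removal(grouped, {"children"}))          # ['M1']
print(apply_removal(grouped, {"search_hits"}))       # ['A1']
```

Removing search hits after grouping leaves only the context that grouping introduced, which is one way to isolate related documents that did not match the original criteria.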
Viewing your search and adjusting search settings in the results table
Once you’ve run your search, you have the ability to view grouped documents and adjust search settings directly from the results table.
Grouped documents are collapsed by default and indicated by a caret next to the parent document. The total number of children in a document grouping is also displayed in parentheses (in the case of email threads, only other children and duplicate emails are included in the count, even though related attachments are grouped in this particular family).
On the left of the results table, you can click the expansion icon and expand all or collapse all document groups.
Child documents in a group are numbered with decimal extensions of their parent’s row number. For example, if the parent is row 47, its children appear as rows 47.1, 47.2, and so on.
If any of your grouping settings result in removing parents from your results, either on the search page or results table, the parents will appear as greyed-out documents, with children visible under them. The parents will not be affected by any export, batch modify, or production actions.
To adjust your previously applied search settings, click Options in the results table toolbar. Clicking the icon will prompt the same dialog as the More Options tab in search, with previously applied settings selected. If you change your settings in any way, then the results table will update to include the selected settings. A new search will be saved as a separate card on the homepage.
Common use cases
Below are some common use cases for using search settings. In these examples, you will need to use settings for multiple containers and/or multiple settings within each container.
You can use search settings to perform QA on your responsive documents before producing them. In this example, you want to ensure that you aren’t producing any privileged documents. First, add an extra AND operator into your query builder. Within it, select the “Coded” term and choose your responsiveness code. Click More Options and group it by Attachments. Then click Save.
Next, click the outer AND container, and choose the “Coded” term again. Select your Privilege code, then negate it by clicking it once.
Your search identifies documents marked for production, including attachments, that have not been coded for privilege. This allows you to easily check for coding inconsistencies before running a production and assign them for review.
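The logic of this search can be expressed with sets. The Python sketch below is a conceptual model only; the document IDs, the `family` key, and the code names “Responsive” and “Privileged” are hypothetical placeholders for your own project's values.

```python
docs = {
    "EMAIL-1": {"family": "f1", "codes": {"Responsive"}},
    "ATT-1a": {"family": "f1", "codes": {"Privileged"}},
    "ATT-1b": {"family": "f1", "codes": set()},
    "EMAIL-2": {"family": "f2", "codes": {"Responsive", "Privileged"}},
    "EMAIL-3": {"family": "f3", "codes": set()},
}


def coded(code):
    """IDs of all documents carrying the given code."""
    return {doc_id for doc_id, d in docs.items() if code in d["codes"]}


def group_by_attachments(hit_ids):
    """Expand hits to every document in the same attachment family."""
    families = {docs[i]["family"] for i in hit_ids}
    return {i for i, d in docs.items() if d["family"] in families}


# Inner container: responsive documents, grouped by attachments.
inner = group_by_attachments(coded("Responsive"))
# Outer container: AND NOT privileged.
result = inner - coded("Privileged")

print(sorted(result))  # ['ATT-1b', 'EMAIL-1']
```

Because grouping is applied on the inner container, the attachments are pulled in before the privilege condition is evaluated on the outer container.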
Identifying the top (parent) email and all of its attachments
Let’s say you’ve got an upload and you’d like to find the first email in each thread as well as its attachments. First, add an extra AND operator into your query builder. Within it, select the “Uploaded” search term and choose the upload you’d like to search for. Click More Options, then group by email thread. Next, remove the children. Then, click Save. Your result includes the parent document, or top email, for every email thread in your upload.
Next, bring in the parent emails’ attachments. Click the outer container, then select More Options, then group by attachments. Then, click Save. You can interpret this search as the parent emails of the threads in your upload, together with those parent emails’ attachments.
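The two-container logic can be traced step by step in Python. This is an illustrative sketch under stated assumptions, not Everlaw's implementation: the `thread_top` flag marks the parent email of a thread, and the other keys, the `expand` helper, and the sample documents are hypothetical.

```python
docs = [
    {"id": "E1", "upload": "u1", "thread": "t1", "thread_top": True, "family": "f-E1"},
    {"id": "E1-att", "upload": "u1", "thread": "t1", "thread_top": False, "family": "f-E1"},
    {"id": "E2", "upload": "u1", "thread": "t1", "thread_top": False, "family": "f-E2"},
    {"id": "E2-att", "upload": "u1", "thread": "t1", "thread_top": False, "family": "f-E2"},
    {"id": "E9", "upload": "u2", "thread": "t9", "thread_top": True, "family": "f-E9"},
]


def expand(hits, key):
    """Grouping: return every document that shares `key` with at least one hit."""
    wanted = {d[key] for d in hits}
    return [d for d in docs if d[key] in wanted]


# Inner container: documents in the upload, grouped by thread, children removed.
uploaded = [d for d in docs if d["upload"] == "u1"]
threads = expand(uploaded, "thread")
top_emails = [d for d in threads if d["thread_top"]]

# Outer container: group the remaining parent emails by attachments.
with_attachments = expand(top_emails, "family")

print([d["id"] for d in with_attachments])  # ['E1', 'E1-att']
```

The reply `E2` and its attachment are excluded by the inner removal step, so only the top email's own attachment family comes back when the outer container groups by attachments.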
A good way to double-check the logic of your search is via the instant search preview. The grey bar will display the order of operations conducted in your search.
Identifying all non-inclusive emails
Perhaps you want to get rid of non-inclusive emails in your database, so you’re interested in searching for all non-inclusive emails. Note that you must be in a complete project to run this search. This one is a bit tricky. We first need to search for all emails grouped by thread. Then, we’ll search for inclusive emails. Finally, we’ll combine those searches together and search for all emails grouped by thread, and not the inclusive ones!
First, start by searching for emails grouped by thread. Add the Type term, and select Email. Click More Options and group your search by email thread. Ensure that you’ve chosen to show all duplicates, then click Save. Click Begin Review on the search page. This will return all emails on your project, grouped by thread (including their attachments). It will also save your search to be used in the next step.
Create the second component of the search, which is "all emails that are inclusive." Since most settings in the previous search can be reused, you don’t need to create a brand new search. Click Options in the results table. Keep all settings the same, but in the Removal step, select Non-inclusive emails. Then click Save. This will return all inclusive emails because you removed the non-inclusive ones.
Our final step is to find all emails, grouped by thread, that are non-inclusive. Create a new search by clicking the magnifying glass in the navigation bar.
Add the Prior Search term and select the initial search we created: “Type Email, including duplicates, grouped by email threads.” Next, add Prior Search again, but select the second search (all inclusive emails). Finally, you should negate the second search by clicking the term to turn it red.
The way we can interpret this search, from top-down, is all emails grouped by thread, but NOT the inclusive ones.
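In set terms, the final search is a difference between the two prior searches. The Python sketch below is a conceptual model only; the `inclusive` flag and the sample documents are hypothetical, and how Everlaw determines inclusiveness is not modeled here.

```python
docs = [
    {"id": "E1", "type": "email", "inclusive": False},
    {"id": "E2", "type": "email", "inclusive": False},
    {"id": "E3", "type": "email", "inclusive": True},
    {"id": "E3-att", "type": "attachment", "inclusive": None},
]

# First search: all emails grouped by thread (attachments come along).
search_1 = {d["id"] for d in docs}

# Second search: the same results with non-inclusive emails removed.
search_2 = {
    d["id"]
    for d in docs
    if not (d["type"] == "email" and not d["inclusive"])
}

# Final search: first prior search AND NOT second prior search.
non_inclusive = search_1 - search_2

print(sorted(non_inclusive))  # ['E1', 'E2']
```

Everything the second search kept, including attachments, is subtracted out, leaving only the non-inclusive emails.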