Predictive Coding FAQs

To view all of Everlaw's predictive coding-related content, please see our predictive coding section. 

How do I get started with predictive coding?

If you want to generate predictions for document rating (hot, warm, cold), you do not need to set up a new model, because the rating model is included on Everlaw by default. You simply need to review 400 unique documents, with at least 100 rated hot and at least 100 rated cold.

For a comprehensive overview of the predictive coding process, simple workflows, best practices, and important definitions, please read the beginner’s guide to predictive coding.

What types of prediction models can I create?

You can create prediction models based on document ratings or codes. For example, if you are trying to find documents related to fraud, and are coding documents as Fraud or Not Fraud, you can build a prediction model to speed up your review process and help you identify documents that are likely to merit a Fraud code.

How can I create a new prediction model?

To create a new prediction model, select "create new model" at the bottom of the list of accessible models. A wizard will open, and will walk you through the creation of a new model. To read more about creating predictive coding models in Everlaw, see this article.

What kinds of questions are best answered by a predictive coding model?

Predictive coding models are best used to help identify which documents show responsiveness or relevance, so that these documents can be prioritized in the review process. Predictive coding is designed to help you gain insights that Everlaw search would not necessarily be able to provide.

Here is an example of a scenario in which using predictive coding would be appropriate:

  • You have a coding category called “Responsiveness” with two codes, “Responsive” and “Not responsive,” and you want help finding which documents are likely to merit the “Responsive” code.
  • You set up a model in which you specify that reviewed documents have a code from the “Responsiveness” category, and relevant documents have the code “Responsiveness: Responsive.”

This is an appropriate scenario because you want predictive coding to help point out which unreviewed documents are likely to be responsive, without you having to search through all of your documents individually. Without setting up a model, you cannot run a search through your documents to find unreviewed documents that are likely to be responsive.

Here is an example of a scenario in which predictive coding would not be appropriate:

  • You have datasets from several custodians. You suspect that emails whose custodian is John Smith are likely to be responsive.
  • You set up a model in which you specify that relevant documents have John Smith as their custodian.

This is not an optimal scenario for predictive coding, because it is easy to identify all documents whose custodian is John Smith by conducting a metadata search. The purpose of a predictive coding model is to train the system to recognize relevant documents, usually based on their contents. You do not need any help from machine learning in locating John Smith’s documents; the information is already freely available to you. Moreover, simply identifying the custodian of an email does not tell you whether the email is relevant/responsive. Some of John Smith’s emails might be irrelevant/unresponsive, depending on their contents. If you are searching for responsive emails and want to focus on John Smith’s emails, you should do the following:

  • Set up a model in which you specify that reviewed documents have a code from the “Responsiveness” category, and relevant documents have the code “Responsiveness: Responsive.” In the optional “excluded documents” step, specify that all documents whose custodian is not John Smith should be excluded from the model. This ensures that the model learns only from John Smith’s emails.

What determines the cutoff score for relevance?

You can view the cutoff score for relevance on your distribution graph. The purple flag in the graph is located at the cutoff score. All documents with a predicted relevance score above this point are considered relevant, and all documents with a score under this point are considered irrelevant. The cutoff point also coincides with the model’s maximum F1 score, which measures a model’s efficacy.
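
To illustrate how a cutoff can coincide with the maximum F1 score, the sketch below sweeps candidate cutoffs from 0 to 100 over a small set of reviewed documents and keeps the cutoff with the highest F1. The function names, the sample scores, and the tie-breaking rule (lowest cutoff wins) are assumptions made for illustration; this article does not describe how Everlaw computes the cutoff internally.

```python
def f1_at_cutoff(scored, cutoff):
    """F1 when documents scoring above `cutoff` are predicted relevant.

    `scored` is a list of (predicted_score, is_relevant) pairs, where
    is_relevant comes from human review.
    """
    tp = sum(1 for s, rel in scored if s > cutoff and rel)
    fp = sum(1 for s, rel in scored if s > cutoff and not rel)
    fn = sum(1 for s, rel in scored if s <= cutoff and rel)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_cutoff(scored):
    """Return the integer cutoff in 0..100 with the highest F1.

    max() keeps the first maximum it sees, so the lowest cutoff wins ties
    (an illustrative choice, not a documented rule).
    """
    return max(range(0, 101), key=lambda c: f1_at_cutoff(scored, c))


# Hypothetical reviewed documents: (predicted score, reviewed as relevant?)
reviewed = [(95, True), (88, True), (72, True), (65, False),
            (58, True), (40, False), (22, False), (10, False)]

print(best_cutoff(reviewed))  # 40
```

With this sample, a cutoff of 40 lets through all four relevant documents and only one irrelevant document, which gives the best balance of precision and recall.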

Is a document with a predicted relevance score of 100 more relevant than a document with a score of 85?

It’s important to understand predictive coding as a binary model. Documents are either predicted to be relevant or predicted to be irrelevant, based on the cutoff score at which the model’s F1 score is highest. For example, if that relevancy threshold is 60, any document with a predicted relevance score over 60 is predicted to be relevant, and any document with a score under 60 is predicted to be irrelevant.

The higher a document’s score, the more likely it is to be relevant, and the higher confidence the model has in its relevance. However, the model does not decide “degrees” of relevance. If the relevancy threshold is 60, then a document with a score of 85 and a document with a score of 100 are both deemed likely to be relevant; the latter document is not likely to be “more relevant” than the former.

To read more about interpreting your model, see this article.

How do I know if my model is “good enough” or accurate?

There are several metrics on an active prediction model’s page that allow you to judge how well your model is doing. You can see your model’s precision, recall, and F1 score. Precision measures how accurate the model’s positive predictions are: of the documents it predicts to be relevant, how many are actually relevant? Recall measures the model’s scope: is it good at capturing all relevant documents, or does it let some relevant documents slip through the cracks? The closer precision and recall are to 100%, the better.

The F1 score is a combination metric that takes into account both precision and recall and provides an overall assessment of the effectiveness of the model. The closer F1 is to 1.00, the better.
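
As a worked illustration of these three metrics, the sketch below computes them from counts of correct and incorrect predictions, using the standard definitions (F1 is the harmonic mean of precision and recall). The function name and the example counts are assumptions for illustration, not values taken from Everlaw.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from prediction counts.

    true_pos:  documents predicted relevant that reviewers marked relevant
    false_pos: documents predicted relevant that reviewers marked irrelevant
    false_neg: documents predicted irrelevant that reviewers marked relevant
    """
    predicted_relevant = true_pos + false_pos
    actually_relevant = true_pos + false_neg
    precision = true_pos / predicted_relevant if predicted_relevant else 0.0
    recall = true_pos / actually_relevant if actually_relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 80 correct hits, 20 false alarms, 40 missed documents.
p, r, f1 = precision_recall_f1(80, 20, 40)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this example the model is fairly precise (80% of its relevant predictions are right) but misses a third of the relevant documents, and the F1 score of roughly 0.73 reflects both.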

To read more about interpreting your model, see this article.

How is the model being trained?

The model is trained by human review decisions. Reviewing many different types of documents helps the model learn what makes a document likely to be relevant or irrelevant. A well-covered model, or one in which a diverse set of documents has been reviewed, is generally better at making predictions.

You can improve your model’s training progress by creating training sets, which let you affirmatively identify sets of documents to manually review. For example, if you know that your team has not reviewed many emails from a certain custodian, you can create a training set with that custodian’s emails and assign that set to your reviewers.

What metadata fields are recognized and does Everlaw include "stop words"?  

Predictive coding uses the following metadata fields when considering content for predictions: author, subject, title, to, from, cc, bcc.

The following are considered "stop words", meaning that they will not be considered for any model:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.
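
The effect of stop words can be sketched as a simple filter over a document's words. The stop word list below is the one given above; the tokenization rule (lowercase, split on anything that is not a letter, digit, or apostrophe) and the function name are assumptions for illustration, since this article does not describe how Everlaw splits text into words.

```python
import re

# Stop words as listed in this article.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def content_terms(text):
    """Return the words in `text` that are not stop words.

    Tokenization here is a simplifying assumption, not Everlaw's method.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(content_terms("The invoice was sent to the auditor by email"))
# ['invoice', 'sent', 'auditor', 'email']
```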

Do I need to do anything with the holdout set?

The holdout set is a set of documents (5% of the documents on the project) that meet your review criteria but are not used to train the model. It is the basis for your model’s performance metrics. The model does not use the holdout set’s characteristics to improve its predictions; rather, the holdout set is used as a standard against which the model’s effectiveness is judged.

The holdout set only requires you to take direct action in the early stages of the model. In order to run your model and generate performance metrics, you need to manually review at least 400 unique documents within your review criteria, with at least 100 reviewed as relevant and at least 100 as irrelevant. Additionally, irrelevant documents cannot be near duplicates of, or emails in the same thread as, a relevant document. Such documents are considered "conflicts" and are removed from consideration in your training set. If you have hit the review thresholds but are not seeing an update to your model, you likely need to review more unique documents in your training set, or too many of your irrelevant documents are "conflicts".

Once this is done, you don’t need to expressly review documents for your holdout set any longer, even though the set will continue to grow as your project expands and you review more documents.
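
The idea of keeping a holdout set separate from training can be sketched as follows. The 5% figure comes from this article; the random sampling, the fixed seed, and the function name are assumptions for illustration, because the article does not say how Everlaw selects holdout documents.

```python
import random


def split_holdout(doc_ids, holdout_fraction=0.05, seed=0):
    """Split document IDs into (training list, holdout set).

    Random selection is an assumption made for this sketch; only the
    5% proportion is taken from the article.
    """
    ids = list(doc_ids)
    shuffled = ids[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    holdout = set(shuffled[:n_holdout])
    # Training documents keep their original order; holdout documents are
    # never used for training, only for measuring performance.
    training = [d for d in ids if d not in holdout]
    return training, holdout


training, holdout = split_holdout(range(1000))
print(len(training), len(holdout))  # 950 50
```

Because the holdout documents never influence training, comparing the model's predictions against the human review decisions on those documents gives an unbiased basis for the performance metrics.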

Is the model always “on” or do I have to turn it on?

The model will initialize and begin running as soon as you have reviewed 400 unique documents in your project within your review criteria, with at least 100 reviewed as relevant and at least 100 reviewed as irrelevant. The model will update every 24 to 48 hours, but you can also update it manually.

How long does the model take to run / How often does the model update?

The model will update every 24 to 48 hours, but you can also update it manually. Updates should take less than an hour, although this time can vary if other tasks are being run at the same time. 

If your model has a scheduled update and is not currently running, you can manually update your model by clicking Update Now. This will prioritize your model to be updated, pending the completion of currently updating models across the platform. Note that this will not result in an instant update.

Why is the model not updating? 

Generally, your model will begin making predictions once 400 unique documents have been reviewed (no duplicates), with at least 100 relevant documents and 100 irrelevant documents. If it seems that you have hit this review threshold (400/100/100) but your model appears to be 'Idle' and has not updated even after 48 hours, it may be because there are not enough unique documents. It is also possible that too many of your irrelevant documents are “conflicts”. For example, emails coded irrelevant that are in the same thread as emails coded relevant do not count towards the review threshold.

One way to approximate the number of reviewed documents that are being counted towards the review threshold (400/100/100) is to check for duplicates and children from email threads. Within your Predictive Coding model, click on the number of relevant or irrelevant documents in the model to open a search results table of the documents. Navigate to the Options gear icon.


Select 'Deduplicate within search hits' to hide project duplicates, select 'Email threads' from the Grouping column, then select 'Children' in the Remove from group column.


If the final document count is below 100 when searching within the reviewed relevant or irrelevant documents, you will need to review more documents to meet the review threshold. A simple way to find more documents to review is to create a new training set from the Predictive Coding model's page. Learn more about Creating Training Sets.
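
The counting logic described in this section can be approximated in a short sketch. It assumes each reviewed document is a dictionary with hypothetical fields `content_hash` (identical for duplicates), `thread_id` (`None` for documents outside an email thread), and `relevant`; these names are not Everlaw fields. The sketch also ignores near duplicates, so treat it as a rough estimate rather than the platform's actual calculation.

```python
def threshold_status(reviewed_docs):
    """Estimate how many reviewed documents count towards 400/100/100."""
    # Keep one document per content hash as a stand-in for deduplication.
    unique = {}
    for doc in reviewed_docs:
        unique.setdefault(doc["content_hash"], doc)
    docs = list(unique.values())

    # Threads that contain at least one relevant document.
    relevant_threads = {
        d["thread_id"] for d in docs
        if d["relevant"] and d["thread_id"] is not None
    }

    # Irrelevant documents in those threads are "conflicts" and are dropped.
    counted = [
        d for d in docs
        if d["relevant"] or d["thread_id"] not in relevant_threads
    ]
    n_relevant = sum(1 for d in counted if d["relevant"])
    n_irrelevant = len(counted) - n_relevant
    return {
        "total": len(counted),
        "relevant": n_relevant,
        "irrelevant": n_irrelevant,
        "meets_threshold": (
            len(counted) >= 400 and n_relevant >= 100 and n_irrelevant >= 100
        ),
    }
```

If `meets_threshold` is false even though the raw review counts look sufficient, the gap between the raw counts and the counted totals shows how many documents were lost to duplicates or conflicts.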

It seems like there’s already a prediction model in my case, but I don’t remember setting one up. What’s going on?

A default model based on the “hot”, “warm”, “cold” rating system is included with all cases on Everlaw. For this model, documents rated “hot” will be considered relevant, documents rated “cold” will be considered irrelevant, and documents rated “warm” will be considered of intermediate relevance. Using this information, the system generates a predicted relevance value for all documents within the case.

The default rating model is created automatically at the start of each case and requires no additional setup or maintenance. Even if you do not create additional models, you can still leverage rating predictions in your review.

How can I use predictive coding if I’m not an admin? / How can I see predictive coding models in my search results?

You cannot set up, manage, or view the predictive coding model if you’re not an admin.  However, you can use predictive coding results to speed up and improve your review.  

To search for predicted relevant documents, use the “predicted” search term in the search interface. Select your relevancy range or select a minimum or maximum relevancy cutoff score to find predicted documents. Remember that prediction ratings are given on a 0-100 scale, with values closer to 100 indicating a greater likelihood of being relevant to the model, and values closer to 0 indicating a greater likelihood of being irrelevant to the model.

Additionally, in the results table you can view a model's prediction scores by adding the model as a column. Click the +/- sign on the right side of the results table, and then select your model under the “prediction columns” section.


You can add multiple prediction models to the results table. They will each show up in separate columns with their respective prediction scores for each document.


Here are some use cases for searching within prediction models, using the rating model as an example and assuming a relevance cutoff score of 50:

  • QAing review work: Documents predicted to be irrelevant but rated “hot”
  • Finding new documents to review: Documents predicted to be relevant but not yet viewed by anyone on the review team
  • Deposition prep: Documents from the custodian who is being deposed that are predicted to be relevant
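
The logic behind these three searches can be sketched as filters over a list of documents. This is not Everlaw's search syntax; the field names (`predicted`, `rating`, `viewed`, `custodian`), the sample records, and the function names are all hypothetical and only mirror the conditions described above, with 50 as the cutoff.

```python
CUTOFF = 50  # relevance cutoff score assumed in the examples above


def qa_candidates(docs):
    """QA: predicted to be irrelevant but rated hot by a reviewer."""
    return [d for d in docs if d["predicted"] < CUTOFF and d["rating"] == "hot"]


def unviewed_likely_relevant(docs):
    """New documents to review: predicted relevant, not yet viewed."""
    return [d for d in docs if d["predicted"] > CUTOFF and not d["viewed"]]


def deposition_prep(docs, custodian):
    """Deposition prep: predicted relevant, from the given custodian."""
    return [d for d in docs
            if d["predicted"] > CUTOFF and d["custodian"] == custodian]


# Hypothetical sample records.
sample = [
    {"id": 1, "predicted": 12, "rating": "hot", "viewed": True, "custodian": "John Smith"},
    {"id": 2, "predicted": 91, "rating": None, "viewed": False, "custodian": "John Smith"},
    {"id": 3, "predicted": 77, "rating": "warm", "viewed": True, "custodian": "Jane Doe"},
    {"id": 4, "predicted": 30, "rating": "cold", "viewed": True, "custodian": "Jane Doe"},
]

print([d["id"] for d in qa_candidates(sample)])                  # [1]
print([d["id"] for d in unviewed_likely_relevant(sample)])       # [2]
print([d["id"] for d in deposition_prep(sample, "John Smith")])  # [2]
```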

Will predictive coding work with non-English documents?

The predictive coding feature is in general language-agnostic and will work on a corpus of non-English documents.

When using predictive coding on non-English documents, there will be minor differences in pre-processing, such as the inability to identify common and "stop" words in other languages. Otherwise, just as with English documents, the system uses the words in each document to train the model and score documents. There is no automatic translation of foreign languages during predictive coding.

Return to table of contents

Have more questions? Submit a request


Article is closed for comments.