Predictive Coding FAQs – Knowledge Base

This article contains some common questions and answers about Predictive Coding in Everlaw. To view all articles on Predictive Coding, please visit our Predictive Coding section.

Frequently asked questions

Getting started

How do I get started with predictive coding?

In Everlaw, a predictive coding model starts generating prediction scores after 200 qualified documents have been reviewed, with at least 50 coded relevant and at least 50 coded irrelevant under the model’s criteria. To count toward the training threshold, reviewed documents must have sufficient text, be unique (e.g., duplicates of a reviewed document that are coded the same are only counted once), and not be in conflict (e.g., emails that have been coded irrelevant in the same thread as emails coded relevant are not considered qualified).

If you want to generate predictions for document rating (Hot, Warm, Cold), you do not need to set up a new model, because a rating model is included on Everlaw by default. You still need to review 200 qualified documents, with at least 50 rated "Hot" or "Warm" and at least 50 rated "Cold".
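The 200/50/50 threshold described above can be expressed as a simple check. The following is a minimal sketch for illustration only, not Everlaw's implementation; the function name and parameters are hypothetical, and it assumes every qualified reviewed document is coded either relevant or irrelevant, so the total is the sum of the two counts.

```python
def meets_training_threshold(relevant: int, irrelevant: int,
                             min_total: int = 200, min_each: int = 50) -> bool:
    """Return True when the counts satisfy the 200/50/50 training threshold.

    `relevant` and `irrelevant` must already be counts of *qualified*
    reviewed documents (sufficient text, unique, not in conflict).
    Hypothetical helper, shown only to make the arithmetic concrete.
    """
    return (relevant + irrelevant >= min_total
            and relevant >= min_each
            and irrelevant >= min_each)


print(meets_training_threshold(150, 50))  # enough documents, enough of each kind
print(meets_training_threshold(160, 40))  # 200 total, but only 40 irrelevant
print(meets_training_threshold(60, 60))   # enough of each kind, but only 120 total
```

Note that meeting the per-class minimums alone is not enough: 60 relevant and 60 irrelevant documents still fall short of the 200-document total.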

Learn more in our articles about creating predictive coding models and kicking off your predictive coding model.

For a comprehensive overview of the predictive coding process, simple workflows, best practices, and important definitions, please read the beginner’s guide to predictive coding.

What types of predictive coding models can I create?

You can create predictive coding models based on document ratings or codes. For example, if you are trying to find documents related to fraud, and are coding documents as "Fraud" or "Not Fraud", you can build a model to speed up your review process and help you identify documents that are likely to merit a "Fraud" code.

What kinds of questions are best answered by a predictive coding model?

Predictive coding models are best used to help identify which documents show responsiveness or relevance, so that these documents can be prioritized in the review process. Predictive coding is designed to help you gain insights that Everlaw search would not necessarily be able to provide.

Here is an example of a scenario in which using predictive coding would be appropriate:

You have a coding category called “Responsiveness” with two codes, “Responsive” and “Not responsive,” and you want help finding which documents are likely to merit the “Responsive” code.
You set up a model in which you specify that reviewed documents have a code from the “Responsiveness” category, and relevant documents have the code “Responsiveness: Responsive.”

This is an appropriate scenario because you want predictive coding to help point out which unreviewed documents are likely to be responsive, without you having to search through all of your documents individually. Without setting up a model, you cannot run a search through your documents to find unreviewed documents that are likely to be responsive.

Here is an example of a scenario in which predictive coding would not be appropriate:

You have datasets from several custodians. You suspect that emails whose custodian is John Smith are likely to be responsive.
You set up a model in which you specify that relevant documents have John Smith as their custodian.

This is not an optimal scenario for predictive coding, because you can easily identify all documents whose custodian is John Smith by running a metadata search. The purpose of a predictive coding model is to train the system to recognize relevant documents, usually based on their contents. You do not need any help from machine learning to locate John Smith’s documents; that information is already available to you. Moreover, simply identifying the custodian of an email does not tell you whether the email is relevant or responsive. Some of John Smith’s emails might be irrelevant or unresponsive, depending on their contents.

If you are searching for responsive emails and want to focus on John Smith’s emails, set up a model in which you specify that reviewed documents have a code from the “Responsiveness” category, and relevant documents have the code “Responsiveness: Responsive.” In the optional excluded documents step, specify that all documents whose custodian is not John Smith should be excluded from the model. This way, the model learns only from John Smith’s emails.

How can I use predictive coding if I’m not an Admin? / How can I see predictive coding models in my search results?

You cannot set up, manage, or view the predictive coding model if you do not have Admin permissions on Prediction Models (project level setting). However, you can use predictive coding results to speed up and improve your review.

To search for predicted relevant documents, use the Predicted search term in the search interface. Select your relevancy range or select a minimum or maximum relevancy cutoff score to find predicted documents. Remember that prediction ratings are given on a 0-100 scale, with values closer to 100 indicating a greater likelihood of being relevant to the model, and values closer to 0 indicating a greater likelihood of being irrelevant to the model.

Additionally, in the results table you can view model rating by selecting your model as a column header. From the Results Table toolbar, click View, select Add or Remove column, and then select your model under the “prediction columns” section.

You can add multiple prediction models to the results table. They will each show up in separate columns with their respective prediction scores for each document.

Here are some use cases in which searching on prediction scores is useful, using the rating model as an example and assuming a relevance cutoff score of 50:

Quality checking review work: Documents predicted to be irrelevant but rated “Hot”
Finding new documents to review: Documents predicted to be relevant but not yet viewed by anyone on the review team
Deposition prep: Documents from the custodian who is being deposed that are predicted to be relevant
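To make the logic of these three searches concrete, here is a small offline sketch in Python. It is not Everlaw search syntax or an Everlaw API; the field names (`score`, `rating`, `viewed`, `custodian`), the sample records, and the choice to treat a score equal to the cutoff as relevant are all assumptions made for illustration.

```python
CUTOFF = 50  # relevance cutoff score assumed in the use cases above


def qc_candidates(docs, cutoff=CUTOFF):
    """Quality check: predicted irrelevant but rated Hot."""
    return [d for d in docs if d["score"] < cutoff and d["rating"] == "Hot"]


def unviewed_likely_relevant(docs, cutoff=CUTOFF):
    """New documents to review: predicted relevant, not yet viewed."""
    return [d for d in docs if d["score"] >= cutoff and not d["viewed"]]


def deposition_prep(docs, custodian, cutoff=CUTOFF):
    """Deposition prep: predicted relevant documents from one custodian."""
    return [d for d in docs
            if d["score"] >= cutoff and d["custodian"] == custodian]


# Hypothetical sample records, for illustration only.
docs = [
    {"id": 1, "score": 12, "rating": "Hot", "viewed": True, "custodian": "John Smith"},
    {"id": 2, "score": 88, "rating": None, "viewed": False, "custodian": "John Smith"},
    {"id": 3, "score": 67, "rating": "Warm", "viewed": True, "custodian": "Jane Doe"},
]

print([d["id"] for d in qc_candidates(docs)])
print([d["id"] for d in unviewed_likely_relevant(docs)])
print([d["id"] for d in deposition_prep(docs, "John Smith")])
```

In Everlaw itself, you would build the equivalent searches by combining the Predicted search term with rating, viewed, or custodian terms in the search interface.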

Data privacy and security

Will my data leave Everlaw?

No. Predictive coding is performed entirely within the Everlaw platform.

Is any of my data used to train predictive coding models outside my organization?

No. Organization Admins can create a multi-matter model by leveraging previously trained predictive coding models within your organization to find relevant documents in similar matters. However, no training data or information about your models is used in predictive coding models beyond your organization. See our privacy policy for more details about how we protect your data.

Model creation and training

How can I create a new predictive coding model?

For instructions, visit our article about creating predictive coding models.

How is the model being trained?

The model is trained by human review decisions. Reviewing many different types of documents helps the model learn what makes a document likely to be relevant or irrelevant. A well-covered model, or one in which a diverse set of documents has been reviewed, is generally better at making predictions.

You can improve your model’s training progress by creating training sets, which let you affirmatively identify sets of documents to manually review. For example, if you know that your team has not reviewed many emails from a certain custodian, you can create a training set with that custodian’s emails and assign that set to your reviewers.

What metadata fields are recognized and does Everlaw include "stop words"?

Predictive coding uses the following metadata fields when considering content for predictions: author, subject, title, to, from, cc, bcc.

The following are considered "stop words", meaning that they will not be considered for any model:

a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.
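To illustrate what ignoring stop words means in practice, here is a minimal Python sketch that drops the words listed above from a piece of text. The tokenization rule (lowercasing and splitting on anything that is not a letter, digit, or apostrophe) and the function name are assumptions for illustration; Everlaw's actual text processing is not described in this article.

```python
import re

# The stop words listed above.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split it into simple word tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenization
    return [t for t in tokens if t not in STOP_WORDS]


print(remove_stop_words("The contract was not signed by the vendor"))
```

Only "contract", "signed", and "vendor" survive in this example. Note that "not" is a stop word, so negation is not preserved at the word level.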

Do I need to do anything with the holdout set?

The holdout set is a randomly sampled 5% of documents from each upload to your project. The holdout set is used to generate model performance metrics after a sufficient number of holdout documents have been reviewed. The model does not use the holdout set’s characteristics to improve its predictions; rather, the holdout set is used as a standard against which the model’s effectiveness is judged.
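As a rough illustration of the idea, the sketch below draws a random 5% sample from one upload. It is a simplified assumption of how such sampling could work, not Everlaw's implementation; the function name, the rounding rule, and the `seed` parameter (included only so the example is repeatable) are hypothetical.

```python
import random


def sample_holdout(upload_doc_ids, fraction=0.05, seed=None):
    """Return a random subset of one upload to set aside as holdout documents."""
    ids = list(upload_doc_ids)
    rng = random.Random(seed)           # seed only makes the sketch repeatable
    k = round(len(ids) * fraction)      # assumed rounding rule
    return set(rng.sample(ids, k))


upload = range(1000)                    # hypothetical upload of 1,000 documents
holdout = sample_holdout(upload, seed=7)
print(len(holdout))                     # 50 documents set aside
```

Documents in the holdout sample are kept out of training so that they can serve as an independent standard for measuring the model.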

Although basic performance metrics will be initially generated based on a random sample of documents from the “Reviewed” set, reviewing holdout set documents will allow you to generate performance metrics based on a consistent set of documents over time and accordingly improve the accuracy of these metrics.

To generate performance metrics based solely on reviewed holdout set documents, you need to manually review at least 200 qualified holdout set documents within your review criteria, with at least 50 reviewed as relevant and at least 50 reviewed as irrelevant. To count toward the holdout threshold, reviewed holdout set documents must have sufficient text, be unique (e.g., duplicates of a reviewed document that are coded the same are only counted once), and not be in conflict (e.g., emails that have been coded irrelevant in the same thread as emails coded relevant are not considered qualified).

Once this is done, you don’t need to expressly review documents for your holdout set any longer, even though the set will continue to grow as your project expands and you review more documents.

Model behavior and performance

I'm coding attachment families together. Does this impact the performance of my predictive coding model?

Predictive coding models score each document independently of its relationship to other documents. When you code attachment families together, irrelevant documents end up getting the relevant coding criteria applied based on the relevance of a document within its attachment family (and not on its own characteristics). This can interfere with the performance of your model.

Consider an attachment to an email that, on its own, is not considered relevant. However, because it is attached to an email that is considered relevant, the document (attachment) gets coded with the relevant criteria. The predictive coding model will thus look at the characteristics of the attachment document and consider them relevant. This can lead to a "confused" model that isn't trained with a clear signal of what characteristics are truly relevant or irrelevant, and result in inaccurate prediction scores.

Best practices for reviewing attachment families when you are using a predictive coding model include either of the following:

Code each document individually during review, and then group your production criteria by family to make sure you bring in the complete families for any relevant documents you plan to produce. This lets the model get trained and generate scores during review based on only the characteristics of the relevant documents, while you can still produce responsive families.
Use a Predictive Coding-specific code as your criteria to train the model. In this schema, you code attachment families consistently as you typically do, but additionally create a separate category and codes that you use for training your model, such as a "PC Training: relevant" code for relevant documents and a "PC Training: irrelevant" code for irrelevant documents. You apply these codes to each document independently, regardless of the relevance of their attachment family members. Your Predictive Coding model should be set up so that it is trained on the "PC Training: relevant" and "PC Training: irrelevant" criteria.

Is the model always “on” or do I have to turn it on?

Models initialize and begin generating predictions as soon as users have reviewed 200 qualified documents in the project within the model's review criteria, with at least 50 being reviewed as relevant and 50 reviewed as irrelevant. The model automatically updates once every 24 to 48 hours.

If your model has a scheduled update and is not currently running, you can manually update all models in your project by selecting the Update model button at the bottom of the model page under the Updates section. This prioritizes your model to be updated, pending the completion of currently updating models across the platform. It does not result in an instant update.

How long does the model take to run / How often does the model update?

The model updates every 24 to 48 hours, but you can also update it manually. Updates should take less than an hour, although this time can vary if other tasks are being run at the same time.

If your model has a scheduled update and is not currently running, you can manually update your model by selecting Update model. This prioritizes your model to be updated, pending the completion of currently updating models across the platform. It does not result in an instant update.

Why has the model not kicked off and generated initial prediction scores?

Generally, your model will begin making predictions once 200 qualified documents have been reviewed with at least 50 relevant documents and 50 irrelevant documents. If it seems that you have hit the above training threshold (200/50/50) but your model appears to be 'Idle' and has not updated even after 48 hours, it may be because you have not reviewed enough qualified documents.

Documents are considered qualified if they:

Have sufficient text
Are unique (e.g., if there are duplicates of a reviewed document that are coded the same, only one of those documents is considered as qualified reviewed)
Are not in conflict (e.g., emails that have been coded irrelevant in the same thread as emails coded relevant are not considered as qualified reviewed)
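The three rules above can be sketched as a filter over reviewed documents. This is an illustrative interpretation only, not Everlaw's implementation: the `ReviewedDoc` fields, the `MIN_CHARS` value standing in for "sufficient text", and the choice to drop only the irrelevant-coded emails in a conflicting thread are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewedDoc:
    doc_id: str
    text: str
    content_hash: str          # identical for duplicate documents
    thread_id: Optional[str]   # None for documents outside an email thread
    relevant: bool             # coding under the model's criteria


MIN_CHARS = 50  # hypothetical stand-in for "sufficient text"


def qualified(docs):
    """Return the reviewed documents that would count toward the threshold."""
    threads_with_relevant = {
        d.thread_id for d in docs if d.relevant and d.thread_id is not None
    }
    seen = set()
    result = []
    for d in docs:
        if len(d.text.strip()) < MIN_CHARS:
            continue  # not enough text
        if not d.relevant and d.thread_id in threads_with_relevant:
            continue  # coded irrelevant in a thread that has a relevant email
        key = (d.content_hash, d.relevant)
        if key in seen:
            continue  # duplicate coded the same way: count it once
        seen.add(key)
        result.append(d)
    return result


body = "x" * 60  # placeholder text long enough to pass the length check
docs = [
    ReviewedDoc("a", body, "h1", "t1", True),
    ReviewedDoc("b", body, "h1", "t1", True),   # duplicate of "a", same coding
    ReviewedDoc("c", body, "h2", "t1", False),  # conflicts with thread t1
    ReviewedDoc("d", "hi", "h3", None, False),  # too little text
    ReviewedDoc("e", body, "h4", None, False),
]
print([d.doc_id for d in qualified(docs)])
```

Of the five reviewed documents in this example, only "a" and "e" count, which is why a project can appear to have hit 200/50/50 while the model remains idle.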

Predictive coding models that have not yet generated prediction scores will have a section titled “Prediction generation progress” at the top of the given model page indicating progress towards the training threshold. Specifically, the Review progress subsection displays progress bars indicating how many qualified documents have been reviewed and how many additional documents need to be reviewed to meet the training threshold. As you review documents according to the model’s criteria, the progress bars under this section will update and track your model’s progress towards the training threshold. To find additional qualified documents to review, simply create a new training set under the Recommended to do next section.

Learn more in our articles about kicking off your predictive coding model and creating training sets.

How do I know if my model is “good enough” or accurate?

There are several metrics on an active prediction model’s page that allow you to judge how well your model is doing. You can see your model’s precision, recall, and F1 score:

Precision: A measure of the model’s accuracy (of the documents it predicts to be relevant, how many are actually relevant?).
Recall: A measure of the model’s scope (is it good at capturing all relevant documents, or does it let some relevant documents slip through the cracks?). The closer precision and recall are to 100%, the better.
The F1 score: A combination metric that takes into account both precision and recall and provides an overall assessment of the effectiveness of the model. The closer F1 is to 1.00, the better.
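For readers who want to see how these three metrics relate, the sketch below computes them from a list of predictions and a list of human review decisions using the standard definitions (F1 is the harmonic mean of precision and recall). The function name and sample data are illustrative; the values shown on your model page are calculated by Everlaw.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 from parallel lists of booleans.

    predicted[i] is True if the model predicts document i to be relevant;
    actual[i] is True if a reviewer coded document i as relevant.
    """
    tp = sum(p and a for p, a in zip(predicted, actual))        # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))    # false positives
    fn = sum(a and not p for p, a in zip(predicted, actual))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


predicted = [True, True, True, False, False]
actual = [True, True, False, True, False]
precision, recall, f1 = precision_recall_f1(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example the model flags three documents, two of which are truly relevant (precision 2/3), and it finds two of the three relevant documents (recall 2/3).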

Learn more about interpreting your predictive coding model here.

What determines the cutoff score for relevance?

Once your model generates performance metrics, you can view the cutoff score for relevance on your distribution graph. The purple flag in the graph is located at the cutoff score. All documents with a predicted relevance score above this point are considered relevant, and all documents with a score under this point are considered irrelevant. The cutoff point also coincides with the model’s maximum F1 score, which measures a model’s efficacy.
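One way to picture a cutoff that coincides with the maximum F1 score is to try every possible cutoff from 0 to 100 and keep the one with the highest F1. The sketch below does exactly that on a handful of reviewed documents. It is a simplified illustration, not Everlaw's algorithm; the function names, the sample data, the treatment of a score equal to the cutoff as relevant, and the choice of the lowest cutoff when several tie are all assumptions.

```python
def f1_at_cutoff(scores, labels, cutoff):
    """F1 when every document scoring at or above `cutoff` is predicted relevant."""
    tp = fp = fn = 0
    for score, is_relevant in zip(scores, labels):
        predicted = score >= cutoff  # tie handling is an assumption
        if predicted and is_relevant:
            tp += 1
        elif predicted:
            fp += 1
        elif is_relevant:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_cutoff(scores, labels):
    """Return the cutoff (0-100) with the highest F1; lowest cutoff wins ties."""
    return max(range(0, 101), key=lambda c: f1_at_cutoff(scores, labels, c))


# Hypothetical prediction scores and review decisions for six documents.
scores = [10, 20, 35, 55, 70, 90]
labels = [False, False, False, True, True, True]

cutoff = best_cutoff(scores, labels)
print(cutoff, f1_at_cutoff(scores, labels, cutoff))
```

Here any cutoff that separates the document scored 35 from the document scored 55 yields a perfect F1, so the chosen cutoff falls between those two scores.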

Is a document with a predicted relevance score of 100 more relevant than a document with a score of 85?

It’s important to understand predictive coding as a binary model. Documents are either predicted to be relevant, or predicted to be irrelevant, based on the threshold set by the max F1 score. For example, if the max F1 score says that the relevancy threshold is 60, that means that any document with a predicted relevance score over 60 is predicted to be relevant, and any document with a score under 60 is predicted to be irrelevant.

The higher a document’s score, the more likely it is to be relevant, and the more confident the model is in its relevance. However, the model does not decide “degrees” of relevance. If the relevancy threshold is 60, then a document with a score of 85 and a document with a score of 100 are both deemed likely to be relevant; the document scored 100 is not “more relevant” than the document scored 85.

Learn more about interpreting your predictive coding model here.

I didn’t do any new review work. Why did my prediction scores change?

Even if additional documents are not reviewed according to your model's criteria, uploading, deleting, or modifying documents can still cause your model to update. If a model updates in a project where an insufficient number of holdout set documents have been reviewed, the subset of reviewed documents being used to train the model changes. The change in documents used for training can lead to a change in prediction scores.

This happens because, until you have reviewed enough documents from the holdout set, each model update pulls a new randomly sampled subset of your reviewed documents to calculate performance statistics. Documents in that subset are not used to train the model, so the pool of reviewed documents eligible for training changes each time the model updates.
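The effect can be illustrated with a short sketch: on each update, a random subset of reviewed documents is set aside for performance statistics, and only the remainder is eligible for training. This is a simplified model of the behavior described above, not Everlaw's implementation; the function name, the `eval_fraction` value, and the seeds are hypothetical.

```python
import random


def split_for_update(reviewed_ids, eval_fraction=0.1, seed=None):
    """Split reviewed documents into a training pool and an evaluation sample."""
    ids = sorted(reviewed_ids)
    rng = random.Random(seed)                      # seed stands in for "this update"
    k = max(1, round(len(ids) * eval_fraction))    # assumed sample size
    eval_ids = set(rng.sample(ids, k))
    train_ids = set(ids) - eval_ids                # evaluation docs never train
    return train_ids, eval_ids


reviewed = range(100)  # hypothetical: 100 reviewed documents, no new review work
train_1, eval_1 = split_for_update(reviewed, seed=1)  # first update
train_2, eval_2 = split_for_update(reviewed, seed=2)  # next update
print(len(train_1), len(eval_1))
print(len(train_1 ^ train_2), "documents differ between the two training pools")
```

Because the training pool can differ from one update to the next even when no new review work is done, the resulting prediction scores can differ as well. Reviewing enough holdout set documents gives the model a fixed evaluation set, which removes this source of variation.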

Miscellaneous

It seems like there’s already a prediction model in my case, but I don’t remember setting one up. What’s going on?

A default model based on the “Hot”, “Warm”, and “Cold” rating/code system is included with all cases on Everlaw. For this model, documents rated “Hot” or "Warm" are considered relevant and documents rated “Cold” are considered irrelevant. Using this information, the system generates a predicted relevance value for all documents within the case.
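The mapping from ratings to training labels described above can be written as a small lookup. This sketch only restates that rule in Python; the function name and the handling of unrated documents (returning `None`, meaning the document is not yet reviewed for this model) are assumptions for illustration.

```python
from typing import Optional


def rating_to_label(rating: Optional[str]) -> Optional[bool]:
    """Map a document rating to a training label for the default rating model.

    Returns True (relevant) for Hot or Warm, False (irrelevant) for Cold,
    and None for unrated documents or unrecognized values.
    """
    if rating is None:
        return None
    normalized = rating.strip().lower()
    if normalized in {"hot", "warm"}:
        return True
    if normalized == "cold":
        return False
    return None


for rating in ["Hot", "Warm", "Cold", None]:
    print(rating, "->", rating_to_label(rating))
```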

Because the default rating model is created automatically at the start of each case and requires no additional setup or maintenance, you can leverage rating predictions in your review even if you do not create additional models.

Will predictive coding work with non-English documents?

The predictive coding feature is generally language-agnostic and works on a corpus of non-English documents.

When you use predictive coding on non-English documents, there are minor differences in pre-processing, such as the inability to identify common and "stop" words in other languages. Otherwise, just as with English documents, the system uses the words in each document for training and scoring. There is no automatic translation of foreign languages during predictive coding.