To view all of Everlaw's predictive coding-related content, please see our predictive coding section.
Table of Contents
- What kinds of questions are best answered by a predictive coding model?
- What metadata fields are recognized and does Everlaw include "stop words"?
- How long does the model take to run? / How often does the model update?
- Why has the model not kicked off and generated initial prediction scores?
- Will predictive coding work with non-English documents?
- I didn’t do any new review work. Why did my prediction scores change?
How do I get started with predictive coding?
In Everlaw, a predictive coding model will start to generate prediction scores after 200 qualified documents have been reviewed pursuant to the model’s criteria, with at least 50 reviewed as relevant and 50 as irrelevant. To meet the training threshold, reviewed documents must have sufficient text, be unique (e.g., duplicates of a reviewed document that are coded the same are only counted once), and not be in conflict (e.g., emails coded irrelevant in the same thread as emails coded relevant do not count as qualified reviewed documents). If you want to generate predictions for document rating (Hot, Warm, Cold), you do not need to set up a new model, as one is included on Everlaw by default. You simply need to review 200 qualified documents, with at least 50 reviewed as Hot or Warm and 50 reviewed as Cold. Learn more about creating predictive coding models here and kicking off your predictive coding model here.
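To make the 200/50/50 threshold and the qualification rules concrete, here is a minimal Python sketch of that kind of check. The data structure, field names (dedupe_key, has_conflict, and so on), and the meets_training_threshold function are illustrative assumptions, not Everlaw’s internal implementation.

```
# Illustrative sketch only: Everlaw performs this check internally.
# Every field name below is an assumption made for demonstration.

# Each reviewed document is represented by a deduplication key (so exact
# duplicates coded the same way count once), whether it was coded relevant,
# whether its coding conflicts with another document in its thread, and
# whether it has sufficient text.
reviewed_docs = [
    {"dedupe_key": "doc-001", "relevant": True,  "has_conflict": False, "has_text": True},
    {"dedupe_key": "doc-002", "relevant": False, "has_conflict": False, "has_text": True},
    # ... more reviewed documents ...
]

def meets_training_threshold(docs, total_needed=200, per_class_needed=50):
    """Return True if the reviewed set satisfies the 200/50/50 rule."""
    qualified = {}
    for doc in docs:
        # Documents without sufficient text or with conflicting coding do not qualify.
        if not doc["has_text"] or doc["has_conflict"]:
            continue
        # Duplicates coded the same way are only counted once.
        qualified[doc["dedupe_key"]] = doc["relevant"]
    relevant = sum(1 for is_relevant in qualified.values() if is_relevant)
    irrelevant = len(qualified) - relevant
    return (
        len(qualified) >= total_needed
        and relevant >= per_class_needed
        and irrelevant >= per_class_needed
    )

print(meets_training_threshold(reviewed_docs))  # False until 200/50/50 is met
```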
For a comprehensive overview of the predictive coding process, simple workflows, best practices, and important definitions, please read the beginner’s guide to predictive coding.
What types of predictive coding models can I create?
You can create predictive coding models based on document ratings or codes. For example, if you are trying to find documents related to fraud, and are coding documents as Fraud or Not Fraud, you can build a model to speed up your review process and help you identify documents that are likely to merit a Fraud code.
How can I create a new predictive coding model?
To create a new predictive coding model, navigate to the Predictive Coding page and select "+Create New Model" at the top of the list of accessible models. A wizard will open and walk you through the creation of a new model. Learn more about creating predictive coding models here.
What kinds of questions are best answered by a predictive coding model?
Predictive coding models are best used to help identify which documents are likely to be responsive or relevant, so that those documents can be prioritized in the review process. Predictive coding is designed to help you gain insights that Everlaw search would not necessarily be able to provide.
Here is an example of a scenario in which using predictive coding would be appropriate:
- You have a coding category called “Responsiveness” with two codes, “Responsive” and “Not responsive,” and want help finding which documents are likely to merit the “Responsive” code.
- You set up a model in which you specify that reviewed documents have a code from the “Responsiveness” category, and relevant documents have the code “Responsiveness: Responsive.”
This is an appropriate scenario because you want predictive coding to help point out which unreviewed documents are likely to be responsive, without you having to search through all of your documents individually. Without setting up a model, you cannot run a search through your documents to find unreviewed documents that are likely to be responsive.
Here is an example of a scenario in which predictive coding would not be appropriate:
- You have datasets from several custodians. You suspect that emails whose custodian is John Smith are likely to be responsive.
- You set up a model in which you specify that relevant documents have John Smith as their custodian.
This is not an optimal scenario for predictive coding, because it is easy to identify all documents whose custodian is John Smith by conducting a metadata search. The purpose of a predictive coding model is to train the system to recognize relevant documents, usually based on their contents. You do not need any help from machine learning in locating John Smith’s documents; the information is already freely available to you. Moreover, simply identifying the custodian of an email does not tell you whether the email is relevant/responsive. Some of John Smith’s emails might be irrelevant/unresponsive, depending on their contents. If you are searching for responsive emails and want to focus on John Smith’s emails, you should do the following:
- Set up a model in which you specify that reviewed documents have a code from the “Responsiveness” category, and relevant documents have the code “Responsiveness: Responsive.” In the optional “excluded documents” step, specify that all documents whose custodian is not John Smith should be excluded from the model. This will allow the model to only learn from John Smith’s emails.
What determines the cutoff score for relevance?
Once your model generates performance metrics, you can view the cutoff score for relevance on your distribution graph. The purple flag in the graph is located at the cutoff score. All documents with a predicted relevance score above this point are considered relevant, and all documents with a score under this point are considered irrelevant. The cutoff point also coincides with the model’s maximum F1 score, which measures a model’s efficacy.
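To see how a cutoff can coincide with the maximum F1 score, here is a small Python sketch that sweeps candidate cutoffs over a made-up sample of scored documents and keeps the one with the highest F1. The sample data and the f1_at_cutoff helper are assumptions for demonstration only; Everlaw performs this calculation internally.

```
# Illustrative sketch: choosing the relevance cutoff that maximizes F1.
# The scored sample below is made up for demonstration purposes.

def f1_at_cutoff(scored_docs, cutoff):
    """Compute F1 if every score at or above the cutoff is predicted relevant."""
    tp = sum(1 for score, relevant in scored_docs if score >= cutoff and relevant)
    fp = sum(1 for score, relevant in scored_docs if score >= cutoff and not relevant)
    fn = sum(1 for score, relevant in scored_docs if score < cutoff and relevant)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# (prediction score, actually relevant?) pairs from a reviewed evaluation sample.
sample = [(95, True), (88, True), (72, True), (64, False), (41, False), (15, False)]

# Sweep candidate cutoffs 0-100 and keep the one with the highest F1.
best_cutoff = max(range(101), key=lambda c: f1_at_cutoff(sample, c))
print(best_cutoff, f1_at_cutoff(sample, best_cutoff))
```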
Is a document with a predicted relevance score of 85 more relevant than a document with a score of 100?
It’s important to understand predictive coding as a binary model. Documents are either predicted to be relevant or predicted to be irrelevant, based on the threshold set at the maximum F1 score. For example, if the relevancy threshold is 60, any document with a predicted relevance score over 60 is predicted to be relevant, and any document with a score under 60 is predicted to be irrelevant.
The higher a document’s score, the more likely it is to be relevant, and the higher confidence the model has in its relevance. However, the model does not decide “degrees” of relevance. If the relevancy threshold is 60, then a document with a score of 85 and a document with a score of 100 are both deemed likely to be relevant; the latter document is not likely to be “more relevant” than the former.
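As a quick illustration of this binary interpretation, the sketch below labels a few hypothetical scores against a threshold of 60 (the same number used in the example above); 85 and 100 receive exactly the same label.

```
# Illustrative sketch: scores over the threshold are all treated alike.
threshold = 60
for score in (100, 85, 42):
    label = "predicted relevant" if score > threshold else "predicted irrelevant"
    print(score, label)
# 100 and 85 both print "predicted relevant"; neither is "more relevant".
```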
Learn more about interpreting your predictive coding model here.
How do I know if my model is “good enough” or accurate?
There are several metrics on an active prediction model’s page that allow you to judge how well your model is doing. You can see your model’s precision, recall, and F1 score. Precision is a measure of the model’s accuracy: of the documents it predicts to be relevant, how many are actually relevant? Recall is a measure of the model’s scope: is it good at capturing all relevant documents, or does it let some relevant documents slip through the cracks? The closer precision and recall are to 100%, the better.
The F1 score is a combination metric that takes into account both precision and recall and provides an overall assessment of the effectiveness of the model. The closer F1 is to 1.00, the better.
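For readers who want the underlying arithmetic, here is a short sketch with made-up counts showing how precision, recall, and F1 relate to one another. None of the numbers come from Everlaw.

```
# Illustrative sketch with invented counts from a hypothetical evaluation set.
true_positives = 80   # predicted relevant and actually relevant
false_positives = 20  # predicted relevant but actually irrelevant
false_negatives = 10  # predicted irrelevant but actually relevant

precision = true_positives / (true_positives + false_positives)  # 0.80
recall = true_positives / (true_positives + false_negatives)     # ~0.89
f1 = 2 * precision * recall / (precision + recall)               # ~0.84

print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```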
Learn more about interpreting your predictive coding model here.
How is the model being trained?
The model is trained by human review decisions. Reviewing many different types of documents helps the model learn what makes a document likely to be relevant or irrelevant. A well-covered model, or one in which a diverse set of documents has been reviewed, is generally better at making predictions.
You can improve your model’s training progress by creating training sets, which let you affirmatively identify sets of documents to manually review. For example, if you know that your team has not reviewed many emails from a certain custodian, you can create a training set with that custodian’s emails and assign the set to your reviewers.
What metadata fields are recognized and does Everlaw include "stop words"?
Predictive coding uses the following metadata fields when considering content for predictions: author, subject, title, to, from, cc, bcc.
The following are considered "stop words", meaning that they will not be considered for any model:
a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with.
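To illustrate what stop-word filtering looks like in practice, here is a minimal Python sketch that tokenizes text and drops the words listed above. The tokenizer and the tokens_for_model function are simplifications for demonstration, not Everlaw’s actual pre-processing pipeline.

```
# Illustrative sketch of stop-word filtering during text pre-processing.
# The stop-word set mirrors the published list above; the tokenizer is a
# simplification and not Everlaw's actual pipeline.
import re

STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

def tokens_for_model(text):
    """Lowercase, split on non-letters, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

print(tokens_for_model("The invoice was sent to the vendor for approval"))
# -> ['invoice', 'sent', 'vendor', 'approval']
```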
Do I need to do anything with the holdout set?
The holdout set is a randomly sampled 5% of documents from each upload to your project. The holdout set is used to generate model performance metrics after a sufficient number of holdout documents have been reviewed. The model does not use the holdout set’s characteristics to improve its predictions; rather, the holdout set is used as a standard against which the model’s effectiveness is judged.
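To illustrate what a 5% random sample per upload looks like, here is a minimal Python sketch. The draw_holdout function, document IDs, and seed are assumptions chosen for demonstration only; Everlaw selects the holdout set automatically.

```
# Illustrative sketch: reserving a random 5% of an upload as a holdout set.
import random

def draw_holdout(upload_doc_ids, fraction=0.05, seed=None):
    """Randomly reserve a fraction of an upload's documents as holdout."""
    rng = random.Random(seed)
    k = max(1, round(len(upload_doc_ids) * fraction))
    holdout = set(rng.sample(upload_doc_ids, k))
    training_pool = [d for d in upload_doc_ids if d not in holdout]
    return holdout, training_pool

# A hypothetical upload of 1,000 documents yields a 50-document holdout set.
holdout, training_pool = draw_holdout([f"DOC-{i:05d}" for i in range(1, 1001)], seed=7)
print(len(holdout), len(training_pool))  # 50 950
```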
Although basic performance metrics will be initially generated based on a random sample of documents from the “Reviewed” set, reviewing holdout set documents will allow you to generate performance metrics based on a consistent set of documents over time and accordingly improve the accuracy of these metrics.
In order to generate performance metrics based solely on reviewed holdout set documents, you need to manually review at least 200 qualified holdout set documents within your review criteria, with at least 50 reviewed as relevant and 50 as irrelevant. To meet the holdout threshold, reviewed holdout set documents must have sufficient text, be unique (e.g., duplicates of a reviewed document that are coded the same are only counted once), and not be in conflict (e.g., emails coded irrelevant in the same thread as emails coded relevant do not count as qualified reviewed documents).
Once this is done, you don’t need to expressly review documents for your holdout set any longer, even though the set will continue to grow as your project expands and you review more documents.
Is the model always “on” or do I have to turn it on?
The model will initialize and begin generating predictions as soon as you have reviewed 200 qualified documents in your project within your review criteria, with at least 50 being reviewed as relevant and 50 being reviewed as irrelevant. The model will automatically update once every 24 to 48 hours.
If your model has a scheduled update and is not currently running, you can manually update all models in your project by clicking the "Update model" button at the bottom of the model page under the “Updates” section. This will prioritize your model to be updated, pending the completion of currently updating models across the platform. Note that this will not result in an instant update.
How long does the model take to run / How often does the model update?
The model will update every 24 to 48 hours, but you can also update it manually. Updates should take less than an hour, although this time can vary if other tasks are being run at the same time.
If your model has a scheduled update and is not currently running, you can manually update your model by clicking "Update model." This will prioritize your model to be updated, pending the completion of currently updating models across the platform. Note that this will not result in an instant update.
Why has the model not kicked off and generated initial prediction scores?
Generally, your model will begin making predictions once 200 qualified documents have been reviewed with at least 50 relevant documents and 50 irrelevant documents. If it seems that you have hit the above training threshold (200/50/50) but your model appears to be 'Idle' and has not updated even after 48 hours, it may be because you have not reviewed enough qualified documents.
Documents are considered qualified if they:
- have sufficient text,
- are unique (e.g., if there are duplicates of a reviewed document that are coded the same, only one of those documents counts as qualified reviewed), and
- are not in conflict (e.g., emails coded irrelevant in the same thread as emails coded relevant do not count as qualified reviewed).
Predictive coding models that have not yet generated prediction scores will have a section titled “Prediction generation progress” at the top of the given model page indicating progress towards the training threshold. Specifically, the “Review progress” subsection displays progress bars indicating how many qualified documents have been reviewed and how many additional documents need to be reviewed to meet the training threshold. As you review documents according to the model’s criteria, the progress bars under this section will update and track your model’s progress towards the training threshold. To find additional qualified documents to review, simply create a new training set under the “Recommended to do next” section. Learn more about kicking off your predictive coding model here and creating training sets here.
It seems like there’s already a prediction model in my case, but I don’t remember setting one up. What’s going on?
A default model based on the “Hot”, “Warm”, and “Cold” rating system is included with all cases on Everlaw. For this model, documents rated “Hot” or "Warm" will be considered relevant and documents rated “Cold” will be considered irrelevant. Using this information, the system generates a predicted relevance value for all documents within the case.
Because the default rating model is created automatically at the start of each case and requires no additional setup or maintenance, you can leverage rating predictions in your review even if you do not create any additional models.
How can I use predictive coding if I’m not an admin? / How can I see predictive coding models in my search results?
You cannot set up, manage, or view predictive coding models if you do not have Admin permissions on Prediction Models (a project-level setting). However, you can still use predictive coding results to speed up and improve your review.
To search for predicted relevant documents, use the “predicted” search term in the search interface. Select a relevancy range, or a minimum or maximum relevancy cutoff score, to find predicted documents. Remember that prediction scores are given on a 0-100 scale, with values closer to 100 indicating a greater likelihood of being relevant to the model, and values closer to 0 indicating a greater likelihood of being irrelevant to the model.
Additionally, in the results table you can view a model’s prediction scores by adding the model as a column. From the Results Table toolbar, click View, select Add or Remove column, and then select your model under the “prediction columns” section.
You can add multiple prediction models to the results table. They will each show up in separate columns with their respective prediction scores for each document.
Here are some use cases for which searching on prediction scores is useful, using the default rating model as an example (and assuming 50 is the relevance cutoff score):
- QAing review work: Documents predicted to be irrelevant but rated “hot”
- Finding new documents to review: Documents predicted to be relevant but not yet viewed by anyone on the review team
- Deposition prep: Documents from the custodian that is being deposed that are predicted to be relevant
Will predictive coding work with non-English documents?
The predictive coding feature is generally language-agnostic and will work on a corpus of non-English documents.
When using predictive coding on non-English documents, there will be minor differences in pre-processing, such as the inability to identify common and "stop" words in other languages. Otherwise, just as with English documents, the system will use the words in each document for training and scoring. There is no automatic translation of foreign-language documents during predictive coding.
I didn’t do any new review work. Why did my prediction scores change?
Even if additional documents are not reviewed according to your model's criteria, uploading, deleting, or modifying documents can still cause your model to update. If a model updates in a project where an insufficient number of holdout set documents have been reviewed, the subset of reviewed documents being used to train the model changes. The change in documents used for training can lead to a change in prediction scores.
This happens because, until you have reviewed enough documents from the holdout set, each model update results in a new randomly sampled subset of your reviewed documents being pulled from the reviewed set to calculate performance statistics. In this case, reviewed documents used to calculate performance statistics are not used to train the model, thus changing the pool of reviewed documents eligible to train your model each time the model updates.
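Here is a rough sketch of that mechanism: each update draws a fresh random evaluation sample from the reviewed documents, so the remaining training pool shifts even when no new review work has been done. The 20% evaluation fraction, document counts, and seeds below are assumptions chosen only to illustrate the effect, not Everlaw’s actual parameters.

```
# Illustrative sketch: re-sampling the evaluation subset between updates
# changes which reviewed documents are left to train the model.
import random

reviewed = [f"DOC-{i:04d}" for i in range(1, 301)]  # 300 reviewed documents

def split_for_update(docs, eval_fraction=0.2, seed=None):
    """Hold some reviewed documents out for metrics; train on the rest."""
    rng = random.Random(seed)
    evaluation = set(rng.sample(docs, int(len(docs) * eval_fraction)))
    training = [d for d in docs if d not in evaluation]
    return training, evaluation

training_1, _ = split_for_update(reviewed, seed=1)  # first update
training_2, _ = split_for_update(reviewed, seed=2)  # next update, no new review
print(len(set(training_1) ^ set(training_2)))       # nonzero: the training pool changed
```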