Predictive Coding Terms and Definitions

To view all of Everlaw's predictive coding-related content, please see our predictive coding section.

Predictive coding model types

Rating model: Predicts the relevancy of documents based on "Hot," "Warm," and "Cold" relevancy ratings. The rating model is included in Everlaw by default; you do not need to create it. For the default rating model to generate predictions, you must rate at least 200 qualified documents, with at least 50 rated Hot or Warm and at least 50 rated Cold.

Creating your own prediction model: You can create a predictive coding model based on any document or review attributes in Everlaw. To generate predictions for a custom model, you must code at least 200 qualified documents according to your model's criteria, with at least 50 documents coded relevant, and at least 50 deemed irrelevant (i.e., reviewed and not marked as relevant). Learn more about creating a predictive model here and kicking off your predictive coding model here.
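The minimum thresholds described above can be expressed as a simple check. The following is a minimal sketch, not Everlaw code: the function name and constants are hypothetical, and it assumes that every reviewed, qualified document counts as either relevant or irrelevant.

```python
# Hypothetical constants taken from the thresholds stated in this article.
MIN_REVIEWED = 200
MIN_RELEVANT = 50
MIN_IRRELEVANT = 50


def can_generate_predictions(relevant_count: int, irrelevant_count: int) -> bool:
    """Return True if the reviewed counts meet the stated minimums."""
    # Assumption: reviewed = relevant + irrelevant (qualified documents only).
    reviewed_count = relevant_count + irrelevant_count
    return (
        reviewed_count >= MIN_REVIEWED
        and relevant_count >= MIN_RELEVANT
        and irrelevant_count >= MIN_IRRELEVANT
    )


print(can_generate_predictions(150, 50))  # True: 200 reviewed, 50+ in each group
print(can_generate_predictions(180, 20))  # False: only 20 irrelevant
print(can_generate_predictions(60, 60))   # False: only 120 reviewed
```

Note that meeting the per-category minimums alone is not enough; 60 relevant and 60 irrelevant documents still fall short of the 200-document total.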

Model setup

The database: The universe of all documents available for a particular model. These are all of the documents in your selected project.

Reviewed: The set of documents used to train a particular model. For custom models, the criteria for "reviewed" are defined by the user during model creation. For the default rating model, reviewed documents are documents that have been rated. To read more about reviewed documents, please see our model creation article.

Relevant: The subset of reviewed documents that the model should consider relevant. The criteria for "relevant" are also defined by the user. For the default rating model, relevant documents are documents that have been rated Hot or Warm. To read more about relevant documents, please see our model creation article.

Irrelevant: The subset of reviewed documents that the model should consider irrelevant. Reviewed documents that do not fit the relevant criteria are automatically determined to be irrelevant.
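For the default rating model, the reviewed, relevant, and irrelevant definitions above amount to a three-way classification. The sketch below is illustrative only: the function name is hypothetical, and representing an unrated document as None is an assumption made for the example.

```python
from typing import Optional


def classify_for_rating_model(rating: Optional[str]) -> str:
    """Map a document's rating to its role in the default rating model."""
    if rating is None:
        # Unrated documents are not reviewed, so they are not used for training.
        return "unreviewed"
    if rating in ("Hot", "Warm"):
        return "relevant"
    # Any reviewed document that does not meet the relevant criteria
    # is automatically treated as irrelevant (for example, Cold).
    return "irrelevant"


for rating in ("Hot", "Warm", "Cold", None):
    print(rating, "->", classify_for_rating_model(rating))
```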

Holdout Set: 5% of the total documents in the case, reserved to evaluate the performance of a model. The holdout set is maintained by taking 5% of the documents in each upload; you cannot affirmatively add documents to a holdout set. These documents are not used in training; they are used only to calculate the model’s performance metrics, even if they satisfy the "reviewed" criteria of one or more models. Because documents in the holdout set always remain in the holdout set, they can be used to show historical performance and give consistent feedback as your model improves. Learn more about the holdout set here.
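To illustrate the idea of reserving 5% of each upload, here is a minimal sketch. It is not Everlaw's implementation: this article does not say how the 5% is chosen, so random sampling, the rounding rule, and the function name split_upload are all assumptions.

```python
import random
from typing import List, Set, Tuple

HOLDOUT_FRACTION = 0.05  # 5% of each upload, as stated in this article


def split_upload(doc_ids: List[str], rng: random.Random) -> Tuple[Set[str], List[str]]:
    """Reserve a fraction of one upload as holdout; the rest may be used for training."""
    # Assumption: the holdout is a uniform random sample, rounded to the nearest document.
    holdout_size = round(len(doc_ids) * HOLDOUT_FRACTION)
    holdout = set(rng.sample(doc_ids, holdout_size))
    trainable = [doc_id for doc_id in doc_ids if doc_id not in holdout]
    return holdout, trainable


upload = [f"DOC-{i:04d}" for i in range(100)]
holdout, trainable = split_upload(upload, random.Random(0))
print(len(holdout), len(trainable))  # 5 95
```

Because the holdout documents never enter training, metrics computed on them are not inflated by documents the model has already seen.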

Using predictions

To read more about using predictions, please refer to our model interpretation article.

Distribution graph: This graph plots the number of documents (y-axis) against predicted relevancy (x-axis). You can use it to retrieve and review documents based on their current prediction scores. Documents are scored from 0 to 100, with 100 being very likely to be relevant given a model’s criteria for relevance. You can drag the adjustable green flag across the distribution graph to select document sets based on a prediction score cutoff.
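Selecting documents by a prediction score cutoff can be pictured as a simple filter. This is a hypothetical sketch: the function name and sample data are invented for illustration, and it assumes the selection includes documents scoring at or above the cutoff.

```python
from typing import Dict, List


def select_by_cutoff(scores: Dict[str, int], cutoff: int) -> List[str]:
    """Return the IDs of documents whose prediction score is at or above the cutoff."""
    # Assumption: the cutoff is inclusive and scores are on the 0-100 scale.
    return sorted(doc_id for doc_id, score in scores.items() if score >= cutoff)


# Made-up scores for illustration only.
scores = {"DOC-A": 92, "DOC-B": 15, "DOC-C": 70, "DOC-D": 48}
print(select_by_cutoff(scores, 70))  # ['DOC-A', 'DOC-C']
```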

Coverage graph: This graph plots coverage against relevancy (i.e., prediction scores). Coverage indicates how well the model understands the characteristics of your documents. You can drag and drop to select a section of documents within the coverage graph that you want to review. A useful application of the coverage graph is to select documents with low coverage scores and review them to improve the model.

Historical performance graph: This graph shows your performance metrics (precision, recall, F1) over time. To generate historical performance, the model's predictions at each point in time are compared to the holdout set.

Performance metrics

To read more about interpreting performance metrics, please refer to our model interpretation article.

Precision: This estimates what percentage of the documents your model predicts to be relevant are actually relevant. For example, if the model predicts 100 documents to be relevant, and 90 of those are actually relevant, the model’s precision is 90%.

Recall: This estimates what percentage of the relevant documents in your project your model is finding. Continuing the example above, if the project actually contained 1,000 relevant documents and the model captured only 90 of them, the recall score would be low (90 of 1,000, or 9%), despite the model’s precision being high.

F1: Because precision and recall measure different aspects of a model’s performance, an evaluation of a model’s overall performance requires taking both into account. The F1 score combines the two into a single number; in its standard definition, it is the harmonic mean of precision and recall, so it is high only when both are high. It represents the balance between precision and recall given the current status of the model.
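The three metrics can be worked through with the numbers from the precision and recall examples. The sketch below uses the standard textbook formulas; it assumes the project contains 1,000 relevant documents in total, and the function names are illustrative rather than part of Everlaw.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of documents predicted relevant that are actually relevant."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of actually relevant documents that the model found."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# The model predicts 100 documents relevant; 90 of them truly are.
true_pos, false_pos = 90, 10
# Assumption: 1,000 relevant documents exist in total, so 910 were missed.
false_neg = 1000 - true_pos

p = precision(true_pos, false_pos)
r = recall(true_pos, false_neg)
print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")
# precision=0.900 recall=0.090 f1=0.164
```

The low F1 value shows why a single high metric can be misleading: the harmonic mean is pulled toward the weaker of the two scores.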

Prediction score: A numerical value between 0 and 100 that a predictive coding model assigns to a document to indicate the likelihood that the document is relevant, based on prior reviews. The closer a score is to 0, the less likely the document is to be relevant; the closer it is to 100, the more likely.

Ineligible documents: These documents have no identified text, or too little text to generate predictions from. They do not receive a prediction score, are not included in the model’s predictions, and are not used for training the model. You can click the number of unprocessable documents to view them.

Training the model

Training set: Training sets help your model better understand the documents in your case, resulting in better predictions. You can create training sets of randomly selected documents, or you can create a training set from a recent search. Each training set contains either 300 documents or 2% of the project's total documents, whichever is smaller. You can create more than one training set per model. Learn more about training sets here.
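The training set size rule can be sketched as a one-line calculation. The function name is hypothetical, and rounding the 2% figure down to a whole document is an assumption; this article does not specify how fractions are handled.

```python
MAX_TRAINING_SET_SIZE = 300
PROJECT_PERCENT = 2  # 2% of the project's total documents


def training_set_size(total_documents: int) -> int:
    """Return the smaller of 300 documents or 2% of the project."""
    # Integer arithmetic avoids floating-point rounding; fractions round down (assumption).
    two_percent = total_documents * PROJECT_PERCENT // 100
    return min(MAX_TRAINING_SET_SIZE, two_percent)


print(training_set_size(10_000))  # 200 (2% is smaller than 300)
print(training_set_size(15_000))  # 300 (2% equals 300)
print(training_set_size(50_000))  # 300 (2% would be 1,000, so the cap applies)
```

Under these assumptions, the 300-document cap applies to any project with 15,000 or more documents.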

Weighted Terms: You can select and weight portions of text in documents that are particularly relevant in order to train your model more granularly. Learn more about weighting terms here.