To view all of Everlaw's predictive coding-related content, please see our predictive coding section.
Predictive coding model types
Rating model - predicts the relevancy of documents based on “hot,” “warm,” and “cold” relevancy ratings. The rating model is included in Everlaw by default, so you do not need to create it. For the model to generate predictions, you need to rate at least 400 unique documents, with at least 100 rated hot and at least 100 rated cold.
Create your own prediction model - you can create a predictive model based on any document or review attribute in Everlaw. You need to code at least 400 unique documents, with at least 100 in your custom model's relevant category and at least 100 in the irrelevant category (that is, reviewed but not marked as relevant). To read more about creating a predictive coding model, see our model creation article.
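The minimums above amount to a simple counting rule. The Python sketch below checks that rule on a set of coded documents. It is an illustration only, not Everlaw's API: the function name `meets_minimums` and the label strings are hypothetical.

```python
from collections import Counter


def meets_minimums(labels: dict[str, str], relevant: str, irrelevant: str,
                   min_total: int = 400, min_each: int = 100) -> bool:
    """Check the minimum-coding thresholds described above.

    Hypothetical helper: `labels` maps each unique document ID to its coding
    label. Keying by document ID means each document is counted only once.
    """
    counts = Counter(labels.values())
    return (
        len(labels) >= min_total
        and counts[relevant] >= min_each
        and counts[irrelevant] >= min_each
    )


# Rating model: 150 hot, 150 warm, and 100 cold documents (400 in total).
ratings = {f"doc{i}": "hot" for i in range(150)}
ratings.update({f"doc{i}": "warm" for i in range(150, 300)})
ratings.update({f"doc{i}": "cold" for i in range(300, 400)})
print(meets_minimums(ratings, relevant="hot", irrelevant="cold"))  # True
```

For the rating model, warm documents count toward the 400-document total but not toward either 100-document minimum. For a custom model, you would pass your own relevant and irrelevant labels instead of "hot" and "cold".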
The database - the universe of all documents uploaded for a particular matter.
Reviewed - The set of documents used to train a particular model. The criteria for "reviewed" are defined by the user when creating a custom model. For the default rating model, reviewed = rated. To read more about reviewed documents, please see our model creation article.
Relevant - The subset of reviewed documents that the model should consider relevant. The criteria for this is also defined by the user. For the default rating model, relevant = hot. To read more about relevant documents, please see our model creation article.
Irrelevant - The subset of reviewed documents that the model should consider irrelevant. These documents are determined automatically: they are the reviewed documents that do not meet the relevance criteria.
Holdout Set - 5% of the total documents in the case, reserved to evaluate the performance of a model. The holdout set is maintained by taking 5% of the documents in each upload; you cannot manually add documents to it. Holdout documents are never used in training; instead, they are used to calculate the model’s performance metrics. However, they can still satisfy the ‘reviewed’ criteria of one or more models; reviewing them does not move them into training. Because documents never leave the holdout set, they can be used to show historical performance and give ongoing feedback as your model improves. To read more about the holdout set, please refer to the relevant section in our model interpretation article.
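To make the relationship between these sets concrete, here is a minimal Python sketch. It assumes documents are plain dictionaries, and that the 5% per upload is drawn at random and rounded to the nearest whole document; Everlaw's actual selection and rounding are not described here, and `assign_holdout` and `partition` are hypothetical names.

```python
import random
from typing import Callable


def assign_holdout(upload: list[str], fraction: float = 0.05, seed: int = 0) -> set[str]:
    """Reserve a fixed share of one upload for the holdout set (assumed random draw)."""
    k = round(len(upload) * fraction)  # rounding rule is an assumption
    return set(random.Random(seed).sample(upload, k))


def partition(docs: dict[str, dict], holdout: set[str],
              is_reviewed: Callable[[dict], bool],
              is_relevant: Callable[[dict], bool]) -> tuple[dict[str, bool], dict[str, bool]]:
    """Split reviewed documents into training labels and holdout (evaluation) labels.

    Each value is True for relevant and False for irrelevant, i.e. reviewed
    but not meeting the relevance criteria. Unreviewed documents appear in neither.
    """
    training, evaluation = {}, {}
    for doc_id, attrs in docs.items():
        if not is_reviewed(attrs):
            continue  # unreviewed documents only receive predictions
        target = evaluation if doc_id in holdout else training
        target[doc_id] = is_relevant(attrs)
    return training, evaluation


# Default rating model: reviewed = rated, relevant = hot.
upload = [f"doc{i}" for i in range(200)]
holdout = assign_holdout(upload)  # 10 of the 200 documents
rating_cycle = ["hot", "cold", None]  # None means not yet rated
docs = {doc_id: {"rating": rating_cycle[i % 3]} for i, doc_id in enumerate(upload)}
training, evaluation = partition(
    docs, holdout,
    is_reviewed=lambda a: a["rating"] is not None,
    is_relevant=lambda a: a["rating"] == "hot",
)
```

Because the holdout set is fixed per upload rather than redrawn for each model, the same evaluation documents can be reused every time the model is scored.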
To read more about using predictions, please refer to our model interpretation article.
Relevancy graph - this chart plots the number of documents (y-axis) against relevancy (x-axis). Use it to retrieve and review documents based on their current prediction scores. Documents are scored from 0 to 100, with 100 meaning a document is very likely to be relevant under the model’s criteria for relevance. You can move the adjustable line to select document sets based on a prediction score cutoff.
Coverage chart - this chart plots coverage against relevancy. Coverage indicates how well the model understands the characteristics of your documents. You can click and drag within the coverage chart to select a region of documents that you want to review. A useful application of the coverage chart is selecting documents with low coverage scores for review, which helps improve the model.
Historical performance chart - this chart shows your performance metrics (precision, recall, and F1) over time. Historical performance is generated by comparing the model’s predictions at each point in time against the holdout set.
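The selections these charts support can be sketched as simple filters over per-document scores. In this Python sketch, the score dictionaries, the 10-point bucket width, and a 0 to 100 coverage scale are all assumptions for illustration; Everlaw computes and displays these values itself, and the function names are hypothetical.

```python
def score_histogram(scores: dict[str, float], bin_width: int = 10) -> list[int]:
    """Count documents per prediction-score bucket, like the relevancy graph's bars."""
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for score in scores.values():
        # A score of exactly 100 goes into the last bucket.
        counts[min(int(score // bin_width), n_bins - 1)] += 1
    return counts


def select_by_cutoff(scores: dict[str, float], cutoff: float) -> list[str]:
    """Documents at or above a prediction-score cutoff (the adjustable line)."""
    return sorted(doc for doc, score in scores.items() if score >= cutoff)


def select_region(scores: dict[str, float], coverage: dict[str, float],
                  score_range: tuple[float, float],
                  coverage_range: tuple[float, float]) -> list[str]:
    """Documents inside a rectangle dragged on the coverage chart."""
    (s_lo, s_hi), (c_lo, c_hi) = score_range, coverage_range
    return sorted(
        doc for doc in scores
        if s_lo <= scores[doc] <= s_hi and c_lo <= coverage[doc] <= c_hi
    )


scores = {"a": 95, "b": 72, "c": 40, "d": 100, "e": 5}
coverage = {"a": 80, "b": 15, "c": 10, "d": 90, "e": 20}
print(score_histogram(scores))       # [1, 0, 0, 0, 1, 0, 0, 1, 0, 2]
print(select_by_cutoff(scores, 70))  # ['a', 'b', 'd']
print(select_region(scores, coverage, (0, 100), (0, 25)))  # ['b', 'c', 'e']
```

The last call mirrors the low-coverage use case: it keeps every prediction score but only documents whose (assumed) coverage is 25 or below.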
To read more about interpreting performance metrics, please refer to our model interpretation article.
Precision - Of the documents predicted to be relevant, how many are actually relevant? For example, if the model predicts 100 documents to be relevant, and 90 of those are actually relevant, the model’s precision is 90%.
Recall - Of the documents that are actually relevant, how many were predicted to be relevant? Using the same example as above, if there were 1,000 additional relevant documents that the model did not predict as relevant, the model would have captured only 90 of 1,090 relevant documents. Its recall would be about 8%, despite its precision of 90%.
F1 - Because precision and recall measure different aspects of a model’s performance, evaluating a model’s overall performance requires taking both into account. The F1 score is the harmonic mean of the two, so it is high only when both precision and recall are high. It summarizes the balance between precision and recall given the model’s current state.
Ineligible docs - These documents have no identified text, or too little text to generate predictions from. They do not receive a prediction score, are not included in the model’s predictions, and are not used to train the model. You can click the number of ineligible docs to view them.
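The precision, recall, and F1 definitions above can be computed directly from two sets of document IDs. This Python sketch is a generic illustration of the standard formulas, not Everlaw's implementation; it assumes you already know which holdout documents the model predicted as relevant and which are actually relevant. It reproduces the numbers from the recall example.

```python
def precision_recall_f1(predicted: set[str], actual: set[str]) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 (harmonic mean) for one set of predictions."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 100 predicted relevant, 90 of them correct, plus 1,000 relevant documents the model missed.
predicted = {f"doc{i}" for i in range(100)}
actual = {f"doc{i}" for i in range(90)} | {f"missed{i}" for i in range(1000)}
p, r, f1 = precision_recall_f1(predicted, actual)
print(f"precision={p:.2f} recall={r:.3f} f1={f1:.3f}")
# precision=0.90 recall=0.083 f1=0.151
```

Note how the F1 score (about 0.15) stays close to the weaker of the two metrics. That is the effect of using the harmonic mean; a simple average of the same two numbers would be about 0.49.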
Training the model
Training set - Training sets help your model better understand the documents in your case, resulting in better predictions. You can create training sets of randomly selected documents from your database, or you can create a training set from a recent search. Training sets contain either 300 documents or 2% of the project's total number of documents, whichever is smaller. You can create more than one training set per model. To read more about training sets, please refer to the relevant section in our model interpretation article.
Weighted Terms - You can select and weight portions of documents that are particularly relevant in order to train your model more granularly.
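The size rule for training sets can be expressed in a few lines. This Python sketch assumes that 2% is rounded down to a whole document and that documents are drawn uniformly at random from the candidate pool (the whole database, or the results of a search). It also assumes a pool smaller than the computed size simply yields the whole pool. `training_set_size` and `draw_training_set` are hypothetical names, not part of Everlaw.

```python
import random


def training_set_size(total_docs: int) -> int:
    """300 documents or 2% of the project, whichever is smaller (rounding down is an assumption)."""
    return min(300, total_docs * 2 // 100)


def draw_training_set(pool: list[str], total_docs: int, seed: int = 0) -> list[str]:
    """Randomly sample a training set from a candidate pool, such as search results."""
    # Assumption: a pool smaller than the computed size caps the training set.
    k = min(training_set_size(total_docs), len(pool))
    return random.Random(seed).sample(pool, k)


print(training_set_size(5_000))    # 100
print(training_set_size(100_000))  # 300

search_results = [f"doc{i}" for i in range(250)]
training_set = draw_training_set(search_results, total_docs=100_000)
print(len(training_set))  # 250
```

Using integer arithmetic (`* 2 // 100`) rather than multiplying by 0.02 avoids floating-point surprises when computing the 2% figure.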