To view all of Everlaw's predictive coding-related content, please see our predictive coding section.
For an introduction to the concepts behind predictive coding and machine learning, feel free to reference our beginner’s guide to predictive coding.
For definitions of predictive coding terms and answers to commonly asked questions about Everlaw's predictive coding feature, see our Predictive Coding Terms and FAQs.
Initial steps
If you are a Project Administrator, you have access to predictive coding by default. Project Administrators can also give Prediction Model access to specific groups in the Permissions page.
To start building a predictive coding model, click the Document Analytics icon on your top toolbar. Then, choose Predictive Coding from the left-hand menu and select "Create New Model."
The first page you’ll see provides you with an introduction to predictive coding, as well as a link to the Everlaw predictive coding beginner’s guide. Click Next to begin building your model.
Reviewed documents
First, specify which documents the model should learn from. These documents are considered "reviewed" for the purposes of the model.

As an example, say your team is reviewing documents for responsiveness using the codes Responsive and Not Responsive under the coding category Responsiveness, and you now want to build a predictive coding model that finds other responsive documents. To teach the model which types of documents are responsive and which are not, you need to point it at documents your team has already reviewed for responsiveness.

To do this, set your criteria for reviewed documents to "Coded: Responsive OR Coded: Not Responsive" or "Coded under Responsiveness" (assuming Responsive and Not Responsive are the only codes under the category Responsiveness). The model will then look at all documents coded Responsive or Not Responsive to learn which types of documents are responsive.
In our example, it's important that the "reviewed" criteria capture documents that are not responsive as well as those that are responsive. In other words, we wouldn't want the "reviewed" criteria to be only "Coded: Responsive," because the model needs to learn what both responsive and non-responsive documents look like in order to make accurate predictions.
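To make the selection concrete, here is a minimal sketch in Python of how a "reviewed" criterion partitions documents. This is purely illustrative: the Document class and code names are hypothetical, not Everlaw's actual implementation.

```python
# Hypothetical sketch: the "reviewed" criterion selects the documents the
# model trains on. Document and its fields are illustrative, not Everlaw's API.
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    codes: set[str] = field(default_factory=set)

def is_reviewed(doc: Document) -> bool:
    # Matches "Coded: Responsive OR Coded: Not Responsive"
    return bool(doc.codes & {"Responsive", "Not Responsive"})

docs = [
    Document("DOC-001", {"Responsive"}),
    Document("DOC-002", {"Not Responsive"}),
    Document("DOC-003"),  # not yet coded, so the model won't learn from it
]
training_set = [d for d in docs if is_reviewed(d)]
print([d.doc_id for d in training_set])  # ['DOC-001', 'DOC-002']
```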
Relevant documents
Here, specify which types of documents you want the model to find. These documents are considered "relevant" to the model. For our responsiveness model, responsive documents are relevant to the model. In other words, we want the model to find responsive documents.
To specify relevant documents, build a query that captures only those documents that you want to find more of. In our responsiveness example, we would build a query that captures documents coded Responsive.
Now, the model knows what types of documents you want to see more of.
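Continuing the illustrative sketch above, the relevance query simply labels each reviewed document as relevant or irrelevant. Again, the document IDs and code names are hypothetical:

```python
# The relevance query ("Coded: Responsive") marks a reviewed document as
# relevant; every other reviewed document is treated as irrelevant.
reviewed_codes = {
    "DOC-001": {"Responsive"},
    "DOC-002": {"Not Responsive"},
}

def is_relevant(codes: set[str]) -> bool:
    return "Responsive" in codes

labels = {doc_id: is_relevant(codes) for doc_id, codes in reviewed_codes.items()}
print(labels)  # {'DOC-001': True, 'DOC-002': False}
```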
Excluding documents
This optional step allows you to exclude documents from your model. The model will not look at excluded documents at all, either to learn from them or to make predictions about them. For example, you might want to exclude a certain type of file, like spreadsheets or audio files. If you do, the model will not look at spreadsheets or audio files, even if they have been coded under Responsiveness, and it will not make predictions about how likely spreadsheets or audio files are to be responsive. If you do not exclude any documents, all documents with adequate text (including transcribed audio and video files) will receive prediction scores.
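A minimal sketch of this filtering logic, with hypothetical file-type labels, might look like this:

```python
# Illustrative only: excluded documents are dropped before the model either
# learns from them or scores them. File-type labels here are hypothetical.
EXCLUDED_TYPES = {"spreadsheet", "audio"}

documents = [
    {"id": "DOC-001", "type": "email", "reviewed": True},
    {"id": "DOC-002", "type": "spreadsheet", "reviewed": True},  # excluded even though coded
    {"id": "DOC-003", "type": "email", "reviewed": False},
]

in_scope = [d for d in documents if d["type"] not in EXCLUDED_TYPES]
training_pool = [d for d in in_scope if d["reviewed"]]  # the model learns only from these
prediction_pool = in_scope                              # and scores only these
print([d["id"] for d in prediction_pool])  # ['DOC-001', 'DOC-003']
```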
Finalizing your model
On the next page, enter a name for your model. This name will be visible to everyone who uses the model, and will also be the name you use to search for the model’s predictions. By default, the name of your model will be the relevance criteria for your model, but you can rename it to whatever you like.
Finally, submit your model. Generally, your model will begin making predictions once at least 400 unique documents (no duplicates) have been reviewed, of which at least 100 must be relevant and at least 100 must be irrelevant. Additionally, an irrelevant document cannot be a near duplicate of, or an email in the same thread as, a relevant document. Such documents are considered "conflicts" and are removed from consideration in your training set. If you have hit the review thresholds but your model is not updating, you likely need to review more unique documents, or too many of your irrelevant documents are conflicts.
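As a hedged sketch of these thresholds: the numbers (400/100/100) come from this article, but exactly how conflicts are counted against the totals is an assumption for illustration, not Everlaw's actual accounting.

```python
# Sketch of the activation thresholds described above. Subtracting conflicts
# from both totals is an assumption, not Everlaw's documented behavior.
def model_can_activate(unique_reviewed: int, relevant: int,
                       irrelevant: int, conflicts: int) -> bool:
    # Conflicts are irrelevant documents that are near duplicates of, or in
    # the same email thread as, a relevant document; they are removed from
    # the training set before the thresholds are checked.
    usable_irrelevant = irrelevant - conflicts
    usable_total = unique_reviewed - conflicts
    return usable_total >= 400 and relevant >= 100 and usable_irrelevant >= 100

# 510 of the 600 irrelevant documents are conflicts, so only 90 remain:
print(model_can_activate(unique_reviewed=1000, relevant=400,
                         irrelevant=600, conflicts=510))  # False
```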
You can share your model even before it has generated any predictions: click the share button in the top right of the model's page. You can also delete the model by clicking the trash can.
To read more about analyzing your model’s results, see the Predictive Coding Model Interpretation article.