Random Sampling

You can refine any search to return a randomly sampled subset of its results. The sample percentage you specify is the chance that any individual document is included in the sample. It is not the exact percentage of the total document set that will be retrieved.

For example, if your search matches 100 documents, a 20% sample will sometimes return 19 or 21 documents rather than exactly 20. A separate sampling determination is made for each of the 100 documents, with a 20% chance that each one is included in the sample.
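The per-document behavior described above can be illustrated with a short simulation. This is a minimal sketch, not the product's implementation: the function name `bernoulli_sample`, the document labels, and the use of seeds are all assumptions made for illustration.

```python
import random


def bernoulli_sample(documents, percent, seed=None):
    """Include each document independently with probability percent / 100.

    Hypothetical helper for illustration only; it is not part of any product API.
    """
    rng = random.Random(seed)
    probability = percent / 100
    return [doc for doc in documents if rng.random() < probability]


# 100 placeholder documents, sampled at 20% in 1,000 independent runs.
documents = [f"doc-{i}" for i in range(100)]
sizes = [len(bernoulli_sample(documents, 20, seed=s)) for s in range(1000)]

# The sample size varies from run to run; only its average is close to 20.
print("smallest sample:", min(sizes))
print("largest sample:", max(sizes))
print("average sample size:", sum(sizes) / len(sizes))
```

Under this model the sample size follows a binomial distribution, so for 100 documents at 20% the expected size is 20 with a standard deviation of 4. Samples a few documents above or below 20 are normal.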

To set your random sampling threshold, click the More Options tab in the bottom right of any logical container and choose a percentage of documents to sample. Then click Save. The sampling applies to all documents captured by the conditions set in that logical container.


You can apply sampling to multiple logical containers.


For example, this search will sample 50% of emails sent from Ted Fick or Alice Bravo. From those emails that contain the word “California,” the search will sample 25%.

Document sampling operations will always be applied to a search container before grouping or removal operations. In other words, if you choose to sample your documents and also group them by email thread, your documents will be sampled before they are organized into email threads. This prevents partial email threads from appearing in your results table.
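The ordering above can be sketched in a few lines. This is only an illustration under a stated assumption: that grouping by email thread brings in every message from the thread of each sampled document. That is one reading of the paragraph above, not confirmed product behavior. The function `sample_then_group` and the `id` and `thread_id` fields are hypothetical.

```python
import random


def sample_then_group(documents, percent, seed=None):
    """Sample documents first, then group by thread.

    Assumption (not confirmed by the source): grouping expands each sampled
    document to its complete email thread.
    """
    rng = random.Random(seed)
    probability = percent / 100

    # Step 1: sampling. Each document gets its own inclusion decision.
    sampled_thread_ids = {
        doc["thread_id"] for doc in documents if rng.random() < probability
    }

    # Step 2: grouping. Collect every message of each thread that was hit.
    threads = {}
    for doc in documents:
        if doc["thread_id"] in sampled_thread_ids:
            threads.setdefault(doc["thread_id"], []).append(doc["id"])
    return threads


# 30 placeholder emails arranged in 10 threads of 3 messages each.
emails = [{"id": i, "thread_id": i // 3} for i in range(30)]
result = sample_then_group(emails, 20, seed=1)

for thread_id, message_ids in sorted(result.items()):
    print(thread_id, message_ids)
```

Because grouping runs after sampling in this sketch, every thread that appears in the output is complete.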


Some Use Cases for Sampling Results

  • Triaging Review: Suppose you receive thousands of documents from a custodian and you are not sure whether to devote review resources to them. You can use the sample feature to take a randomly sampled subset of the custodian's documents, say 10%. If a review of the sampled documents shows that only 5% of the sample is relevant, you can probably direct resources elsewhere. If, instead, the review shows that 60% of the sample is relevant, you might want to devote resources to reviewing the custodian's documents.
  • Training the predictive coding engine: Training the prediction engine with randomly sampled subsets of documents may help improve the precision and recall of the generated predictions. To learn more, see the article on predictive coding.
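For the triage use case above, the relevance rate seen in a sample can be turned into a rough estimate for the whole custodian set. The sketch below uses a standard normal-approximation confidence interval. It is a generic statistical illustration, not a feature of the product, and the function name `estimate_relevance` and the example counts are assumptions.

```python
import math


def estimate_relevance(relevant_in_sample, sample_size, z=1.96):
    """Return (rate, low, high): the observed relevance rate and an
    approximate 95% interval using the normal approximation.

    Hypothetical helper for illustration; assumes the sample was drawn at random.
    """
    rate = relevant_in_sample / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Illustrative numbers: a sample of 500 reviewed documents, 25 marked relevant.
rate, low, high = estimate_relevance(25, 500)
print(f"observed rate: {rate:.1%}, approximate range: {low:.1%} to {high:.1%}")
```

A narrow range around a low rate supports directing resources elsewhere. A wide range suggests reviewing a larger sample before deciding.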