June 3, 2022 Release: General release of Clustering, better visibility and reporting for native uploads, and more!

Expected US Release: June 3, 2022

Expected International Release: June 10, 2022

Expected Federal Release: June 17, 2022

This release brings Clustering to projects with up to 25 million documents! We’ve also improved visibility into document processing errors, added content and code searches to the Everlaw API, and more. Read on for details about the features coming out this month. If you would like to learn more about the features in this release, join us for a live training session.

User-facing features in this release:

Everlaw Clustering

Clustering is now generally available! As of this release, Clustering is supported on all projects and databases with up to 25 million documents. Clustering is an unsupervised machine learning algorithm that groups documents together by conceptual similarity. On Everlaw, Clustering includes features that can be used throughout the lifecycle of a case, from early case assessment (ECA) to quality assurance checks post-review. The visualization is designed to show the concepts and keywords of a document corpus at a high level without prior knowledge of the dataset, giving you both context and key terms worth investigating.

During document review, you can use the Clustering neighbors feature in the context panel, which lists documents that are conceptually similar to the one being viewed. In addition, Clustering’s overlays reduce time spent in review by incorporating prediction scores from predictive coding models, letting you pinpoint which documents and clusters should be prioritized for review. Overlays can also be used to QC document review work by coloring documents by rating or code, so you can quickly spot documents that may have been coded or rated incorrectly based on how nearby, conceptually similar documents are colored. Additionally, any cluster or document selection in Clustering can be opened in Data Visualizer from the right-hand panel for further analysis.

This release also brings additional Clustering functionality and improvements. Dynamic zoom, also known as auto depth, shows clusters with more or less specificity based on your zoom level. As you zoom in, clusters automatically break up into smaller clusters with more specific terms. As you zoom out, clusters merge back together. If you would like to manually change the specificity and depth of clusters, auto depth can be toggled off and depth levels can be adjusted using the slider in the toolbar. Adjusting depth levels lets you dig deeper into specific clusters of interest, helping you pinpoint both important and irrelevant documents and terms quickly. The number of depth levels available depends on the dataset and ranges from 1 to 5. In general, larger projects have more possible depth levels than smaller projects. If you accessed Clustering in beta, your visualizations will need to be reclustered to use dynamic and manual depth.

For projects larger than 25 million documents, we recommend clustering a smaller subset of the database in a partial project. Additional enhancements to Clustering include major speed improvements to reclustering and generating cluster visualizations, as well as cluster terms and labels that have been refactored to reduce redundancy and be more informative. If you accessed Clustering in beta, your visualizations will need to be reclustered to display the refactored cluster terms. New cluster visualizations may take approximately one week to generate; if your model is still generating after this time, please contact support@everlaw.com.

Learn more about Clustering on Everlaw.

Processing error report and improved processing visibility

With this release, the current functionality of the “manage sources” dialog and report card for native uploads has been combined into a single tabbed dialog, along with a new tab that shows files that are currently being processed. Additionally, we’ve added a processing report for single uploads or multiple uploads.

The transferring tab has the same functionality as the “manage sources” dialog, allowing you to add additional files to an existing upload, cancel currently transferring files, or retry failed transfers of native uploads. You can also download the source files for native uploads directly from this tab.

The processing tab shows the files that are currently being processed as part of a native upload. The tab also shows the number of files that have completed processing, the number of files queued for processing, and the number of files currently being processed by Everlaw. The table of currently processing documents displays each document’s control number, processing status, size, file name, and path.

This view is sorted in descending order of time since processing started (files that have been processing the longest are at the top), and it shows any additional processing steps for the files that are currently processing. If more than 100 files are being processed at the same time, the table shows the details of the 100 longest-running files. The progress of reprocessing files on Everlaw can also be tracked in the same dialog.
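The ordering and the 100-file cap described above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not Everlaw's implementation; the `ProcessingFile` record, its field names, and the `longest_running` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


# Hypothetical record for a row in the processing tab; the field names are
# assumptions for illustration, not Everlaw's data model.
@dataclass
class ProcessingFile:
    control_number: str
    file_name: str
    started_at: datetime  # when processing began on this file


def longest_running(
    files: List[ProcessingFile], now: datetime, limit: int = 100
) -> List[ProcessingFile]:
    """Return up to `limit` files, longest time in processing first."""
    # Sort by elapsed time since processing started, largest first.
    ordered = sorted(files, key=lambda f: now - f.started_at, reverse=True)
    return ordered[:limit]


# Made-up data: 150 files, each started one minute after the previous one.
now = datetime(2022, 6, 3, 12, 0, 0)
files = [
    ProcessingFile(f"DOC{i}", f"file{i}.pdf", now - timedelta(minutes=i))
    for i in range(150)
]
top = longest_running(files, now)
print(len(top), top[0].control_number)
```

Only the longest-running files are returned, which mirrors how the table stays readable when many files are processing at once.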

The possible processing steps and file types that have these steps are as follows:

Processing step - File types

  • Processing - All
  • OCR - PDFs, TIFF and other image files
  • Transcribing/Transcoding - Audio/video files, any file with audio

The new upload report tab has a redesigned upload report as well as a detailed CSV report for all files ingested into the platform. You can download the CSV file for a single upload, or filter native upload cards by name or date and download a report for a subset (or all) of the upload cards. The report is currently capped at 10,000,000 rows.

For a single upload, the option is available from the upload card in the three-dot menu.

For multiple uploads, the option is available from the Native Data Uploads page.

Requesting the download creates a task (similar to exports). Once the task completes, you can download the generated CSV from the homepage or from the link in the toast notification.

The CSV file includes an entry for every file in the upload, including files that were deduplicated or deNISTed on upload. Each entry contains the following fields:

  • Document id (Control #)
  • Upload dataset name - This can be changed by renaming the upload card
  • Upload dataset id - This is fixed regardless of changes to upload dataset name
  • Upload date (UTC)
  • Filename
  • File path
  • Extension
  • Custodian
  • Processing/Production Flags - the current contents of the “Processing/Production Flags” Special Column, if any. (This is any of the flags available in the Uploaded document search term’s “flags” parameter.)
  • Processing error - true/false 
  • OCR - true/false
  • Reprocess date - if any; includes time
  • Reprocessed by - the user
  • Deduplicated - true/false
  • DeNISTed - true/false
  • Uploaded by - First and Last Name, or email/username for some legacy users
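Because the report is a plain CSV, it can be summarized with a short script. The sketch below uses Python's standard `csv` module. The column headers (`Processing error`, `OCR`, `Deduplicated`, `DeNISTed`) and the literal `true`/`false` cell values are assumptions based on the field list above, so check them against the header row of an actual report. The sample rows are made up.

```python
import csv
import io
from collections import Counter

# Assumed header names, taken from the field list above; the exact header
# text in a real report may differ.
FLAG_COLUMNS = ["Processing error", "OCR", "Deduplicated", "DeNISTed"]


def summarize_report(csv_file) -> Counter:
    """Count total rows and the rows where each true/false column is true."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts["rows"] += 1
        for column in FLAG_COLUMNS:
            # Treat a missing or blank cell as false.
            if (row.get(column) or "").strip().lower() == "true":
                counts[column] += 1
    return counts


# Made-up sample data for illustration only.
sample = io.StringIO(
    "Document id,Filename,Processing error,OCR,Deduplicated,DeNISTed\n"
    "#1,a.pdf,false,true,false,false\n"
    ",b.pdf,false,false,true,false\n"
    "#3,c.msg,true,false,false,false\n"
)
summary = summarize_report(sample)
print(summary["rows"], summary["Processing error"], summary["Deduplicated"])
```

For a downloaded report, the same function can be given a file opened with `open(path, newline="")`, where `path` is wherever the CSV was saved.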

Some fields may have different behavior and may be blank for uploads prior to the release of this feature.

Field name(s) - Possible behaviors for blank fields in report

  • Reprocess date, Reprocessed by, Upload date, Uploaded by - Fields are blank for uploads created prior to June 3, 2022. Adding new sources or reprocessing files on prior uploads will populate those fields in the report CSV for affected files.
  • Document id for deduplicated/deNISTed docs - Fields are blank because the document is not on the platform.
  • Custodian - Fields are blank because no value was provided on upload. (If a custodian is edited in the metadata, this doesn’t change the custodian on the processing report.)
  • Processing/Production Flags - Blank when there are no flags.
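When post-processing the report, the blank-field rules above can be encoded as a small helper. This is a sketch under assumptions: the dictionary keys mirror the field names listed in this article and may not match the real CSV headers exactly, and `explain_blanks` is a hypothetical function, not part of any Everlaw tool. Reprocess fields are left out because they are also blank for files that were never reprocessed.

```python
from typing import Dict, List


def _blank(row: Dict[str, str], name: str) -> bool:
    """True when the field is missing or contains only whitespace."""
    return not (row.get(name) or "").strip()


def _is_true(row: Dict[str, str], name: str) -> bool:
    return (row.get(name) or "").strip().lower() == "true"


def explain_blanks(row: Dict[str, str]) -> List[str]:
    """Return a likely reason for each blank field in one report row."""
    notes = []
    removed = _is_true(row, "Deduplicated") or _is_true(row, "DeNISTed")
    if _blank(row, "Document id") and removed:
        notes.append("Document id: document is not on the platform")
    if _blank(row, "Custodian"):
        notes.append("Custodian: no value was provided on upload")
    if _blank(row, "Processing/Production Flags"):
        notes.append("Processing/Production Flags: no flags")
    if _blank(row, "Upload date") or _blank(row, "Uploaded by"):
        notes.append("Upload date / Uploaded by: upload may predate June 3, 2022")
    return notes


# Made-up row for illustration only.
example_row = {
    "Document id": "",
    "Deduplicated": "true",
    "DeNISTed": "false",
    "Custodian": "",
    "Processing/Production Flags": "",
    "Upload date": "",
    "Uploaded by": "",
}
for note in explain_blanks(example_row):
    print(note)
```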

Learn more about processed data errors.

Contents and codes in search API

With the Everlaw API’s expanded support for robust, automated data reporting, you can now build automated searches using document contents and codes! Your automated searches can now also return billable size and number of pages in addition to document count. Lastly, your programs can now pull a list of the codes available on a project. Additional search terms are coming soon!

Support for DWFx and RTFD native processing

Everlaw now supports native processing of DWFx files. DWFx is the newest version of the DWF file format, which is used to represent design data, and it can be viewed in Microsoft XPS Viewer.

As with other CAD files, processing now generates a PDF version of the document for review.

Everlaw has also added support for RTFD/NSFileWrapper serialized directories; directory contents are extracted as with other container formats.

Learn more about Everlaw’s supported native data types.

End of Internet Explorer support

With this release, Everlaw will no longer be supported on Internet Explorer. Microsoft is retiring this browser in June 2022, and unsupported browsers may introduce unintended security and user experience problems. Everlaw is compatible with the latest versions of Chrome, Firefox, Edge, and Safari, as outlined in our Frequently Asked Questions article.