Reprocessing Everlaw processed documents
If you have Database Admin or Upload permissions, you can retroactively re-configure documents that have already been uploaded. For example, you can change the time zone information, replace or remove images, or re-run documents that weren’t initially processed because of a missing or incorrect password or decryption key. Please note that because reprocessing is a database-level permission, the document will be updated in all projects associated with the database.
To reprocess documents from the results table:
- Select the documents you want to reprocess by using the checkboxes on the left-hand side of the results table.
- Select the Batch button located in the table's toolbar.
- Select Reprocess.
This opens the Reprocess dialog.
- Select the Everlaw processed tab to reprocess documents that were processed on Everlaw (native files).
- Reconfigure the following settings.
- Create PDFs for: Creating PDFs will not affect your billable size or impact your cost in any way.
- Passwords for protected files: For password-protected files, the native viewer in the review window will be unavailable until you reprocess. After reprocessing with the password, you will be able to use the native viewer.
- Advanced settings: Timezone, OCR language, Page size, Hyperlinked images, Decryption Keys. Please see our native uploading article for details about Advanced settings.
The dataset default option that is available for many of these settings refers to the original configuration settings of an upload. In other words, for each document that is being reprocessed with the dataset default option, Everlaw refers to the original upload settings to determine the value of a particular setting.
Note: You can review the configuration settings for a given upload via the upload card on the Data Transfer > Uploads page.
Note: You can see which upload set a document is part of by adding the Document Set column to your results table view.
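As an illustration of how the dataset default option behaves, the following minimal sketch resolves the effective settings for each document from the settings of its original upload. This is not Everlaw code: the `UploadSettings` class, the `DATASET_DEFAULT` sentinel, the `resolve_settings` function, and the example setting values are all hypothetical names chosen for the example, and the sketch assumes that a setting left unspecified also falls back to the original upload.

```python
from dataclasses import dataclass

# Hypothetical sentinel standing in for the "dataset default" choice in the dialog.
DATASET_DEFAULT = "dataset default"


@dataclass(frozen=True)
class UploadSettings:
    """Settings recorded for one original upload (illustrative fields only)."""
    timezone: str
    ocr_language: str
    page_size: str


def resolve_settings(requested: dict, original: UploadSettings) -> UploadSettings:
    """Return the effective settings for a document from a given upload.

    Any setting requested as DATASET_DEFAULT (or not requested at all, which is
    an assumption of this sketch) is taken from the document's original upload.
    """
    def pick(name: str) -> str:
        value = requested.get(name, DATASET_DEFAULT)
        return getattr(original, name) if value == DATASET_DEFAULT else value

    return UploadSettings(
        timezone=pick("timezone"),
        ocr_language=pick("ocr_language"),
        page_size=pick("page_size"),
    )


# Two documents from different uploads, reprocessed in the same batch.
upload_a = UploadSettings("America/New_York", "Autodetect", "Letter")
upload_b = UploadSettings("UTC", "Spanish", "A4")
requested = {
    "timezone": "Europe/London",      # explicit override for every document
    "ocr_language": DATASET_DEFAULT,  # each document keeps its upload's value
    "page_size": DATASET_DEFAULT,
}

print(resolve_settings(requested, upload_a))
print(resolve_settings(requested, upload_b))
```

The point of the sketch is that a single reprocessing job can yield different effective values per document, because the dataset default is looked up per upload rather than once for the whole batch.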
As you select these configuration settings, you are rerunning them from their original state. This means any previous settings from the initial upload will be overridden. This includes images, regardless of the setting you choose for PDFs. For example, selecting No documents means that no new images will be created upon reprocessing; however, the old images will be deleted to ensure that image and text files correspond.
Additionally, any documents that have images with highlights or redactions applied will not be reprocessed. This is to avoid changing the underlying PDF. If you are an Organization Administrator, then you can choose to override this behavior and reprocess documents with highlights or redactions.
In Advanced Settings, you can specify a different Time zone, Page Size, or OCR language. By default, OCR language detection is set to Autodetect. You can learn more about OCR language detection in the native uploads article.
Note: If you want to reprocess audio files for transcription in Spanish, you can use the OCR language field to do so. Make sure that you mark ‘Spanish’ as the language.
- Organization admins have the option to Force OCR:
- Upon upload, Everlaw automatically OCRs TIFFs, PDFs without embedded text, and PDF pages with fewer than 50 non-white space embedded characters. You do not need to force OCR to generate text for these documents, as they were OCR'd upon upload and will be OCR'd during reprocessing.
- Upon native upload and reprocessing, Everlaw does not OCR JPGs, PNGs, or any file types other than those described above. By default, these documents will not have a text file associated with them. Force OCR overrides this system behavior and OCRs these image types to generate a text file.
- If a PDF has more than 50 characters of embedded text per page, this text is extracted upon upload. Force OCR adds any text detected via OCR to the PDF's existing embedded text. For example, if a PDF includes an image that contains text that is not embedded, this additional text is added to the document's text file after running Force OCR. Force OCR will not overwrite any text embedded in the PDF.
- When you have configured the settings appropriately, click Process.
Everlaw will begin to reprocess documents. Your originally uploaded documents will be replaced by the reprocessed ones.
Note: Work product, such as codes and annotations, will not be changed or removed during reprocessing.
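The OCR rules described under Force OCR in this section can be summarized as a small decision function. The sketch below is only a model of the documented behavior, not Everlaw's implementation: the function name `page_text_source` and its return labels are hypothetical, it covers only the file types named in the article, and it assumes that a page with exactly 50 non-whitespace characters counts as having enough embedded text (the article does not specify that case).

```python
import re

# Threshold quoted in the article: pages with fewer than 50 non-whitespace
# embedded characters are OCR'd automatically.
MIN_EMBEDDED_CHARS = 50


def non_whitespace_count(text: str) -> int:
    """Count the characters in text that are not whitespace."""
    return len(re.sub(r"\s", "", text))


def page_text_source(file_type: str, embedded_text: str = "", force_ocr: bool = False) -> str:
    """Return where a page's text would come from under the documented rules.

    Possible results: "ocr", "embedded", "embedded+ocr", or "none".
    """
    kind = file_type.lower()
    if kind in ("tif", "tiff"):
        return "ocr"  # TIFFs are always OCR'd
    if kind == "pdf":
        if non_whitespace_count(embedded_text) < MIN_EMBEDDED_CHARS:
            return "ocr"  # too little embedded text, so the page is OCR'd
        # Embedded text is extracted; Force OCR adds OCR text without overwriting it.
        return "embedded+ocr" if force_ocr else "embedded"
    if kind in ("jpg", "jpeg", "png"):
        # These image types get a text file only when Force OCR is used.
        return "ocr" if force_ocr else "none"
    raise ValueError(f"file type not covered by this sketch: {file_type}")


print(page_text_source("pdf", ""))
print(page_text_source("png", ""))
print(page_text_source("png", "", force_ocr=True))
```

Read this way, Force OCR only changes the outcome for JPGs, PNGs, and PDFs that already have embedded text; TIFFs and image-only PDFs are OCR'd either way.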
Reprocessing non-Everlaw processed documents
You can also reprocess documents not originally processed on Everlaw. This can be particularly helpful if you’ve received documents that do not have embedded text, or if they came without separate text files, as this can cause issues during review.
- If separate text files are not included, then you cannot search the text of documents without embedded text.
- If text is not embedded within the image, hit highlighting and other text-based tools will not work in the image view of these files during review, regardless of whether or not a separate text file is included.
Common non-searchable file types are listed below. You can use the reprocessing tool to generate searchable images or PDFs for documents that lack embedded text.
- Image-only PDFs
- TIFF images
- JPG or PNG images
To generate searchable images:
- Select the documents you want to reprocess by using the checkboxes on the left-hand side of the results table.
- Select the Batch button located in the table's toolbar.
- Select Reprocess.
This opens the Reprocess dialog.
- Select the Non-Everlaw processed tab.
- Select your intended settings.
- Generate: You can generate either PDFs & Text, or only Text. Generating PDFs will overwrite existing images/PDFs.
Note: Reprocessing non-Everlaw processed documents may increase the billable size of the documents as a result of newly generated and stored image files.
- If you have chosen to generate PDFs of your reprocessed documents, you can also select whether or not to create PDF versions of documents that don’t image well, like spreadsheets. If you select No documents, your existing images/PDFs will be removed and no new images will be created.
- If you'd like to keep existing PDFs and only generate new text files, select the Text option.
- When generating PDFs: If your documents do not have natives, Everlaw will convert the documents’ images into searchable PDFs. This will preserve the Bates stamps on the documents.
If your documents do have natives, Everlaw will replace the existing images on the platform with PDFs that are generated from the documents’ natives. Note that this will remove the Bates stamps, as well as any other features not present in the native versions of the documents, from the new PDFs.
Note that reprocessing will not generate searchable images for JPGs or PNGs.
- When generating text: Everlaw will first attempt to generate text from a native file.
If there is no native file for a document, Everlaw will check for embedded text in a PDF.
Finally, if there is neither a native file nor embedded text, the document will be OCR’d. This will generate new text files for PDFs, TIFFs, JPGs, and PNGs.
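The order in which text is generated can be pictured as a simple fallback chain. The following is a minimal sketch of that order under stated assumptions, not Everlaw's implementation: the `Doc` class and the `text_source` function are hypothetical, and the sketch treats whitespace-only embedded text as no embedded text.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Illustrative stand-in for a document being reprocessed."""
    has_native: bool
    embedded_text: str = ""  # text embedded in the document's PDF, if any


def text_source(doc: Doc) -> str:
    """Return which source the text would be generated from.

    Order described in the article: native file first, then embedded PDF
    text, then OCR as the last resort.
    """
    if doc.has_native:
        return "native"
    if doc.embedded_text.strip():
        return "embedded"
    return "ocr"


print(text_source(Doc(has_native=True)))
print(text_source(Doc(has_native=False, embedded_text="Quarterly report")))
print(text_source(Doc(has_native=False)))
```

In this model, OCR is reached only for documents that have neither a native file nor embedded text, which matches the image-only PDFs, TIFFs, JPGs, and PNGs mentioned above.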
- Advanced settings:
- Metadata: If you choose Ignore, the metadata of your documents will not be changed. If you select Merge, Everlaw will generate new metadata values from the native versions of your documents. If a document already has a value in a metadata field, the original value will be retained.
Note that selecting Merge has the potential to change your documents’ metadata from what the producing party provided.
- Timezone: You can choose a timezone to be used for extracted date metadata without a timezone specified. If you leave the field empty, Everlaw will use UTC.
OCR Language and Page Size follow the same rules as when configuring native uploads.
- Force OCR: Organization Administrators have the option to Force OCR. Force OCR will generate text files, or replace the supplied ones, for documents without embedded text that are not OCR'd upon reprocessing. The primary use case for forcing OCR is to generate searchable images for processed, Bates-stamped JPGs and PNGs so that hit highlights and searching will work in the image view. Force OCR can also add text from images embedded in PDFs that do have embedded text.
Org Admins do not need to Force OCR to generate text for TIFFs or PDFs without embedded text, as the reprocessing steps outlined above will generate or replace the text for these document types. If your organization does not have an Organization Administrator and you'd like to Force OCR on a set of documents, please reach out to support@everlaw.com.
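To make the Merge and Timezone settings more concrete, here is a minimal sketch of the behavior described above. It is an illustration under assumptions, not Everlaw's implementation: the function names `merge_metadata` and `apply_timezone` and the example field names are hypothetical, and the sketch assumes that an empty or missing value counts as "no existing value" and that a date without a timezone is interpreted as being in the chosen timezone.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional


def merge_metadata(existing: dict, generated: dict) -> dict:
    """Fill in metadata generated from the native without replacing existing values."""
    merged = dict(existing)
    for field, value in generated.items():
        # Only fields that are missing or empty (assumption) take the generated value.
        if merged.get(field) in (None, ""):
            merged[field] = value
    return merged


def apply_timezone(value: datetime, chosen: Optional[tzinfo] = None) -> datetime:
    """Attach a timezone to an extracted date that does not specify one.

    Dates that already carry a timezone are returned unchanged. If no
    timezone was chosen, UTC is used, as the article states.
    """
    if value.tzinfo is not None:
        return value
    return value.replace(tzinfo=chosen if chosen is not None else timezone.utc)


# Hypothetical field names and values, for illustration only.
existing = {"Author": "J. Smith", "Title": ""}
generated = {"Author": "jsmith", "Title": "Q3 report", "Date Created": "2021-03-04"}
print(merge_metadata(existing, generated))

naive = datetime(2021, 3, 4, 9, 30)
print(apply_timezone(naive))                                # falls back to UTC
print(apply_timezone(naive, timezone(timedelta(hours=-5)))) # chosen fixed offset
```

In the merge example, the produced Author value is kept, while the empty Title and the missing Date Created are filled from the native, which is why Merge can leave a document's metadata different from what the producing party provided.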
Monitoring the progress of a reprocessing job
Once you click Reprocess, you will see a temporary pop-up notification that the task has been queued, and another one once it is complete.
To view the progress of the processing job, go to Data Transfer > Uploads. For non-Everlaw processed data, the processing job is represented by a card within the Processing Jobs tab on the left-hand side. For Everlaw-processed data, the progress is tracked on the original Native upload card(s) of the reprocessed documents.
You can read more about how to upload documents, and how to re-configure your upload, here.