Reprocessing Everlaw and Non-Everlaw Processed Documents

Reprocessing reconfigures previously uploaded documents to a new upload configuration. For example, you can change the time zone information, replace or remove images, or re-run documents that weren’t initially processed because of a missing or incorrect password.

Read this article to learn more about reprocessing and for step-by-step instructions on how to reprocess documents.

Note

Because reprocessing is a database-level permission, reprocessed documents are updated in all projects associated with the database.

Requirements

Required permissions: To reprocess documents, you must have Database Upload or Database Admin permissions. 

To include documents with highlights/redactions in the set of reprocessed documents, you must be an Organization Admin.

Background information 

Everlaw handles reprocessing for Everlaw processed documents differently than for non-Everlaw processed documents.

Everlaw processed documents are those that were uploaded natively and have already been through Everlaw's processing tools. The options to reprocess these documents mirror those available in the Details step of a native upload. Read the section Reprocess Everlaw processed documents to learn more about reprocessing these documents.

Non-Everlaw processed documents include both documents that were uploaded as processed data and the produced version of documents that you produced on Everlaw. These documents have not been through Everlaw's processing tools. When you reprocess them, you make selections to configure them with options from Everlaw's processing tools. Read the section Reprocess non-Everlaw processed documents to learn more about reprocessing these documents.

When you reprocess documents, you rerun them from their original state. This means any previous settings from the initial upload are overridden. For example, if your original native upload created images for your documents, but you choose not to create PDFs when you reprocess, no new images will be created upon reprocessing and the old images will be deleted.

Additionally, any documents that have images with highlights or redactions applied will not be reprocessed by default. This is to avoid changing the underlying PDF. If you are an Organization Administrator, then you can choose to override this behavior and reprocess documents with highlights or redactions. 
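
The default behavior described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Everlaw's implementation: the Doc class, its fields, and the function name are hypothetical and do not correspond to any Everlaw API.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document in a results table."""
    doc_id: str
    has_highlights_or_redactions: bool = False


def docs_to_reprocess(docs, is_org_admin=False, include_annotated=False):
    """Return the documents a reprocessing task would act on.

    Documents with highlights or redactions are skipped by default.
    Only an Organization Admin who opts in can include them.
    """
    allow_annotated = is_org_admin and include_annotated
    return [
        d for d in docs
        if allow_annotated or not d.has_highlights_or_redactions
    ]


docs = [Doc("#1"), Doc("#2", has_highlights_or_redactions=True)]

# Default behavior: the document with redactions is left out.
default_ids = [d.doc_id for d in docs_to_reprocess(docs)]

# An Organization Admin who opts in gets both documents.
override_ids = [
    d.doc_id
    for d in docs_to_reprocess(docs, is_org_admin=True, include_annotated=True)
]
```

Here default_ids contains only "#1", while override_ids contains both documents.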

Reprocess Everlaw processed documents

To reprocess Everlaw-processed documents:

  1. Access a results table of the documents you want to reprocess. These should be natively uploaded documents with a control number (# prefix) identifier.
  2. [Optional] Select the subset of documents you want to reprocess using the checkboxes on the left-hand side of the results table.
  3. Select Batch > Reprocess. This opens the Reprocess dialog.
    batch-reprocess.png
  4. Make sure the dialog is open to the Everlaw processed tab to reprocess documents that were processed on Everlaw (native files).
  5. Make your selection for Create PDFs for. If you don't want to make any changes to which documents have images, select Dataset default.
    You can read more about the options, and why you might choose each, in the Image details section of our Upload Native Data to Everlaw article.

    Note

    Creating PDFs does not affect your billable size or impact your cost.

  6. [Optional] If you have passwords for password-protected files, add them in the Passwords for protected files field. For password-protected files, the native viewer in the review window is not available until you reprocess with the password.
  7. [Optional] Expand Advanced settings for options to configure Timezone, OCR language, Page size, Hyperlinked images, PowerPoint speaker notes, and/or Slack attachments. Please see our native uploading article for details about each of the Advanced settings.
    advanced reprocessing.png

    To leave any setting unchanged from the original configuration, choose Dataset default. This option refers to the original configuration settings of an upload. 

    Note

    To review the configuration settings for a given upload, look at the configuration on its upload card on the Data Transfer > Uploads page.

    Tip

    You can see which upload set a document is part of by adding the Document Set column to your results table view.

    Here are some additional details about the options in Advanced Settings:

    • If you want to remove the time zone, select No timezone in the Default timezone field.
    • If you want to reprocess audio files for transcription in Spanish, select Spanish in the OCR language field.
    • Slack attachments
      • For Slack data that had attachments fetched during the initial upload, selecting Do not fetch attachments stored in Slack referenced in chat files when you reprocess does not delete the previously fetched attachments. They remain in the database after reprocessing and must be deleted separately.
      • For Slack data that did not have its attachments fetched during the initial upload, reprocessing to fetch the attachments can bring new files into the database and increase billable size.
  8. [Database Admins and Organization Admins only] [Optional] Select an option to Force OCR on a subset of documents.
    To force OCR on documents that have not been OCR'd, select Force OCR.
    To flatten documents and force OCR on them, select Flatten and Force OCR.
    Here are additional details to help you decide whether either of these options is appropriate for your dataset:
    • Upon upload, Everlaw automatically OCRs TIFFs, PDFs without embedded text, and PDF pages with fewer than 50 non-whitespace embedded characters. Text files for these documents are created upon upload. You do not need to force OCR to generate text for these documents.
    • Upon upload, Everlaw does not OCR JPGs, PNGs, or any file types other than those described above, and text files are not created for them. To override this behavior and create text files for these documents upon reprocessing, select Force OCR.
    • If a PDF has more than 50 characters of embedded text per page, this text is extracted upon upload. To add any text detected via OCR to the PDF's existing embedded text, select Force OCR.
      For example, if a PDF includes an image with un-embedded text within it, this additional un-embedded text is added to the document's text file after running Force OCR. Force OCR will not overwrite any text embedded in the PDF.
    • If you have a PDF with invalid embedded text, select Flatten and Force OCR for PDFs. During reprocessing, Everlaw ignores the existence of embedded text and replaces the PDF text with the OCR results.

      Important

      This process destroys the embedded text layer and vector information in the existing image. For documents with a native PDF, the existing image on the platform is replaced, but the original vector information and embedded text are unchanged in the native file. You can recover this information in the image if you reprocess the document again without choosing Flatten and Force OCR for PDFs.

  9. [Organization Admins only] [Optional] To include documents that have highlights and/or redactions on them in the reprocessing task, select Include docs with highlights/redactions. 

    Important

    If there are changes to the images made during the reprocessing, existing highlights and redactions may "shift" in relation to their original positions.

  10. [Optional] To add a Decryption key, select Manage decryption keys. This takes you to the Advanced Settings tab of the Uploads page to add the decryption key.
  11. When you have configured the settings appropriately, select Reprocess.
  12. Confirm any warnings, then select Process. This starts the task to reprocess your documents. Your originally uploaded documents will be replaced by the reprocessed ones.
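
The OCR rules listed in step 8 can be summarized as a small decision function. The sketch below is a reading of those rules under stated assumptions, not Everlaw's actual logic: the function names, the mode strings, and the return labels are all hypothetical. Two points are assumptions because the steps above do not spell them out: a page with exactly 50 non-whitespace characters is treated as having usable embedded text, and Flatten and Force OCR is treated like Force OCR for file types other than PDF.

```python
MIN_EMBEDDED_CHARS = 50  # per-page threshold quoted in step 8


def non_whitespace_count(text):
    """Count the characters in text that are not whitespace."""
    return sum(1 for ch in text if not ch.isspace())


def page_text_source(file_type, embedded_text="", mode="default"):
    """Return where the text for one page would come from.

    mode is one of "default", "force_ocr", or "flatten_and_force_ocr"
    (hypothetical labels for the dialog options).
    Returns "ocr", "embedded", "embedded+ocr", or "none".
    """
    file_type = file_type.lower()
    forced = mode in ("force_ocr", "flatten_and_force_ocr")

    if file_type == "pdf":
        if mode == "flatten_and_force_ocr":
            # Embedded text is ignored and replaced by OCR results.
            return "ocr"
        if non_whitespace_count(embedded_text) < MIN_EMBEDDED_CHARS:
            # Too little embedded text: OCR'd automatically.
            return "ocr"
        # Enough embedded text: extracted, with OCR text added if forced.
        return "embedded+ocr" if mode == "force_ocr" else "embedded"

    if file_type == "tiff":
        # TIFFs are OCR'd automatically.
        return "ocr"

    # JPGs, PNGs, and other types get text only when OCR is forced.
    return "ocr" if forced else "none"


scanned_pdf = page_text_source("pdf", embedded_text="")
text_pdf = page_text_source("pdf", embedded_text="x" * 60)
forced_jpg = page_text_source("jpg", mode="force_ocr")
```

Under these assumptions, a scanned PDF page is OCR'd without any extra option, a text PDF keeps its embedded text, and a JPG only gets a text file when Force OCR is selected.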

Note

Codes and annotations are not changed or removed during reprocessing.
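
As a rough illustration of how Dataset default behaves in the Advanced settings above, the following sketch resolves each choice made in the Reprocess dialog against the original upload configuration. The setting names and example values are hypothetical placeholders chosen for illustration; they are not Everlaw field names or an Everlaw API.

```python
DATASET_DEFAULT = "Dataset default"


def effective_settings(original_config, reprocess_choices):
    """Resolve each reprocessing choice to the value that would apply.

    A choice of "Dataset default" keeps the value from the original
    upload configuration. Any other choice replaces it.
    """
    return {
        name: original_config[name] if choice == DATASET_DEFAULT else choice
        for name, choice in reprocess_choices.items()
    }


# Hypothetical original upload configuration.
original = {
    "timezone": "America/New_York",
    "ocr_language": "English",
    "page_size": "Letter",
}

# Hypothetical selections made in the Reprocess dialog.
choices = {
    "timezone": "No timezone",
    "ocr_language": "Spanish",
    "page_size": DATASET_DEFAULT,
}

resolved = effective_settings(original, choices)
```

In this example, the time zone and OCR language change, while the page size keeps the value from the original upload.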

Reprocess non-Everlaw processed documents

You can also reprocess documents not originally processed through Everlaw's processing tools. This includes both data uploaded through the processed uploader and documents created via a production on Everlaw.

This can be particularly helpful if you’ve received documents that do not have embedded text, or if they came without separate text files, as this can cause issues during review.

If separate text files are not included, then you cannot search the text of documents without embedded text.

If text is not embedded within the image, hit highlighting and other text-based tools will not work in the image view of these files during review, regardless of whether or not a separate text file is included.

Below is a table with common non-searchable files. Reprocessing these documents generates searchable images or PDFs for documents that lack embedded text. 

Image-only PDFs
  • These PDF files were created without a text layer.
  • You can use the reprocessing tool to generate searchable images.
TIFF images
  • Standard TIFF files contain only images, not embedded text.
  • You can use the reprocessing tool to generate searchable images.
JPG or PNG images
  • JPGs and PNGs contain only images, not embedded text.
  • Organization Admins and Database Admins can use Force OCR to generate searchable images.

To reprocess non-Everlaw processed documents, including to generate searchable images:

  1. Access a results table of the documents you want to reprocess. These should be uploaded documents with a Bates number identifier, rather than a control number (# prefix).
  2. [Optional] Select the subset of documents you want to reprocess using the checkboxes on the left-hand side of the results table.
  3. Select Batch > Reprocess. This opens the Reprocess dialog.
    batch-reprocess.png
  4. Make sure the dialog is open to the Non-Everlaw processed tab to reprocess documents that were not originally processed on Everlaw.
  5. Select what documents to Generate:
    • To overwrite existing images/PDFs, select PDFs & Text.
    • To keep existing PDFs and only generate new text files, select Text.

      Note

      Reprocessing non-Everlaw processed documents may increase the billable size of the documents as a result of newly generated and stored image files.

  6. If you selected PDFs & Text in the step above, also select which documents to Create PDFs for. You can read more about the options, and why you might choose each, in the Image details section of our Upload Native Data to Everlaw article.
    If you select No documents, your existing images/PDFs will be removed and no new images will be created. Here are some more details:
    • When generating PDFs: If your documents do not have natives, Everlaw will convert the documents’ images into searchable PDFs. This will preserve the Bates stamps on the documents.
      If your documents do have natives, Everlaw will replace the existing images on the platform with PDFs that are generated from the documents’ natives.

      Important

      This will remove the Bates stamps, as well as any other features not present in the native versions of the documents, from the new PDFs.

      Reprocessing does not generate searchable images for JPGs or PNGs. To get searchable JPGs and PNGs, see step 8 for details about Force OCR.
    • When generating text: Everlaw will first attempt to generate text from a native file.
      If there is no native file for a document, Everlaw will check for embedded text in a PDF.
      Finally, if there is neither a native file nor embedded text, the document will be OCR’d. This will generate new text files for PDFs, TIFFs, JPGs, and PNGs.
  7. [Optional] Expand Advanced settings to configure settings for Metadata, Default timezone, OCR language, Page size, Hyperlinked images, and/or PowerPoint speaker notes. Here are details about each:
      • Metadata:
        • If you don't want the metadata of your documents changed, select Ignore.
        • If you select Merge, Everlaw will generate new metadata values from the native versions of your documents. Any metadata value that already exists on a document will be retained.

          Note

          Selecting Merge has the potential to change your documents’ metadata from what the producing party provided.

      • Timezone: You can choose a timezone to be used for extracted date metadata without a timezone specified. If you leave the field empty, Everlaw will use UTC. You can also select No Timezone if you don't know the time zone.
        timezone.png
      • OCR Language and Page Size follow the same rules as they do for native uploads. See the Advanced details section of the article on native uploads to learn more about these options.
  8. [Database Admins and Organization Admins only] [Optional] Select an option to Force OCR on a subset of documents.
    To force OCR on documents that have not been OCR'd, select Force OCR.
    To flatten documents and force OCR on them, select Flatten and Force OCR.
    Here are additional details to help you decide whether either of these options is appropriate for your dataset:
    • If you want to create new text or replace existing text for documents without embedded text that are not OCR'd upon reprocessing, select Force OCR. The primary use case for forcing OCR is to generate searchable images for processed, Bates-stamped JPGs and PNGs so that hit highlights and searching will work in the image view.
      Force OCR can also add text from images embedded in PDFs that do have embedded text.
      You do not need to Force OCR to generate text for TIFFs or PDFs without embedded text, as the reprocessing steps outlined above will generate or replace the text for these document types. 
    • If you have a PDF with invalid embedded text, select Flatten and Force OCR for PDFs. During reprocessing, Everlaw ignores the existence of embedded text and replaces the PDF text with the OCR results.

      Important

      This process destroys the embedded text layer and vector information in the existing image. For documents with a native PDF, the existing image on the platform is replaced, but the original vector information and embedded text are unchanged in the native file. You can recover this information in the image if you reprocess the document again without choosing Flatten and Force OCR for PDFs.

  9. [Organization Admins only] [Optional] To include documents that have highlights and/or redactions on them in the reprocessing task, select Include docs with highlights/redactions. 

    Important

    If there are changes to the images made during the reprocessing, existing highlights and redactions may "shift" in relation to their original positions.

  10. When you have configured the settings appropriately, select Reprocess.
  11. Confirm any warnings, then select Process. This starts the task to reprocess your documents. Your originally uploaded documents will be replaced by the reprocessed ones.
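
Two behaviors from steps 6 and 7 above lend themselves to a short sketch: the order in which a text source is chosen when generating text, and how Merge treats existing metadata. The Python below is a minimal illustration under stated assumptions, not Everlaw's implementation. The function names, return labels, and sample field names are hypothetical, and treating an empty string or None as "no existing value" is an assumption made for the example.

```python
def text_source(has_native, has_embedded_pdf_text):
    """Pick the text source in the order described in step 6.

    Native file first, then embedded PDF text, then OCR as a last resort.
    """
    if has_native:
        return "native"
    if has_embedded_pdf_text:
        return "embedded"
    return "ocr"


def merge_metadata(existing, from_native):
    """Combine metadata the way the Merge option is described in step 7.

    Values generated from the native file fill in fields, but any value
    that already exists on the document is retained.
    """
    merged = dict(from_native)
    # Existing, non-empty values take precedence over generated ones.
    merged.update(
        {field: value for field, value in existing.items()
         if value not in (None, "")}
    )
    return merged


# Hypothetical metadata provided by the producing party.
existing = {"Author": "A. Producer", "Title": ""}

# Hypothetical metadata extracted from the native file.
from_native = {
    "Author": "Someone Else",
    "Title": "Q3 Report",
    "Date Created": "2020-01-01",
}

merged = merge_metadata(existing, from_native)
```

In this example the produced Author value is kept, while Title and Date Created are filled in from the native file. This is why Merge can change a document's metadata from what the producing party provided.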

Monitor the progress of a reprocessing task

Once the reprocessing task starts, a pop-up notification indicates that the task has been queued. Another notification pops up when the task is complete.

To view the progress of the processing job, go to Data Transfer > Uploads. For non-Everlaw processed data, the processing job is represented by a card within the Processing Jobs tab on the left-hand side. For Everlaw-processed data, the progress is tracked on the original Native upload card(s) of the reprocessed documents.

processing_card.png

You can read more about how to upload documents, and how to re-configure your upload, here.
