Reprocessing Everlaw and Non-Everlaw Processed Documents

Reprocessing runs previously uploaded documents through Everlaw's native processing tools, which allows you to update the configuration for your documents to do things like:

  • Change the displayed time zone information
  • Regenerate or remove images (PDFs)
  • Re-run documents that weren’t initially processed because of a missing or incorrect password
  • Force OCR for documents that weren't initially OCR'd

This can be useful for both natively uploaded documents (Everlaw processed) and processed uploads and productions (non-Everlaw processed documents).

In this article, we'll cover:

  • An overview of reprocessing, including how reprocessing works and situations when reprocessing is helpful
  • Detailed instructions for how to reprocess documents

Note

Since reprocessing is a database-level permission, a reprocessed document is updated in all projects in the database.

Requirements

  • To reprocess documents, you must be an Organization Admin or have Database Upload or Database Admin permissions
  • Only Database Admins can use the Force OCR options
  • To include documents with highlights/redactions in the set of reprocessed documents, you must be an Organization Admin
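
As a quick illustration, the requirements above can be modeled as three simple checks. This is only a sketch: the User class and its flag names are hypothetical and do not correspond to any Everlaw API or to how Everlaw stores permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical flags used only for this illustration.
    is_org_admin: bool = False
    has_database_upload: bool = False
    is_database_admin: bool = False


def can_reprocess(user: User) -> bool:
    # Organization Admin, Database Upload, or Database Admin is enough.
    return user.is_org_admin or user.has_database_upload or user.is_database_admin


def can_force_ocr(user: User) -> bool:
    # The requirements list Database Admins as the only role for Force OCR.
    return user.is_database_admin


def can_include_annotated_docs(user: User) -> bool:
    # Including documents with highlights/redactions requires Organization Admin.
    return user.is_org_admin
```

For example, under this model a user with only Database Upload permission can start a reprocessing task but cannot force OCR or include highlighted or redacted documents.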

Background information 

During initial native upload, Everlaw's processing tools get documents ready to review. This includes extracting metadata and document text to make it searchable, generating images (PDFs) to highlight and redact, and setting some document-type specific configurations, such as how PowerPoint speaker notes are formatted.

During processed uploads with a load file, documents do not go through Everlaw's processing tools. 

When you reprocess, you run documents through Everlaw's native processing tools. For natively uploaded documents, this is running them through for the second time. For documents uploaded as processed data, it's for the first time.

Reprocessing reruns documents from their original state and previous settings are overridden. For example, if your original native upload created images for your documents, but you select not to create PDFs when you reprocess, not only will no new images be created upon reprocessing, the old images will be deleted.

Note

Any documents that have images with highlights or redactions applied will not be reprocessed by default. This is to avoid shifting the relative position of content in the underlying PDF. Organization Administrators can override this behavior and reprocess documents with highlights or redactions.

Note

Codes and annotations are not changed or removed during reprocessing.

Billable size

Reprocessing, especially when flattening images or creating/removing images, can impact the billable size of your data.

When to reprocess

There are several common reasons to reprocess documents, depending on the upload type. We'll cover some of them in the following sections.

Native uploads

For natively uploaded documents, common reasons to reprocess are:

  • Add passwords: Some files in your native upload were password-protected, but you did not have the passwords handy during initial upload. Once you get the passwords, you can enter them during reprocessing, which makes the documents available for review.
  • Update time zone: The time zone displayed in the PDF view for emails, for values such as Date Sent or Date Received, is set during upload. If the correct time zone wasn't initially selected, you can reprocess the emails to update the time zone displayed in the email header. 
  • Capture additional text: Some documents may have text that wasn't captured during initial processing. This could be handwritten text, or unembedded text within document types that do not get OCR'd by default upon upload. Database Admins can Force OCR during reprocessing to capture this text and make it available in search. There are more details about how to reprocess for text in the Detailed description of OCR options section. 
  • Generate Spanish transcripts for audio and video files: If your OCR language was set to English upon initial upload, audio and video files in Spanish will not have accurate transcripts. You can reprocess and select Spanish as the OCR language to generate an accurate transcript. 

    Note

    Audio and video files cannot be transcribed in any language except Spanish or English.

More generally, it can make sense to reprocess natively uploaded documents any time that the initial upload configuration had mistakes or if your uploaded documents didn't process the way you expected. Reprocessing gives you the chance to reconfigure the upload, mostly from scratch, to get the settings right. To see what reprocessing cannot fix, see the Limitations: What reprocessing cannot do section below.

Processed uploads

For processed data, the most common reason to reprocess is because the images provided in a production were TIFF or other image file types that do not have an embedded text layer, and/or because they did not include separate text files. 

If separate text files are not included, then you cannot search the text of documents without embedded text.

If text is not embedded within the image, hit highlighting and other text-based tools will not work in the image view of these files during review, regardless of whether or not a separate text file is included.

You can reprocess these files to:

  • Generate a new text file 
  • Convert the images to PDFs
  • Embed a text layer in the images

Below is a table with common non-searchable files. Reprocessing these documents to generate new images will generate searchable images or PDFs for documents that lack embedded text. 

Image only PDFs
  • These PDF files were created without a text layer
  • You can reprocess to generate searchable images
TIFF Images
  • Standard TIFF files only contain images and not embedded text
  • You can reprocess to generate searchable images 
JPG or PNG Images
  • JPGs and PNGs only contain images and not embedded text
  • You can reprocess to generate searchable images  

Important

If your non-Everlaw documents include native files, reprocessing defaults to using the native file to regenerate the image. This can result in Bates stamps on the original images being removed. To keep Bates stamps, select Ignore natives in the Advanced settings of the reprocessing dialog. This will reprocess from the image file even when a native is present, which will retain the Bates stamp.
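
The effect of the Ignore natives setting on Bates stamps can be summarized in a small decision sketch. The function names and return values below are hypothetical and only restate the behavior described above; they are not part of any Everlaw interface.

```python
def reprocessing_source(has_native: bool, ignore_natives: bool) -> str:
    """Return which file the image is regenerated from (illustrative model)."""
    # By default, a native file is preferred whenever one is present.
    if has_native and not ignore_natives:
        return "native"
    # Otherwise the existing image file is used.
    return "image"


def bates_stamp_retained(has_native: bool, ignore_natives: bool) -> bool:
    """Bates stamps live on the produced image, so they survive only
    when reprocessing starts from that image."""
    return reprocessing_source(has_native, ignore_natives) == "image"
```

In this model, a document with a native file keeps its Bates stamp only when Ignore natives is selected, while a document without a native always keeps it.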

Limitations: What reprocessing cannot do

There are some limits to what reprocessing can do:

  • If a file is corrupt or opens with errors in native applications (e.g., Word, PowerPoint, a PDF viewer), you may need to repair it offline and overlay a corrected version
  • If you did not initially DeNIST, selecting to DeNIST does not remove the uploaded NIST files from Everlaw
  • If you originally selected to fetch Slack attachments, selecting Do not fetch Slack attachments during reprocessing does not remove these files. You will still need to delete them separately.
  • If the load file for your processed upload is missing metadata, and your upload does not include native files, you cannot reprocess to get more metadata

Detailed description of OCR options

If you are planning to reprocess to capture additional text via OCR, it's important to understand your options for OCRing documents and to make the right document selections. This section goes over the details of how Everlaw handles OCR so you can decide how to organize your files for reprocessing.

Upon native upload and reprocessing, Everlaw automatically OCRs:

  • All image files (TIFFs, JPGs, PNGs, etc.)
  • Inlined images within emails
  • PDFs without embedded text
  • PDF pages with fewer than 50 characters of embedded text.

You do not need to force OCR to generate text for these documents during reprocessing, whether they were uploaded natively or as processed data.
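
The default OCR rules listed above can be sketched as a small function. This is an illustrative model only, not Everlaw's implementation: the function name, the set of image extensions, the way characters are counted, and the treatment of a page with exactly 50 characters are all assumptions. Inlined email images are left out for brevity.

```python
from typing import List

# Assumed, non-exhaustive set of image types (the list above ends with "etc.").
IMAGE_TYPES = {"tiff", "jpg", "png"}
# Pages with fewer embedded characters than this are OCRed by default.
OCR_CHAR_THRESHOLD = 50


def pages_ocred_by_default(file_type: str, page_texts: List[str]) -> List[int]:
    """Return zero-based indexes of pages that would be OCRed without Force OCR.

    page_texts holds the embedded text of each page (empty string if none).
    """
    kind = file_type.lower()
    if kind in IMAGE_TYPES:
        # Image files are always OCRed.
        return list(range(len(page_texts)))
    if kind == "pdf":
        # Assumption: a page with exactly 50 characters is not OCRed.
        return [i for i, text in enumerate(page_texts)
                if len(text) < OCR_CHAR_THRESHOLD]
    # Other file types (for example DOCX) are not OCRed by default.
    return []
```

For a four-page PDF whose pages carry 0, 49, 50, and 200 embedded characters, this sketch selects only the first two pages, which is why a page with plenty of embedded text plus a text-containing image needs Force OCR.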

Some documents contain text that is not captured with the standard OCR behavior. For these scenarios, there are options to force OCR. Forcing OCR during reprocessing overrides the standard behavior and can capture text in documents that are not OCRed by default. 

The following examples, though not exhaustive, are representative of when forcing OCR is useful.

For natively uploaded documents:

  • Word processing documents with text-containing images: Word processing (e.g. DOCX) documents are not OCRed upon upload. If your upload included word processing documents with inlined images that contain text, selecting Force OCR > Default during reprocessing can capture the text in those images and add it to any text that was already extracted from the document.
  • PDFs with text-containing images: PDF pages with more than 50 characters of embedded text are not OCRed upon upload. If a PDF page with more than 50 characters of embedded text includes an image with un-embedded text within it, forcing OCR with the default option during reprocessing can add this additional un-embedded text to the document's text file. Forcing OCR with the default option will not overwrite any text embedded in the PDF.
  • Invalid text for a PDF: If a file's text view doesn't match the words in the PDF, this can mean that the PDF has invalid embedded text. Invalid text makes it impossible to search for the text that is actually in the document. When you select Force OCR > Flatten PDFs, Everlaw removes the embedded text layer from the PDF, then OCRs the PDF and replaces the embedded text with the OCR results. This removes the invalid text.

    Important

    This process is destructive of the embedded text layer and vector information in the existing image. For documents with a native PDF, the existing image on the platform is replaced, but the original vector information and embedded text is unchanged in the native file. You can recover this information in the image if you reprocess the document again without selecting Force OCR.

  • Handwriting: Everlaw's default OCR tool is not optimized to capture handwriting on documents. For documents that have English handwriting on them (either solo or alongside typed text), you can reprocess using an OCR tool that is optimized for handwriting. There are some specifics to be aware of when you use our OCR optimized for English handwriting option:
    •  This option is optimized for handwriting only in English and not in any other languages. You should not use this option for documents with non-English handwriting. 
    • For typed text, this option only works for English, German, French, Spanish, Italian, or Portuguese.
    • The limit for this tool is 100 documents per action. If you have more than 100 documents selected in the results table, this option is disabled.
    • This option works for PDF, PNG, JPEG, and TIFF formats
    • Documents reprocessed with this option selected will be searchable using a Handwriting metadata field. Those with handwriting detected will have a Yes value, and those without handwriting detected will have a No value. Documents you don't reprocess with this option will not have a value. You can search for them with the (no value) option in the Handwriting search term. Learn more about searching metadata in our Searching Metadata article.

It is rare to need to force OCR for data uploaded as processed, since image files (such as TIFFs, flat PDFs, JPEGs, and PNGs) are OCRed automatically during reprocessing.
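
The constraints on the OCR optimized for English handwriting option described in this section can be expressed as a short pre-check that you might apply when building a reprocessing set. The constant and function names are hypothetical; only the limits themselves (100 documents per action, four supported formats, six typed-text languages) come from this article.

```python
MAX_DOCS_PER_ACTION = 100
SUPPORTED_FORMATS = {"pdf", "png", "jpeg", "tiff"}
TYPED_TEXT_LANGUAGES = {
    "English", "German", "French", "Spanish", "Italian", "Portuguese",
}


def handwriting_option_enabled(selected_count: int) -> bool:
    # The option is disabled when more than 100 documents are selected.
    return selected_count <= MAX_DOCS_PER_ACTION


def format_supported(file_type: str) -> bool:
    # The option works for PDF, PNG, JPEG, and TIFF formats.
    return file_type.lower() in SUPPORTED_FORMATS


def typed_text_supported(language: str) -> bool:
    # Handwriting is English only; typed text works for these six languages.
    return language in TYPED_TEXT_LANGUAGES
```

A set of 250 handwritten PDFs, for instance, would need to be split into at least three separate reprocessing actions to stay within the limit.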

Best practices for defining reprocessing sets

All documents selected for reprocessing go through the same configuration. This means that it's important to make sure selected documents have the same reprocessing needs. Depending on why you're reprocessing, you can identify your document sets in a few different ways. Here are a few suggestions:

  • Build a search using processing flags to identify subsets of documents that didn't process correctly or that might benefit from reprocessing. For example, processing flags can help you identify password-protected documents or documents with transcription that you might want to translate to Spanish.
  • Use binders or codes for reviewers to identify documents that have issues. For example, documents with garbled/invalid text files that might be otherwise hard to search for can be added to a "Needs Reprocessing" binder to reprocess.
  • When reviewing processed data, the Empty text label or the message "Hit Highlights is not supported in the current document view for these files" indicates the absence of an embedded text layer. Reprocessing can generate text and, if desired, searchable PDFs.

Reprocess Everlaw processed documents

Everlaw-processed documents are those that were uploaded natively. To reprocess Everlaw-processed documents:

  1. Access a results table of the documents you want to reprocess. These should be natively uploaded documents with a control number (# prefix) identifier.
  2. [Optional] Select the subset of documents you want to reprocess using the checkboxes on the left-hand side of the results table.
  3. Select Batch > Reprocess. This opens the Reprocess dialog.
  4. Make sure the dialog is open to the Everlaw processed tab to reprocess documents that were processed on Everlaw (native files).
  5. Make your selection for Create PDFs for. This determines whether you generate image files for your documents:
    • If you don't want to make any changes to which documents have images, select Dataset default
    • To create images for all your documents, select All documents
    • If you don't want to have images for your files, select No documents. After reprocessing, none of the documents will have a PDF view, even if they had one prior to reprocessing.
      You can read more about the options, and why you might choose each, in the Image details section of our Upload Native Data to Everlaw article.
  6. [Optional] If you have passwords for password-protected files, add them in the Passwords for protected files field. The native viewer in the review window will not be available until you reprocess. After reprocessing with the password, you will be able to use the native viewer.
  7. [Optional] Expand Advanced settings for options to configure Timezone, OCR language, Page size, Hyperlinked images, PowerPoint speaker notes, Slack attachments, DeNIST, and/or OCR Options. Our native uploading article has additional details about Advanced settings.
    To leave any setting unchanged from the original configuration, choose Dataset default. This option refers to the original configuration settings of an upload. 

    Tip

    To review the configuration settings for a given upload, look at the configuration on its upload card on the Data Transfer > Uploads page. You can see which upload set a document is part of by adding the Document Set column to your results table view.

    The following details can help you decide whether to adjust any of the settings from their dataset default:

    • Default timezone: For emails, this affects the time zone displayed in the header of the PDF view. If you want to remove the time zone, select No timezone in the Default timezone field.
    • OCR language: If your documents are in a language other than the one selected during initial upload, select the language in the OCR Language field. This will OCR eligible documents in the selected language.
      To reprocess audio files for transcription in Spanish, select Spanish as the language. 

      Note

      OCR will only be run on the document types described above.  

    • Slack attachments:
      • For Slack data that had attachments fetched during the initial upload, selecting Do not fetch attachments stored in Slack referenced in chat files during reprocessing does not delete attachments that were previously fetched. The attachments remain in the database after reprocessing and need to be deleted separately. 
      • For Slack data that did not have its attachments fetched during the initial upload, reprocessing to fetch the attachments can bring new files into the database and increase billable size
    • DeNIST: Selecting DeNIST does not delete NIST files that were previously uploaded
  8. [Database Admins] [Optional] Select an option to Force OCR on a subset of documents. The Detailed description of OCR options section includes more details about the Force OCR options, but here are some important details:
    Upon upload and reprocessing, Everlaw automatically OCRs the following document types. You do not need to force OCR to generate text for these documents:
    -  All image files (TIFFs, JPGs, PNGs, etc.)
    - Inlined images within emails
    - PDFs without embedded text
    - PDF pages with fewer than 50 characters of embedded text.
    Everlaw does not OCR any file types other than those described above.
    To override this behavior, select Force OCR, then select a specific option:
    • Default: This OCRs all documents that are not OCRed by default upon reprocessing. Select this option to append detected un-embedded text to documents that are not OCRed with the standard reprocessing options.
    • Flatten PDFs: Everlaw removes the embedded text layer from the PDF, then OCRs the PDF and replaces the embedded text with the OCR results. This is only recommended for PDFs that have invalid embedded text.

      Important

      This process is destructive of the embedded text layer and vector information in the existing image. For documents with a native PDF, the existing image on the platform is replaced, but the original vector information and embedded text is unchanged in the native file. You can recover this information in the image if you reprocess the document again without choosing Flatten PDFs.

    • OCR optimized for English handwriting: Use an OCR tool that captures handwriting in English. There are some important details to be aware of with this option:
      • This option is optimized for handwriting only in English and not in any other languages. You should not use this option for documents with non-English handwriting.
      • For typed text, this option only works for English, German, French, Spanish, Italian, or Portuguese.

        Important

        Using this option for documents in a language other than the ones listed here can result in garbled, inaccurate text or no selectable text for highlights or redaction using the text selector tool.

      • The limit for this tool is 100 documents per action. If you have more than 100 documents selected in the results table, this option is disabled.
      • This option works for PDF, PNG, JPEG, and TIFF formats.
      • Documents reprocessed with this option selected will be searchable using a Handwriting metadata field. Those with handwriting detected will have a Yes value, and those without handwriting detected will have a No value. Learn more about searching metadata in our Searching Metadata article.
    • Flatten PDFs and OCR optimized for English handwriting: This option removes the embedded text layer of the PDF, then does OCR using a tool that is optimized for handwriting in English. It is only recommended for PDFs with invalid embedded text. The bullet points for the above OCR optimized for English handwriting option also apply, other than the one about file types: this option only works for PDFs.

      Important

      This process is destructive of the embedded text layer and vector information in the existing image. For documents with a native PDF, the existing image on the platform is replaced, but the original vector information and embedded text is unchanged in the native file. You can recover this information in the image if you reprocess the document again without choosing Flatten PDFs.

  9. [Organization Admins only] [Optional] To include documents that have highlights and/or redactions on them in the reprocessing task, select Include docs with highlights/redactions. 

    Important

    If there are changes to the images made during the reprocessing, existing highlights and redactions may "shift" in relation to their original positions.

  10. [Optional] To add a Decryption key, select Manage decryption keys. This takes you to the Advanced Settings tab of the Uploads page to add the decryption key.
  11. When you have configured the settings appropriately, select Reprocess. 
  12. Confirm any warnings, then select Process. This starts the task to reprocess your documents. Your originally uploaded documents will be replaced by the reprocessed ones.

Note

Codes and annotations are not changed or removed during reprocessing.
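
The way Dataset default interacts with the other choices in the steps above can be pictured as a simple merge of settings. The sketch below is a mental model only: the dictionary keys are hypothetical names, and treating a setting that is absent from the dialog choices as unchanged is an assumption.

```python
DATASET_DEFAULT = "Dataset default"


def resolve_settings(original: dict, chosen: dict) -> dict:
    """Combine the original upload configuration with the dialog choices.

    A choice of "Dataset default" keeps the original value; any other
    choice overrides it.
    """
    resolved = dict(original)
    for name, value in chosen.items():
        if value != DATASET_DEFAULT:
            resolved[name] = value
    return resolved


# Example: keep the time zone, switch OCR language, and drop images.
original_config = {
    "timezone": "UTC",
    "ocr_language": "English",
    "create_pdfs": "All documents",
}
dialog_choices = {
    "timezone": DATASET_DEFAULT,
    "ocr_language": "Spanish",
    "create_pdfs": "No documents",
}
effective_config = resolve_settings(original_config, dialog_choices)
```

In this example the effective configuration keeps UTC, switches the OCR language to Spanish, and, because Create PDFs for is set to No documents, removes the existing PDF views after reprocessing.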

Reprocess non-Everlaw processed documents

Non-Everlaw processed documents are those that were uploaded as processed data or those created in a production. To reprocess non-Everlaw processed documents, including to generate searchable images:

  1. Access a results table of the documents you want to reprocess. These should be uploaded documents with a Bates number identifier, rather than a control number (# prefix).
  2. [Optional] Select the subset of documents you want to reprocess using the checkboxes on the left-hand side of the results table.
  3. Select Batch > Reprocess. This opens the Reprocess dialog.
  4. Make sure the dialog is open to the Non-Everlaw processed tab to reprocess documents that were not originally processed on Everlaw.
  5. Select what documents to Generate:
    • To overwrite existing images/PDFs, select PDFs & Text.
    • To keep existing PDFs and only generate new text files, select Text.

      Note

      Reprocessing non-Everlaw processed documents may increase the billable size of the documents as a result of newly generated and stored image files.

  6. If you selected PDFs & Text in the step above, you also select what documents to Create PDFs for. You can read more about the options, and why you might choose each, in the Image details section of our Upload Native Data to Everlaw article.
    If you select No documents, your existing images/PDFs will be removed and no new images will be created. Here are some more details:
    • When generating PDFs:
      • If your documents do not have natives: Everlaw will convert the documents’ images into searchable PDFs. This will preserve the Bates stamps on the documents.
      • If your documents do have natives: The default behavior is for Everlaw to replace the existing images on the platform with PDFs that are generated from the documents’ natives. To ignore the natives and reprocess only from the existing images, see the details about the Ignore natives option in step 10 below.

        Important

        Reprocessing from the native will remove the Bates stamps, as well as any other features not present in the native versions of the documents, from the new PDFs.

    • When generating text: Everlaw will first attempt to generate text from a native file.
      If the Ignore natives option described in step 10, below, is selected, or if there is no native file for a document, Everlaw will check for embedded text in a PDF.
      Finally, if there is neither a native file nor embedded text, the document will be OCR’d. This will generate new text files for PDFs, TIFFs, JPGs, and PNGs.
  7. [Optional] Expand Advanced settings to configure settings for Metadata, Default timezone, OCR language, Page size, Hyperlinked images, PowerPoint speaker notes, and/or OCR options. Here are details about each:
    • Metadata:
      • If you don't want the metadata of your documents changed, select Ignore.
      • If you select Merge, Everlaw will generate new metadata values from the native versions of your documents, if present. Any value that already exists on the document will also be retained.

        Note

        Selecting Merge has the potential to change your documents’ metadata from what the producing party provided.

    • Timezone: You can choose a timezone to be used for extracted date metadata without a timezone specified. If you leave the field empty, Everlaw will use UTC. You can also select No Timezone if you don't know the time zone.
    • OCR Language and Page Size will follow the same rules as that of configuring native uploads. See the Advanced details section of the article on native uploads to learn more about these options.
  8. [Database Admins] [Optional] To force OCR on documents that are not OCRed by default upon reprocessing, select Force OCR, then select your option.
    • Default:  This OCRs all documents that are not OCRed by default upon reprocessing.
      You do not need to Force OCR to generate text for TIFFs, PDFs, or image files without embedded text, as the reprocessing steps outlined above will generate or replace the text for these document types. 
    • Flatten PDFs: Everlaw removes the embedded text layer from the PDF, then OCRs the PDF and replaces the embedded text with the OCR results. This is only recommended for PDFs that have invalid embedded text.

      Important

      This process is destructive of the embedded text layer and vector information in the existing image. For documents with a native PDF, the existing image on the platform is replaced, but the original vector information and embedded text is unchanged in the native file. You can recover this information in the image if you reprocess the document again without choosing Flatten PDFs.

    • OCR optimized for English handwriting: Run OCR using an OCR tool that captures handwriting in English. This can be a good option if your documents have handwriting that wasn't already captured via OCR or captured in the provided text files. There are some important details to be aware of with this option:
      • This option is optimized for handwriting only in English and not in any other languages. You should not use this option for documents with non-English handwriting.
      • For typed text, this option only works for English, German, French, Spanish, Italian, or Portuguese.

        Important

        Using this option for documents in a language other than the ones listed here can result in garbled, inaccurate text or no selectable text for highlights or redaction using the text selector tool.

      • The limit for this tool is 100 documents per action. If you have more than 100 documents selected in the results table, this option is disabled.
      • This option works for PDF, PNG, JPEG, and TIFF formats
      • Documents reprocessed with this option selected will be searchable using the Handwriting metadata field. Those with handwriting detected will have a Yes value, and those without handwriting detected will have a No value. Documents you don't reprocess with this option will not have a value for this field. Learn more about searching metadata in our Searching Metadata article.

        Important

        For non-Everlaw processed documents, you must select Merge metadata (described in step 7, above) to enable and extract a value for the Handwriting field.

    • Flatten PDFs and OCR optimized for English handwriting: This option removes the embedded text layer of the PDF, then does OCR using a tool that is optimized for handwriting in English. It is only recommended for PDFs with invalid embedded text. The bullet points for the above OCR optimized for English handwriting option also apply, other than the one about file types: this option only works for PDFs.

      Important

      This process is destructive of the embedded text layer and vector information in the existing image. For documents with a native PDF, the existing image on the platform is replaced, but the original vector information and embedded text is unchanged in the native file. You can recover this information in the image if you reprocess the document again without choosing an option that flattens PDFs.

  9. [Organization Admins only] [Optional] To include documents that have highlights and/or redactions on them in the reprocessing task, select Include docs with highlights/redactions. 

    Important

    If there are changes to the images made during the reprocessing, existing highlights and redactions may "shift" in relation to their original positions.

  10. If you want to ignore natives during reprocessing, select Ignore natives. This overrides the behavior to reprocess documents off their native, if available, and instead will reprocess off the image file. When you use this option, you can preserve the Bates stamp on documents that have native files included when you reprocess to embed text.
  11. When you have configured the settings appropriately, click Reprocess.
  12. Confirm any warnings, then select Process. This starts the task to reprocess your documents. Your originally uploaded documents will be replaced by the reprocessed ones.

Note

Codes and annotations are not changed or removed during reprocessing.
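
The order in which text is generated for non-Everlaw processed documents, described in step 6 above, can be sketched as a three-step fallback. The function and its return labels are hypothetical and simply restate that order; they are not an Everlaw API.

```python
def text_source(has_native: bool, has_embedded_text: bool,
                ignore_natives: bool = False) -> str:
    """Return where the new text file would come from (illustrative model)."""
    # 1. Prefer the native file, unless Ignore natives is selected.
    if has_native and not ignore_natives:
        return "native"
    # 2. Otherwise use text embedded in the PDF, if there is any.
    if has_embedded_text:
        return "embedded_text"
    # 3. As a last resort, OCR the document.
    return "ocr"
```

For a TIFF-only production with no natives and no embedded text, this model always ends at the OCR step, which matches the most common reason to reprocess processed data.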

Monitor the progress of a reprocessing task

Once you start the reprocessing task, a pop-up notification indicates the task has been queued. Another notification pops up when the task is complete.

To view the progress of the processing job, go to Data Transfer > Uploads.

For non-Everlaw processed data, the processing job is represented by a card within the Processing Jobs tab on the left-hand side.

For Everlaw-processed data, the progress is tracked on the original Native upload card(s) of the reprocessed documents.

You can read more about how to upload documents, and how to re-configure your upload, in our Uploading Documents article.

Find reprocessed documents later