Standard Format for Processed Data

On Everlaw, you can upload document sets with a load file, or produced (processed) PDFs without a load file. This article describes:

  • The standard format to successfully load processed data with a load file onto Everlaw
  • How to format processed PDFs without a load file

Although not all processed data uploads will exactly follow the structure outlined below, this article describes what is considered the standard format. Understanding the standard format will help you make sure your data is properly formatted to upload smoothly and without errors.

You can use this article in two main ways:

  • If you are new to loading processed data, this article can help you understand the overall format your data should have to make sure it is ready to load. Once you understand the structure of processed data, you should be ready to upload it onto Everlaw.
  • If you have already tried loading your data and encountered errors, this article can help you understand if the errors are a result of your data's formatting, and show you how your data should be formatted to upload smoothly.

What is processed data?

Processed data is data (e.g. documents) that another system has processed or produced before it is uploaded onto Everlaw. Often, processed data is either a data set assigned Control Numbers or a Bates-stamped production sent to you by opposing or outside counsel. In both situations, processed data is often accompanied by a load file, which contains the identifying information needed to load the documents and their metadata into an ediscovery system.

Standard format for processed data with a load file

Typically, processed data with a load file will have separate folders for each type of file within it. 

The folders typically included in processed data are:

  • Data: Contains the load file(s)
  • Images: Contains imaged files of your documents
  • Natives: Contains native files of your documents, if included
  • Text: Contains text files of your documents

Images folder

The Images folder contains image files of the documents listed in the load file. Typically, the Bates stamp and any redactions are "burned in" to the image file. 

To upload images onto Everlaw, the following criteria must be met:

  • Image files can be multi-page PDFs, single and multi-page TIFFs, PNGs, or JPEGs
    • Page-separated PDFs are not accepted 
  • The names of the image files must exactly match the BeginBates or BeginID of the entry in the load file 
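
Before uploading, it can help to confirm that every BeginBates (or BeginID) value in the load file has an image file with exactly the same name. The Python sketch below is a hypothetical pre-upload check, not an Everlaw tool: the function name `check_image_names` is illustrative, the extension list is an assumed spelling of the formats listed above, and it does not detect page-separated PDFs.

```python
from pathlib import Path

# Extensions assumed for the accepted formats (PDF, TIFF, PNG, JPEG).
ACCEPTED_IMAGE_EXTENSIONS = {".pdf", ".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def check_image_names(begin_ids, image_filenames):
    """Return (missing, unsupported) for a hypothetical pre-upload check.

    missing: BeginBates/BeginID values with no image file of the same name.
    unsupported: image files whose extension is not in the accepted list.
    """
    stems = set()
    unsupported = []
    for name in image_filenames:
        path = Path(name)
        if path.suffix.lower() in ACCEPTED_IMAGE_EXTENSIONS:
            stems.add(path.stem)  # name without extension, compared exactly
        else:
            unsupported.append(name)
    missing = [doc_id for doc_id in begin_ids if doc_id not in stems]
    return missing, unsupported


if __name__ == "__main__":
    missing, unsupported = check_image_names(
        ["ABC0001", "ABC0004"],
        ["ABC0001.pdf", "ABC0004_page1.pdf", "ABC0007.bmp"],
    )
    print(missing)      # ['ABC0004']
    print(unsupported)  # ['ABC0007.bmp']
```

Here ABC0004 is reported as missing because its image is named ABC0004_page1.pdf, which does not exactly match the BeginBates value.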

Natives folder

The Natives folder contains any native (original format) files with entries in the Native Path field of the load file. Processed data sets do not always include the native file for every document, and some data sets might not include any native files at all.  

Text folder

The Text folder contains the text files of documents listed in the load file. Your text files should meet the following requirements:

  • Everlaw accepts one standard TXT text file for each document listed in the load file
    • Everlaw does not accept Rich-Text Files (RTFs)
  • If provided in the load file, Text Path is used to locate the text file. You can read more about the Text Path in the section on load files.
  • If a Text Path is not provided in the load file, the TXT filenames must exactly match the BeginBates or BeginID in the load file 
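
The lookup rule above (use the Text Path when it is provided, otherwise expect a .txt file named exactly after BeginBates or BeginID) can be sketched in Python as follows. The field names `TextPath` and `BeginBates`, the default `Text` folder, and the function itself are hypothetical; substitute the field names your load file actually uses.

```python
from pathlib import PurePosixPath, PureWindowsPath


def resolve_text_file(record, text_folder="Text"):
    """Return the text file path expected for one load file record.

    `record` maps load file field names to values. The names "TextPath"
    and "BeginBates" are hypothetical; substitute your own field names.
    """
    text_path = (record.get("TextPath") or "").strip()
    if text_path:
        # Use the Text Path; load files made on Windows often use backslashes.
        candidate = PureWindowsPath(text_path).as_posix()
    else:
        # No Text Path: the file must be named exactly after BeginBates.
        candidate = f"{text_folder}/{record['BeginBates']}.txt"
    if PurePosixPath(candidate).suffix.lower() != ".txt":
        # Only plain .txt files are accepted (for example, no .rtf files).
        raise ValueError(f"not a .txt file: {candidate}")
    return candidate
```

For a record with BeginBates ABC0001 and an empty Text Path, this returns Text/ABC0001.txt; with a Text Path of TEXT\001\ABC0001_Confidential.txt, it returns TEXT/001/ABC0001_Confidential.txt.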

Data folder

The Data folder contains the load file(s). A load file contains metadata and identifying information about a set of documents to be uploaded to an ediscovery platform such as Everlaw. The load file is usually provided as part of a production, migration, or processed upload. 

If you encounter any issues during your upload, a good first step is to check that your load file meets the following requirements. 

The following sections describe the standard format for load files, as well as information that can be included within it.

Accepted load file formats in Everlaw 

  • DAT files (most common)
  • Comma-separated values (CSV)
  • TXT files
  • OPT files (for images)
    • An OPT file cannot be used on its own as a load file, but it can help when your load file contains neither a page count nor an End Bates field, or when Everlaw cannot determine the names of all pages
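
An OPT file helps because it lists every page image and marks where each document starts. The Python sketch below assumes the common Opticon layout (ImageKey, Volume, ImagePath, DocumentBreak, FolderBreak, BoxBreak, PageCount, with "Y" in DocumentBreak on a document's first page) and derives each document's begin ID, end ID, and page count. It illustrates the idea only; it is not how Everlaw reads OPT files, and the function name is hypothetical.

```python
import csv
import io


def documents_from_opt(opt_text):
    """Group OPT page lines into (begin_id, end_id, page_count) tuples.

    Assumes the common Opticon layout, one page image per line:
    ImageKey,Volume,ImagePath,DocumentBreak,FolderBreak,BoxBreak,PageCount
    with DocumentBreak set to "Y" on the first page of each document.
    """
    documents = []
    for row in csv.reader(io.StringIO(opt_text)):
        if not row:
            continue  # skip blank lines
        if len(row) < 4:
            raise ValueError(f"unexpected OPT line: {row}")
        image_key, document_break = row[0], row[3].strip().upper()
        if document_break == "Y" or not documents:
            # First page of a new document (or the first line of the file).
            documents.append([image_key, image_key, 1])
        else:
            # A later page: extend the current document.
            documents[-1][1] = image_key
            documents[-1][2] += 1
    return [tuple(doc) for doc in documents]


if __name__ == "__main__":
    sample = (
        "ABC0001,VOL001,IMAGES\\ABC0001.tif,Y,,,2\n"
        "ABC0002,VOL001,IMAGES\\ABC0002.tif,,,,\n"
        "ABC0003,VOL001,IMAGES\\ABC0003.tif,Y,,,1\n"
    )
    print(documents_from_opt(sample))
    # [('ABC0001', 'ABC0002', 2), ('ABC0003', 'ABC0003', 1)]
```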

Load file structure

A load file must be structured to meet the following criteria:

  • The metadata field names (headers) must be the first line of the load file. 
  • There must be a consistent separator character between columns in the load file. 
  • Each line must represent exactly one document
  • Each load file must contain two identifying fields. These fields are used to identify where each separate document begins and ends. The two required fields are:
    •  A unique identifier that represents the beginning of a document
      • Common names for this field are Begin Bates or Control Number 
    • Either a field representing the end of the document or a field identifying the number of pages in that document
      • While only one of these two fields is required, having both is preferred 
      • Common fields to identify the end of a document are EndDoc or End Bates 
      • Common fields to identify the number of pages are NumPages or Page Count 

The load file should also meet the following criteria:

  • The load file should not contain the document text embedded as a field, such as “Extracted Text” or “OCR Text”, which can cause problems during upload. Instead, the text for each document should be in its own multi-page text (.txt) file named after the document's ID (Begin Bates or Control Number).
  • The load file should not contain work product, such as codes or ratings. If this data is being migrated from another platform, provide work product in a separate file. Please see our article on data migrations to learn more.
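
The structural requirements above can be checked before upload with a short script. This Python sketch is hypothetical: the field-name aliases are examples drawn from the common names above, not an exhaustive list, and the default delimiters assume a Concordance-style DAT file (ASCII 20 between fields, þ around values), a common convention that your file may not follow. It does not detect work product fields.

```python
import csv
import io
import re

# Example field-name aliases, normalized to lowercase letters and digits.
# These are illustrative; extend them to match your own load file.
BEGIN_FIELDS = {"beginbates", "begbates", "beginid", "controlnumber"}
END_FIELDS = {"endbates", "enddoc", "endid"}
PAGE_FIELDS = {"numpages", "pagecount"}
TEXT_FIELDS = {"extractedtext", "ocrtext"}


def normalize(name):
    """Lowercase a field name and drop spaces and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def check_load_file(text, delimiter="\x14", quotechar="\u00fe"):
    """Return a list of structural problems found in load file contents.

    Defaults assume Concordance-style DAT delimiters (ASCII 20 between
    fields, thorn around values); use delimiter="," and quotechar='"' for CSV.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    header = next(reader, None)  # field names must be the first line
    if not header:
        return ["missing header line"]
    problems = []
    fields = {normalize(name) for name in header}
    if not fields & BEGIN_FIELDS:
        problems.append("no begin identifier field (e.g. BeginBates, Control Number)")
    if not fields & (END_FIELDS | PAGE_FIELDS):
        problems.append("no end field (e.g. EndBates) or page count field (e.g. NumPages)")
    for name in header:
        if normalize(name) in TEXT_FIELDS:
            problems.append(f"embedded text field {name!r}; supply .txt files instead")
    previous_line = reader.line_num
    for row in reader:
        if row:  # ignore blank lines
            if len(row) != len(header):
                problems.append(
                    f"line {reader.line_num}: {len(row)} columns, expected {len(header)}"
                )
            # line_num counts physical lines, so a jump of more than one
            # means this record spilled over several lines.
            if reader.line_num - previous_line > 1:
                problems.append(f"line {reader.line_num}: record spans several lines")
        previous_line = reader.line_num
    return problems
```

Read the load file with the encoding it was produced in, then pass its contents as `text`. For a CSV whose third record has only two of three columns, the check reports "line 3: 2 columns, expected 3".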

Bates prefixes

Processed data is loaded into Everlaw with a Bates prefix and a number, which together comprise a unique identifier for each document, often referred to as a document's Bates number.

The following criteria describe the format for Bates prefixes loaded into Everlaw:

  • Bates prefixes may contain any combination of letters and numbers, as well as hyphens, underscores, and spaces. For example:
    • Any combination of letters and numbers: 1ABC, ADMIN, Doe
    • Underscores: 1_ABC, Admin_Data
    • Hyphens: 1-ABC, ADMIN-Data
    • Spaces: 1 ABC, Jane Doe
  • The character limit for Bates prefixes is 36 characters
  • Bates prefixes that end in a number (e.g. ABC1) can be uploaded, but they create ambiguity as to where the Bates prefix ends and the Bates number begins. Everlaw's automatic Bates prefix detector may not precisely identify the end of the Bates prefix.
    • There is a step in the upload process that allows you to manually add Bates prefixes that end in a number
  • Bates prefixes that contain a space may not be automatically recognized upon upload 
    • There is a step in the upload process that allows you to manually add Bates prefixes with spaces
  • Bates prefixes cannot contain brackets, slashes, or curly braces. Prefixes with these characters cannot be uploaded onto Everlaw.
  • In a typical load file, both BeginBates and EndBates (if included) should follow the same prefix and numbering convention. If they do not, your data may require extra manipulation during upload.
  • Please note that suffixes cannot be used to differentiate documents in Everlaw. Bates numbers ABC001_001 and ABC001_002 will be identified as two pages of the same document.
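
The prefix rules above can be expressed as a small validator. In this hypothetical Python sketch, `prefix_warnings` applies the character, length, trailing-number, and space rules, and `naive_split` shows why a prefix ending in a number is ambiguous: a simple "trailing digits are the number" rule assigns the 1 in a prefix like ABC1 to the number. Neither function reproduces Everlaw's automatic prefix detector, and limiting letters to ASCII is an assumption.

```python
import re

MAX_PREFIX_LENGTH = 36
# Letters, digits, hyphens, underscores and spaces (ASCII-only is an assumption).
ALLOWED_PREFIX = re.compile(r"[A-Za-z0-9_\- ]+")


def prefix_warnings(prefix):
    """Return a list of issues for a proposed Bates prefix."""
    issues = []
    if len(prefix) > MAX_PREFIX_LENGTH:
        issues.append("longer than 36 characters")
    if not ALLOWED_PREFIX.fullmatch(prefix):
        issues.append("contains characters other than letters, digits, -, _ or space")
    if prefix[-1:].isdigit():
        issues.append("ends in a number; may need to be added manually")
    if " " in prefix:
        issues.append("contains a space; may need to be added manually")
    return issues


def naive_split(bates):
    """Split a Bates number into (prefix, number) at its trailing digits.

    Every trailing digit goes to the number, so a prefix such as ABC1
    cannot be told apart from prefix ABC with a longer number.
    """
    match = re.fullmatch(r"(.*?)(\d+)", bates)
    if match is None:
        raise ValueError(f"no trailing number in {bates!r}")
    return match.group(1), match.group(2)
```

For example, `naive_split("ABC1000123")` returns `("ABC", "1000123")` even if the intended prefix was ABC1, which is the kind of case where you would add the prefix manually during upload.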

If you have questions, please contact support@everlaw.com.

Text file information in the load file

To upload text files:

  • If the names of your text files do not exactly match their BeginBates, your load file must contain a text path field to load the text files
    • This field is often called "Text Link" or "OCR Path"
    • For example, text files with a designation appended to their Begin Bates such as ABC001_Confidential.txt must have a text path field to be uploaded

Native file information in the load file

To upload native files: 

  • A native path or unique filename must be included as a field in the load file.
    • For example, “file.xls” is an acceptable native path entry if no other file has that exact name. Two files with the same name can be uploaded as long as they are in separate folders and their native paths are included in the load file. 
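
A quick way to catch the ambiguous case described above is to look for bare native filenames that share a name with another entry. This Python sketch is a hypothetical helper; it treats entries as Windows-style paths (an assumption about how native paths are written) and compares names exactly.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def ambiguous_native_entries(native_paths):
    r"""Return bare-filename native entries that share a name with another entry.

    An entry with folders (e.g. NATIVES\001\file.xls) points at one file,
    but a bare filename (e.g. file.xls) is only unambiguous when no other
    entry has that exact name.
    """
    entries_by_name = defaultdict(list)
    for entry in native_paths:
        if not entry:
            continue  # document without a native file
        entries_by_name[PureWindowsPath(entry).name].append(entry)
    ambiguous = []
    for entries in entries_by_name.values():
        # A bare entry is one whose full value is just the filename.
        bare = [entry for entry in entries if PureWindowsPath(entry).name == entry]
        if bare and len(entries) > 1:
            ambiguous.extend(bare)
    return ambiguous
```

Two entries such as NATIVES\002\report.doc and NATIVES\003\report.doc are not flagged, because their folders keep them distinct.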

Processed PDFs without a load file

Any set of processed PDFs can be loaded using the PDFs without a load file option:

  • All PDFs must be contained within a single, unzipped outer folder. Everlaw records the folder structure of your upload.
  • If your documents have been Bates stamped and you want to preserve this Bates numbering on Everlaw, the filename of each document must match the Bates stamp burned into the document.
    • If Everlaw cannot detect a prefix and number from the PDF filenames, the documents will be uploaded with the default prefix EVER.
    • If a PDF filename contains information in addition to the Bates number, that information will not be retained as part of the Bates prefix. The original filename, including the additional information, is retained in the document's filename metadata.
    • If the Bates numbers include suffixes, such as ABC001_001, you will need a load file to upload them.
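
Before uploading a folder of PDFs, a quick scan can confirm that nothing is still zipped and flag filenames that look suffixed. This Python sketch is a hypothetical helper: the function name and the suffix pattern (digits, an underscore, then digits at the end of the name, modeled on the ABC001_001 example) are assumptions, and it does not predict which prefix Everlaw will detect.

```python
import re
from pathlib import Path

# Names like ABC001_001: a Bates number followed by an underscore and a
# numeric suffix. This pattern is an assumption based on the example above.
SUFFIX_PATTERN = re.compile(r"\d+_\d+$")


def precheck_pdf_folder(outer_folder):
    """Hypothetical pre-upload check for a 'PDFs without a load file' folder.

    Returns (pdfs, zipped, suffixed): all PDFs found (relative paths),
    any .zip archives that still need extracting, and PDFs whose names
    look suffixed and would therefore need a load file.
    """
    root = Path(outer_folder)
    pdfs, zipped, suffixed = [], [], []
    for path in sorted(root.rglob("*")):  # walks all subfolders
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        extension = path.suffix.lower()
        if extension == ".zip":
            zipped.append(relative)
        elif extension == ".pdf":
            pdfs.append(relative)
            if SUFFIX_PATTERN.search(path.stem):
                suffixed.append(relative)
    return pdfs, zipped, suffixed
```

If `zipped` or `suffixed` is non-empty, extract the archives or prepare a load file before choosing this upload option.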

If you need support in uploading your processed data onto Everlaw, you can transfer the files to our support team.
