Standard Format for Processed Data

On Everlaw, you can upload document sets with a load file, upload PDFs without a load file, or submit files to the Everlaw team to upload as processed data. This article describes the standard format for uploading processed data to Everlaw.

Load file requirements

The required format for processed data is relatively inflexible, so please make sure that your processed data sets conform to the following Everlaw standards:

  • The load file can be any text-based file with consistent delimiters. Common formats include comma-separated values (CSV) files, DAT files, and TXT files.
  • Each line in the load file must represent exactly one document
  • The load file must include a field with a unique identifier representing the beginning of a document (such as Control Number or Begin Bates). It must also include either a field representing the end of the document (such as EndDoc or EndBates) or a number of pages field (such as NumPages or Page Count).
  • BeginBates and EndBates numbers (if EndBates is included) should follow the same numbering convention.
  • Document identifiers may have alphanumeric prefixes containing any combination of letters and numbers. Prefixes may also contain hyphens, underscores, and spaces.
  • For best results, Bates prefixes should not end in a number, because this makes it ambiguous where the prefix ends and the Bates number begins. Bates prefixes also should not end in a space character. Brackets, slashes, and curly-brace characters are not permitted in a Bates prefix.
  • Records that have only Bates numbers and no prefix (e.g. 00002844) are also allowed on Everlaw. When you submit processed data containing documents with no Bates prefix, go to the identity fields stage of the processed uploads wizard and click the pencil icon on the left-hand side to open the “Edit Bates Prefixes” configuration tool. Then check the “Allow Bates with empty prefix” box so that the system recognizes numbers with no prefix as Bates numbers.
  • The names of the image and text files should exactly match the BeginBates or BeginID of the corresponding entry in the load file. Where this is not possible, you can use other methods to upload your data successfully:
    • For text files, if the load file has a column called Text Path (also referred to as Text Link or OCR Path), this column is selected by default and the provided Text Path is used to locate the text file. If a Text Path is not provided, the system looks for an exact match of the BeginBates or BeginID as usual. Because the Text Path column is optional, you can move on to the next step of the uploader without selecting it.
    • For image files, you can have the system prompt you for an OPT file, which contains Image Path information that can be used to locate each image file. To do this, deselect the End Bates and Page Number columns at the load file stage, and the system will prompt you for an OPT file on the next page. After you upload the OPT file, the system uses its Image Paths instead of looking for an exact match of the image file name.
    • These methods are intended for cases such as data with a confidentiality designation like “_CONFIDENTIAL” appended to the file names. Using Text Paths and/or Image Paths lets you upload your data without requiring exact file name matches.
  • To upload native files, you must provide a native path or a unique filename. For example, “file.xls” is an acceptable native path entry if no other file has that exact name.
  • The load file should not contain the document text as an embedded field, because this can cause problems during upload. This often happens with fields named “Extracted Text” or “OCR Text”. Instead, put the text for each document in its own multi-page text (.txt) file named after the document's Begin Bates.
  • The load file must use a consistent separator character between columns.
  • The load file should contain only document metadata. Any work product that needs to be migrated (such as codes, ratings, etc.) should be included in a separate file.
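Several of the structural checks above can be run on a load file before you submit it. The Python sketch below is a hypothetical pre-flight check and is not part of Everlaw. Several details are assumptions to adapt to your own data: the header names it looks for (BeginBates, EndDoc, NumPages, and so on), the rule that treats the trailing run of digits as the Bates number, and the reading of "same numbering convention" as "same prefix and same digit count". For a Concordance-style DAT file, you would typically pass delimiter="\x14" and quotechar="þ".

```python
import csv
import io
import re

# Assumed header names; load files vary, so adapt these to your own headers.
BEGIN_FIELDS = ("BeginBates", "BegBates", "BeginID", "Control Number")
END_FIELDS = ("EndBates", "EndDoc")
PAGE_FIELDS = ("NumPages", "Page Count")
TEXT_FIELDS = ("Extracted Text", "OCR Text")

# Assumed split: everything before the trailing run of digits is the prefix.
BATES_RE = re.compile(r"^(?P<prefix>.*?)(?P<number>\d+)$")
# Letters, digits, hyphens, underscores, and spaces are allowed in a prefix.
ALLOWED_PREFIX = re.compile(r"^[A-Za-z0-9 _-]*$")


def split_bates(value):
    """Return (prefix, number), or None if the value has no trailing digits."""
    match = BATES_RE.match(value)
    return (match["prefix"], match["number"]) if match else None


def prefix_problems(prefix):
    """List the prefix rules from the checklist that `prefix` breaks."""
    problems = []
    if not ALLOWED_PREFIX.match(prefix):
        problems.append("prefix has characters other than letters, digits, '-', '_' or space")
    if prefix.endswith(" "):
        problems.append("prefix ends in a space")
    if prefix == "":
        problems.append("empty prefix: enable 'Allow Bates with empty prefix'")
    return problems


def check_load_file(text, delimiter=",", quotechar='"'):
    """Return a list of problems found in a delimited load file (empty if none)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar))
    if not rows:
        return ["empty load file"]
    header, records = rows[0], rows[1:]
    problems = []

    begin = next((f for f in BEGIN_FIELDS if f in header), None)
    end = next((f for f in END_FIELDS if f in header), None)
    if begin is None:
        problems.append("no begin identifier field")
    if end is None and not any(f in header for f in PAGE_FIELDS):
        problems.append("no end identifier or page count field")
    for field in TEXT_FIELDS:
        if field in header:
            problems.append(f"document text embedded in field '{field}'")

    for n, row in enumerate(records, start=1):
        # Each record needs one value per column and must stay on one line.
        if len(row) != len(header):
            problems.append(f"record {n}: {len(row)} columns, expected {len(header)}")
            continue
        if any("\n" in cell or "\r" in cell for cell in row):
            problems.append(f"record {n}: a value spans more than one line")
        if begin is None:
            continue
        first = split_bates(row[header.index(begin)])
        if first is None:
            problems.append(f"record {n}: begin identifier has no trailing number")
            continue
        problems += [f"record {n}: {p}" for p in prefix_problems(first[0])]
        if end is not None:
            last = split_bates(row[header.index(end)])
            # Assumed meaning of "same convention": same prefix, same digit count.
            if last is None or last[0] != first[0] or len(last[1]) != len(first[1]):
                problems.append(f"record {n}: begin and end identifiers differ in format")
    return problems
```

An empty list means only that none of these particular checks failed. It does not guarantee that the upload will succeed, because the uploader applies its own rules.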

Image and text

For images, Everlaw’s processed data uploader supports TIFF (single- and multi-page), PNG, JPEG, and multi-page PDF files. Page-separated PDFs are not accepted. For text, the uploader supports .TXT files but not .RTF files.
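Before uploading, you might check two things for each record. First, confirm that its text and image files can be found by one of the routes described in the load file requirements. Second, confirm that those files use the formats listed here. The Python sketch below is hypothetical. The record dictionary shape and the function names are illustrative, and the Text Path aliases come from this article. The OPT parsing assumes the common Opticon layout (ImageKey, Volume, ImagePath, DocBreak, FolderBreak, BoxBreak, PageCount), not any Everlaw-specific format.

```python
from pathlib import PureWindowsPath

# Formats named above, compared by lower-cased extension. An extension check
# cannot tell a multi-page PDF from a page-separated one.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".pdf"}
TEXT_EXTENSIONS = {".txt"}

# Column names this article gives for the optional text path.
TEXT_PATH_FIELDS = ("Text Path", "Text Link", "OCR Path")


def is_supported(filename, kind):
    """Return True if `filename` has an extension accepted for `kind` ("image" or "text")."""
    # PureWindowsPath accepts both "\" and "/" as separators.
    suffix = PureWindowsPath(filename).suffix.lower()
    return suffix in (IMAGE_EXTENSIONS if kind == "image" else TEXT_EXTENSIONS)


def text_file_for(record, begin_field="BeginBates"):
    """Use the record's text path when present; otherwise expect <BeginBates>.txt."""
    for field in TEXT_PATH_FIELDS:
        value = record.get(field, "").strip()
        if value:
            return value
    return record[begin_field] + ".txt"


def image_paths_from_opt(opt_text):
    """Map each document's first-page image key to its image path.

    Assumes the common Opticon layout, one page per line:
    ImageKey,Volume,ImagePath,DocBreak,FolderBreak,BoxBreak,PageCount
    with DocBreak set to "Y" on a document's first page.
    """
    paths = {}
    for line in opt_text.splitlines():
        fields = line.split(",")
        if len(fields) >= 4 and fields[3].strip().upper() == "Y":
            paths[fields[0].strip()] = fields[2].strip()
    return paths
```

Because the extension check treats every .pdf alike, PDFs still need a manual look to confirm that each file holds a whole document rather than a single page.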

PDFs without a load file

If you want to retain the Bates numbers of processed PDFs upon upload, name each document with its unique BeginBates number before uploading the files. A document's Bates stamp does not influence the document's name on the platform, so make sure the files are named properly before uploading.

If Everlaw cannot detect a prefix and number from the PDF filenames, the documents will be uploaded with the default prefix EVER.

Please note that if there is extra information in the PDF filename, that information will not be retained.
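To preview how PDF file names might be read, you can approximate the naming behavior described above. The Python sketch below is an assumption, not Everlaw's actual parser. It takes the last run of digits that either ends the name or is followed by a space, hyphen, or underscore, and treats that run as the Bates number. Everything before the run is the prefix, which must end in a letter, hyphen, or underscore. Anything after the run is extra information and is dropped. The sketch also assumes that both a prefix and a number are required. If either is missing, it falls back to the EVER prefix and does not guess which number Everlaw would assign.

```python
import re
from pathlib import PurePath

DEFAULT_PREFIX = "EVER"

# Assumed rule: greedy prefix of allowed characters ending in a non-digit,
# then the Bates number, then optional extra text after a space, "_" or "-".
PDF_NAME_RE = re.compile(
    r"^(?P<prefix>[A-Za-z0-9 _-]*[A-Za-z_-])(?P<number>\d+)(?:[ _-].*)?$"
)


def bates_from_pdf_name(filename):
    """Return (prefix, number) read from a PDF file name, or (DEFAULT_PREFIX, None)."""
    stem = PurePath(filename).stem  # drop the directory and ".pdf"
    match = PDF_NAME_RE.match(stem)
    if match is None:
        return DEFAULT_PREFIX, None
    return match["prefix"], match["number"]
```

Under this rule, ABC000001_CONFIDENTIAL.pdf is read as prefix ABC and number 000001, and _CONFIDENTIAL is discarded. This mirrors the note above that extra filename information is not retained.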
