Organization Admin: Uploading Processed Data

This article covers organization admin functionality, which is available to certain users with contracts covering unlimited database creation.


 


Overview

Uploading structured, processed data such as productions into Everlaw is a complex process with many potential sources of error. Because of this, you must attend a directed training before attempting to upload documents. Once you have received this training, you will be able to upload production sets directly into Everlaw without having to wait for support availability. To schedule a training, please email training@everlaw.com or contact your account representative.

This feature works only in the Chrome web browser, so be sure to complete this process in Chrome.


Required format

The required format for uploads is relatively inflexible, so make sure that your documents conform to this standard before uploading.

Loadfile Requirements

Each document is represented by one line in the loadfile.

  • The document’s filename exactly matches the name of its entry in the loadfile (e.g., ABCD00002.pdf if the loadfile entry is ABCD00002). The match must be exact down to the number of zeros and the presence or absence of a separator character, such as an underscore (e.g., ABCD00003_001 must be the name of both the PDF and the loadfile entry).
    • This requirement for identical names applies to both text and image files. To upload native files, you must provide a native path.
    • The filename of the native does not need to be identical to the name of the text/image file as long as it exactly matches the value in the native path. Everlaw does not use text or image path values.
      • Please note that a full path to the native file isn’t necessary, but each native needs a unique filename or some path portion that is unique. For example, “file.xls” is an acceptable native path entry if there are no other files with that exact name.
  • There is a consistent separator character between columns in the loadfile.

Accepted formats include .DAT, .TXT, and .CSV, but other formats may be acceptable as long as there are consistent and accepted quote characters.
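As a rough illustration of the naming requirement, the following Python sketch checks loadfile entries against filenames before you upload. It is not part of Everlaw: the function name and sample values are hypothetical, and it assumes you have already extracted the entry names from the loadfile and listed the image/text files yourself.

```python
from pathlib import PurePath

def find_name_mismatches(loadfile_entries, filenames):
    """Return loadfile entries with no file whose name (minus extension) matches exactly."""
    stems = {PurePath(name).stem for name in filenames}
    return [entry for entry in loadfile_entries if entry not in stems]

# Hypothetical sample data: the second file is missing a zero.
entries = ["ABCD00002", "ABCD00003_001"]
files = ["ABCD00002.pdf", "ABCD0003_001.pdf"]
print(find_name_mismatches(entries, files))  # ['ABCD00003_001']
```

Any entry this reports would need its file renamed, or its loadfile line corrected, before the upload.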

Image and text

For images, Everlaw’s processed data uploader supports single-page TIFF, PNG, and JPEG files, or multi-page PDFs. Multi-page TIFFs and single-page PDFs are not accepted. For text files, the uploader supports .TXT files but not .RTF files.

 


Process


Initializing the upload

First, on the organization admin Processed Uploads tab, click “Add a new Upload.” Then, enter the corresponding database name or select a database from the drop-down. This step determines which database your data is uploaded into, so take care to select the right one.


Next, drag and drop the unzipped folder containing the loadfile and image/text/native files into the uploader. Then, select the name of the user authorizing the upload. This is a safety check to make sure that the documents are being uploaded to the correct database, and can also be used as an audit trail to see who has provided and/or approved various uploads. If there is no user who has requested the upload, you may override this step by providing an authorizing organization or a free form note. 


If you would like to give the upload a different name than the name of the folder you dragged in, you can change the name now. The uploader automatically adds the date, but you can remove it if you would like. 


Click “Upload New Docs” if you are uploading a new set. If you are modifying documents that are already on the platform, please see the article on modification uploads instead. Then, select the loadfile, which is often located in a folder called “Data.” If multiple loadfiles are present, the .DAT file usually conforms most closely to the uploader’s requirements. Otherwise, you can select a .CSV or .TXT file. .OPT files cannot be used to ingest data into Everlaw. Finally, select Upload Loadfile.


Special columns

The next step involves selecting system metadata fields. These are the special columns that Everlaw will use to identify where documents begin and end, as well as find the native documents. The fields are listed below:

  • Begin Bates
    • This field is always required.
    • This is designated by selecting “B” next to the metadata field that will be used.
    • It may be called “BeginBates,” “BegDoc,” “ProdBegBates,” or any of several other potential options.
    • The uploader may accidentally pick up a different Bates value (such as ParentBates), which must be corrected.
  • End Bates
    • Select “E” next to the field that will be used for this value.
    • Either this field or “Page Count” is required so that the system knows where the document ends. End Bates is preferred over Page Count because it is less error-prone.
    • This field may have multiple potential names, like “EndBates” or “EndDoc.”
  • Page Count
    • If there is no end Bates value, Page Count can be used to determine the end of documents. In that case, select “#” next to this column. You can also select both page count and end Bates. If the uploader prompts you for an .OPT file, that is an indication that the two values do not match up.
  • Native Link
    • Select "N" next to the metadata field that will be used to identify the documents' native files.
    • This field will often be called "Native Link" or "Doc Link" and will contain text that looks like a file path and includes the word "Native", e.g., ENRON PROD009\ENRON PROD009\VOL0001\NATIVES
    • This field is necessary if you would like to upload native files, even if those files have identical names to the image.

 

In this step, you can also select to ignore certain metadata fields, like “Page Count” as mentioned above. The uploader will automatically ignore all fields that have no values for any of the documents. To ignore a field, select “X.”
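If you want to preview which fields have no values for any document, a sketch along these lines may help. It assumes the loadfile has already been read into a list of dictionaries, one per document; the function and sample field names are hypothetical, and the uploader’s own emptiness check may differ.

```python
def empty_columns(rows):
    """Return the column names whose value is blank in every row."""
    if not rows:
        return []
    return [
        col for col in rows[0]
        if all(not (row.get(col) or "").strip() for row in rows)
    ]

# Hypothetical sample rows, one dictionary per loadfile line.
rows = [
    {"BegDoc": "ABCD001", "Notes": "", "Custodian": "Smith"},
    {"BegDoc": "ABCD002", "Notes": " ", "Custodian": ""},
]
print(empty_columns(rows))  # ['Notes']
```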

 

Make sure that you always ignore the text path field, which could be called “Text Link” or “OCR Path.” Everlaw doesn’t use the text path to upload text, so the value on its own has little purpose.


If you cannot assign a field as you would like in this step, there is likely a formatting problem in the loadfile. Hover over the header to see why the uploader does not think the value can be used as that special column.

Note on overlapping documents
If you get an error after the previous step letting you know that the documents already exist on the platform, compare the number of documents that show the error to the number of lines in the loadfile. To make the tally easier, click the red circle with an exclamation point in it to export a CSV of the errors, then count the lines in Excel.


If the number of overlapping documents equals the number of documents in the loadfile, then every document in the loadfile is already on the platform, because documents with exactly the same begin and end Bates have already been uploaded. This suggests the upload might be a modification, so consider doing a modify upload or skipping the upload entirely, but always check the actual documents first.


Otherwise, if there are fewer documents in the error CSV than there are in the loadfile, that means the loadfile is a mixture of documents already on the system and new documents. In this case, you need to remove the lines in the loadfile representing documents already on the platform in order to upload the new documents. You can copy the lines of the overlapping documents into a new document if you would like to do a modify upload after. 
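The split described above can be sketched in Python. This is only an illustration: it assumes you have the loadfile rows as dictionaries and a list of the overlapping begin Bates values taken from the exported error CSV, and the field name “BegDoc” and the function name are placeholders for whatever your loadfile actually uses.

```python
def split_overlaps(loadfile_rows, overlapping_bates, bates_field="BegDoc"):
    """Separate rows for new documents from rows already on the platform."""
    overlapping = set(overlapping_bates)
    new_rows = [r for r in loadfile_rows if r[bates_field] not in overlapping]
    existing_rows = [r for r in loadfile_rows if r[bates_field] in overlapping]
    return new_rows, existing_rows

# Hypothetical sample data.
rows = [{"BegDoc": "ABCD001"}, {"BegDoc": "ABCD002"}, {"BegDoc": "ABCD003"}]
new_rows, existing_rows = split_overlaps(rows, ["ABCD002"])
print(len(new_rows), len(existing_rows))  # 2 1
```

The first list would go into the loadfile for the new upload; the second could be saved separately for a later modify upload.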

 


Metadata parsing

In this step, you determine what format the uploader parses each metadata field as. For example, if there is a “File Size” field with contents that are just numbers, you can choose whether the uploader should understand it as a text string or a numerical value. If you select the latter, you will be able to do searches on number ranges (e.g., a search for file size larger than 100). The uploader will only allow you to select a field as numerical if there are only numbers and no letters in any of the values. For instance, a value of “100kb” would prevent the uploader from allowing you to select the field as numerical. Of particular importance in this step is making sure that the uploader recognizes date fields as dates, not as text. Otherwise, you won’t be able to do date range searches.
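To see ahead of time whether a field is likely to qualify as numerical, you could run a check like the one below over its values. The exact rule the uploader applies is not documented here, so the pattern (optional minus sign, digits, optional decimal part) is an assumption, and the function name is hypothetical.

```python
import re

# Assumed definition of "only numbers": digits with an optional sign and decimal part.
NUMBER_PATTERN = re.compile(r"-?\d+(\.\d+)?")

def can_parse_as_number(values):
    """Return True only if every non-empty value looks like a plain number."""
    for value in values:
        text = value.strip()
        if text and not NUMBER_PATTERN.fullmatch(text):
            return False
    return True

print(can_parse_as_number(["100", "2048", ""]))  # True
print(can_parse_as_number(["100", "100kb"]))     # False
```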


 

The different colors on each metadata field reflect whether the uploader is confident with the designation of field type.

  • Green: The uploader is confident that the field is parsed correctly; you usually do not need to click through these.
  • Yellow: This color specifically means that the field has more than one non-text format option, such as numerical and date. Usually these are correctly parsed, but it’s advisable to click through them to make sure the data seems to be in the proper format. Once you have clicked on a field, its color will change to a more muted hue. Often, date and time fields will be flagged here and may need to be reformatted (see below). Sometimes fields containing numerical values, e.g., “Attachment Count,” may need to be changed from text to number format.
  • Red: This means there has been an error. There is at least one value that cannot be parsed. To select this format, you will either have to create a new date field as described below, or mark the values that do not conform to the specified format as “Null.” This will ignore that particular value and will allow the uploader to move to the next step.

 

If a DateTime field is parsed incorrectly, the loadfile often contains multiple date formats. The date values that do not parse with the current format(s) will be displayed. To fix the inconsistent date values, you will need to merge the date format the uploader has identified with the format or formats present in other rows. To do so, click “Merge With...” to create a new format and, following the instructions that appear, fill in the box with the format that matches the date values that are not being parsed correctly. Then, select that format from the drop-down and select Merge. You may have to do this multiple times to account for all of the date formats present in one upload.
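To find out in advance how many date formats a column contains, you could test its values against candidate formats, as in this sketch. It uses Python’s strptime format codes, which are not the same as the format syntax the uploader asks for, and the function name and sample values are hypothetical.

```python
from datetime import datetime

def unparsed_dates(values, formats):
    """Return the values that match none of the given strptime formats."""
    leftover = []
    for value in values:
        for fmt in formats:
            try:
                datetime.strptime(value, fmt)
                break  # this value parses; move on to the next one
            except ValueError:
                continue
        else:
            leftover.append(value)
    return leftover

values = ["01/05/2018", "2018-01-06", "01/07/2018"]
print(unparsed_dates(values, ["%m/%d/%Y"]))              # ['2018-01-06']
print(unparsed_dates(values, ["%m/%d/%Y", "%Y-%m-%d"]))  # []
```

Each format you have to add before the leftover list is empty corresponds, roughly, to one merge in the uploader.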


You can also choose to ignore a column in this step by selecting the relevant green, yellow, or red column button and clicking on the gray Ignore Column button in the upper right. 

 


Combine 

In this step, you can combine date and time fields so that they appear as one field instead of two. For example, “Date Created” and “Time Created” could be combined into one field. The uploader generally does this automatically, in which case the combined fields are shown as green. If you would like to split them anyway, select “Split” after clicking on the values. If the uploader is not confident about a combination, the fields will appear yellow.

 

If the field cannot be combined with anything, it will be grey. If fields that seem like they should be combined are not, you may need to return to the previous section, create new formats, and merge those formats.
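Conceptually, combining a date field with a time field produces a single date-time value, which is only possible when both parts parse. The sketch below illustrates the idea in Python; the default formats and the function name are assumptions for illustration and do not describe how the uploader works internally.

```python
from datetime import datetime

def combine_date_time(date_text, time_text, date_fmt="%m/%d/%Y", time_fmt="%H:%M:%S"):
    """Combine separate date and time strings into one datetime value."""
    date_part = datetime.strptime(date_text, date_fmt).date()
    time_part = datetime.strptime(time_text, time_fmt).time()
    return datetime.combine(date_part, time_part)

# Hypothetical "Date Created" and "Time Created" values.
print(combine_date_time("01/05/2018", "17:07:50"))  # 2018-01-05 17:07:50
```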

 


Normalize

In this step, you match the metadata fields in the loadfile (present on the right side of the table) with the metadata field that will be displayed in Everlaw. There are two types of metadata fields on the Everlaw platform: canonical and semantic fields.

Canonical fields are the curated list of fields we have enumerated to try to limit the proliferation of redundant metadata fields. For example, if multiple parties are producing documents that are present in your database, you could have fields called “Begin Family,” “BeginAttach,” “Begin Attachment,” and “BegFamily” that all actually reference the same underlying field. To facilitate searching all the fields at once, as well as making the data in your database easier to read, you should pick Everlaw’s canonical “Begin Family” field if any of those are present. You can tell if a field is canonical because it will say “Canonical” in blue on the left side.

Semantic fields are fields that are not present in Everlaw’s canonical list. In general, you want to map all fields to canonical fields to the degree possible. In other words, you want as many rows as possible to be blue in your table. This ensures that your database remains easy to search and metadata fields are easy to understand.

There are some fields that do not have a good canonical equivalent, however, such as “Has Redactions” or “Redacted,” though you will still want to preserve them in the database. In those cases, you can select a semantic field. To avoid all of the problems with using multiple semantic fields, you should make sure that you are using the same semantic field each time. This means using “Redacted” consistently instead of alternating between that and “Has Redactions.” This is aided by the fact that semantic fields you have previously used will say “Semantic” in grey on the left side, where new fields will say “New” in yellow.
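The mapping idea can be pictured as a lookup from the many names producing parties use to a single canonical name. The sketch below is illustrative only: the alias table contains just the “Begin Family” example from this article, not Everlaw’s actual canonical list, and the function names are hypothetical.

```python
# Illustrative alias table; only the "Begin Family" example comes from this article.
CANONICAL_ALIASES = {
    "Begin Family": {"beginfamily", "beginattach", "beginattachment", "begfamily"},
}

def normalize_key(name):
    """Lowercase a field name and drop spaces and punctuation for comparison."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def map_field(name):
    """Return (target field, kind) for a loadfile field name."""
    key = normalize_key(name)
    for canonical, aliases in CANONICAL_ALIASES.items():
        if key in aliases:
            return canonical, "Canonical"
    return name, "Semantic"

for field in ["BeginAttach", "Begin Attachment", "BegFamily", "Has Redactions"]:
    print(field, "->", map_field(field))
```

Here the first three names all map to “Begin Family,” while “Has Redactions” falls through as a semantic field.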


One important thing to remember about this section is that though it does display example metadata values in rows and columns, the documents aren’t necessarily the same across rows. That means that the entry in the third column of the “Custodian” field is not necessarily the same document as the one in the third column of the “Duplicate Custodian” field, even though they are right on top of each other. The best way to verify what metadata values correspond to a particular document is always to check the loadfile.

 


Resolve

In this step, you resolve inconsistencies among the metadata fields you mapped in the previous step. For example, if you have mapped multiple loadfile fields to the same metadata field in Everlaw, but a document in the loadfile has different values in those fields, a conflict will appear and the box will be red. You can choose to go back to the previous step to map them to different metadata fields, save both values, or delete the conflicting values.

Figure: Going back to the previous step to map loadfile fields to different Everlaw metadata fields.

If you choose to save the conflicting values, the values on the left, represented by a green column, will be the ones shown on the document. The values with the tan background will appear when you hover your mouse over the saved field in the metadata panel, and will be included when you search on that value. If you choose to delete the conflicting value, it will not be present in the database at all.

Figure: Saving conflicting fields.

 

Figure: Deleting conflicting fields.

 

If the fields have many of the same values, but some values are present in one field and not the other, the field will be yellow. In many cases, this means you can combine them, as they reflect the same data. In other cases, it could mean that one field is a superset of another. For example, “Custodian” and “All Custodians” will have a good deal of overlap, but should not necessarily be combined.
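One way to reason about the red and yellow states is to compare the two loadfile fields document by document. The sketch below shows one plausible reading of the rules described above; the uploader’s actual logic is not documented here, so the status names and the function are hypothetical.

```python
def classify_pair(left_values, right_values):
    """Compare two fields mapped to the same Everlaw field, one value per document.

    Returns "conflict" if any document has two different non-empty values,
    "partial" if some values appear in only one field, and "match" otherwise.
    """
    status = "match"
    for left, right in zip(left_values, right_values):
        if left and right and left != right:
            return "conflict"
        if bool(left) != bool(right):
            status = "partial"
    return status

print(classify_pair(["Smith", "Jones"], ["Smith", "Lee"]))        # conflict
print(classify_pair(["Smith", "Jones", ""], ["Smith", "", "Lee"]))  # partial
print(classify_pair(["Smith"], ["Smith"]))                        # match
```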

 

You will only be able to move on to the next step once you have resolved all issues.

 


Review

There is no action you need to complete as part of this step. It gives you a summary of any potentially problematic fields. This includes any conflicts, new fields that are being added, or semantic (non-canonical) fields being added to the database. If you would like to change any of these fields, go back to the previous steps and adjust them.

 


Select folders

In this step, you select the folders that the uploader should use for each type of document: image, text, and, if a native link is present, native. First, select the directory or directories where the images can be found, then hit “Next: Text.” Do the same for text and native files. You can expand folders by clicking on them twice.


Please note that the uploader will give you an error if you point it to two files with the same name. This could happen if there are native PDFs or other image files in the native directory and you pick a top level directory that includes both the images and the native directory. The solution to this is to make sure that images and native documents are in separate folders and to select only those folders during this step.
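If you suspect a duplicate-name problem, you can look for it before selecting folders. The following sketch groups file paths by filename and reports any name that occurs more than once; the paths and function name are hypothetical, and it assumes forward-slash paths that you have already listed from the folders you plan to select.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def duplicate_names(paths):
    """Return {filename: [paths]} for filenames that appear more than once."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: found for name, found in by_name.items() if len(found) > 1}

# Hypothetical listing where a native PDF shares its name with an image PDF.
paths = ["IMAGES/ABCD001.pdf", "NATIVES/ABCD001.pdf", "NATIVES/ABCD002.xls"]
print(duplicate_names(paths))
# {'ABCD001.pdf': ['IMAGES/ABCD001.pdf', 'NATIVES/ABCD001.pdf']}
```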

 


Verify files

In this step, the uploader will first go through all of the files and verify that they are present. If any are missing, it will prompt you to continue anyway or stop the upload. If you continue anyway, you can download a list of missing files at the end. Then, the upload will occur, and you can watch as the system goes through all of the files. As the upload goes on, documents are indexed as they are ingested into the system. That means you will begin to see documents in the database before the upload is complete. Once the upload has completed, you can open a search of the documents in the results table by clicking on the magnifying glass icon on the right of the documents.


You can also delete the upload by clicking on the trashcan. This will remove it from the database, not just the uploads table.

 


OPT Files

An OPT file is an explicit listing of the name for each page of a document, including the filename of the image for that page. It is used to determine the names of image files when the name cannot be determined implicitly from the document’s Bates range and page count.

An OPT file is required whenever there is an inconsistency between the number of pages indicated by the Page Count column and the implicit number of pages described by the Bates range. For example, loadfile record ABCD001-ABCD010 has a page count of 12 whereas the count implied by the Bates range is 10. An OPT file will be required to rectify the inconsistency. There are several possible reasons for the apparent or actual inconsistency:

  • The Page Count column is inaccurate (rather than provide an OPT file you could ignore the Page Count column).
  • Some pages have non-obvious names, e.g., the pages go ABCD001, ABCD002, ABCD002.2, ABCD002.3, ABCD003, ABCD004, … ABCD010. In this case the Bates range is correct and the page count of 12 is correct. The OPT file is needed to determine the name for each image. This problem may not be obvious without inspecting the images.

If the Page Count is correct but the Bates range is incorrect, an OPT file won’t fix the problem. You will need to correct the Bates range in the loadfile.
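The consistency check between Page Count and the Bates range can be approximated as follows. This sketch assumes Bates numbers consist of a shared prefix followed by a plain number; the function names are hypothetical, and the uploader may apply different rules.

```python
import re

def implied_page_count(begin_bates, end_bates):
    """Pages implied by a Bates range, or None if it cannot be inferred."""
    begin = re.fullmatch(r"(.*?)(\d+)", begin_bates)
    end = re.fullmatch(r"(.*?)(\d+)", end_bates)
    if not begin or not end or begin.group(1) != end.group(1):
        return None  # different prefixes, e.g. a page separator on one end only
    return int(end.group(2)) - int(begin.group(2)) + 1

def needs_opt(begin_bates, end_bates, page_count):
    """True when the range is not inferable or disagrees with the page count."""
    implied = implied_page_count(begin_bates, end_bates)
    return implied is None or implied != page_count

print(implied_page_count("ABCD001", "ABCD010"))  # 10
print(needs_opt("ABCD001", "ABCD010", 12))       # True
print(needs_opt("ABCD001", "ABCD010", 10))       # False
```

A True result only flags the mismatch; deciding whether to ignore Page Count, supply an OPT file, or correct the Bates range still depends on which value is actually wrong.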

Another case requiring an OPT file is whenever the names of a document’s images cannot be inferred from the Bates range. For example:

  • ABCD001-ABCD004.4 (some pages use a page separator, others don’t)

OPT files are currently not supported for uploads using PDFs as the image format.

For common errors when uploading documents, please see our help documentation.
