Organization Admin: Uploading Processed Data

This article covers organization admin functionalities, which are available to certain users with contracts covering unlimited database creation. 


Overview

Uploading structured, processed data such as productions into Everlaw is a complex process with many potential sources of error. Because of this, you must attend a directed training before attempting to upload documents. Once you have received this training, you will be able to upload production sets directly into Everlaw without having to wait for support availability. To schedule a training, please email training@everlaw.com or contact your account representative.

Please note that this feature only works in the Chrome web browser, so be sure to follow this process in Chrome.

You can find the Processed Uploads tab on the left-hand side of your Organization Administration page. You may also find the Processed Uploads tab under the Data Transfers page within any complete project where you have database upload permissions. The page is divided into three tabs: ‘In progress’, ‘Complete’, and ‘Deletion in progress’. Each tab is populated with upload cards. Each card represents a single upload to a single database. In the image below you can see the upload name highlighted in red, the database name in blue, and the type of upload in green. Please note that when viewing processed upload cards from within a project, the database name will not be listed.

mceclip0.png

At the top of the page are various filters you can use to look at uploads by database, upload name, status, and date. Filters only apply to uploads that have been loaded on the page. You can use the “Load more” button at the top or bottom of the cards to load more cards.

mceclip1.png

A Note on Overlays

Everlaw allows you to re-upload documents that are already on the platform to replace images, text, natives, and/or metadata, as well as to add additional metadata information or fill in missing files (as in the case where some single page TIFFs were initially missing in a document). This maintains any review product (such as codes, notes, and ratings) on the documents. One use case is when the other side has provided replacement documents that they forgot to redact initially. You can replace the images and text with the new, redacted copies but maintain any notes you have applied to the document. Overlays cannot be deleted or undone. Uploads that contain new documents and overlays will be referred to as mixed uploads.


Required format

The required format for uploads is relatively inflexible, so make sure that your documents conform to this standard before uploading. Unlike processed data uploaded via the Uploads page of an individual project (for which either a Bates stamp or a loadfile is required), documents uploaded via the processed uploader on the Organization Administration page must have a loadfile. Please read our article on standard formats for processed data for additional information.

 


Process


Initializing the upload

First, on the organization admin Processed Uploads tab, click “+ New processed upload.” This will open an overlay where you will configure your upload. Then select a database from the drop-down. This step determines which database your data gets uploaded into, so take care that you are selecting the right one. 


Next, drag and drop the unzipped folder containing the loadfile and image/text/native files into the uploader. Then, select the name of the user authorizing the upload. This is a safety check to make sure that the documents are being uploaded to the correct database, and can also be used as an audit trail to see who has provided and/or approved various uploads. If there is no user who has requested the upload, you may override this step by providing an authorizing organization or a free form note. 


If you would like to give the upload a different name than the name of the folder you dragged in, you can change the name now.


Next, select the load file. This will often be listed under a folder called “Data.” If multiple load files are present, the .DAT file usually conforms most closely to the uploader’s requirements. Otherwise, you can select a .CSV or .TXT file. An .OPT file will not work as the load file for ingesting data into Everlaw. Then, select Upload Loadfile.

Initialize_processedUpload.gif

Everlaw will report errors in parsing your load file, and you will have the option to proceed with your upload without uploading these problematic documents. Please note that selecting Continue anyway will NOT upload all of the documents in your processed upload, and will skip documents associated with the skipped lines.

1961.png

To ensure that all of your documents get uploaded, you can troubleshoot your load file using the error reports that can be downloaded by clicking Download errors.

errorCSV.gif

You can proceed in one of two ways:

  • Download the error report by clicking Download errors, proceed with uploading the documents that could be processed, fix the “load-error-lines.DAT” load file found in the error report, and then upload the documents that were skipped during the first upload.
  • Download the error report, fix any errors in your original load file, click Try a different load file, and upload all of your documents in one upload.

 

If you would like information on troubleshooting syntax errors in your load file, read this article about processed data errors.
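
Before uploading, it can help to scan the load file yourself for lines whose field count differs from the header, since such lines are a likely source of parsing errors. The sketch below is not Everlaw's parser. It assumes a Concordance-style .DAT file that uses ASCII 20 as the field separator and þ as the text qualifier, and it does not handle line breaks embedded inside quoted values. The names `split_record` and `find_malformed_lines` are hypothetical.

```python
FIELD_SEP = "\x14"   # assumed Concordance-style column delimiter (ASCII 20)
QUOTE = "\u00fe"     # assumed text qualifier (the "þ" character)


def split_record(line):
    """Split one load file line into field values with the qualifiers removed."""
    return [field.strip(QUOTE) for field in line.rstrip("\r\n").split(FIELD_SEP)]


def find_malformed_lines(lines):
    """Return (line_number, field_count) for lines that do not match the header width."""
    expected = None
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = split_record(line)
        if expected is None:
            expected = len(fields)  # the first non-blank line is treated as the header
        elif len(fields) != expected:
            problems.append((number, len(fields)))
    return problems
```

Any line numbers returned by `find_malformed_lines` are candidates to inspect alongside the error report downloaded from the uploader.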


Special columns

The next step involves selecting system metadata fields. These are the special columns that Everlaw will use to identify where documents begin and end, as well as find the native documents. The fields are listed below:

  • Begin Bates
    • This field is always required.
    • This is designated by selecting “B” next to the metadata field that will be used.
    • It may be called “BeginBates,” “BegDoc,” “ProdBegBates,” or any of several other potential options.
    • The uploader may accidentally pick up a different Bates value (such as ParentBates), which must be corrected.
  • End Bates
    • Select “E” next to the field that will be used for this value.
    • Either this field or “Page Count” is required so that the system knows where each document ends. End Bates is preferred over Page Count because it is less error-prone.
    • This field may have multiple potential names, like “EndBates” or “EndDoc.”
  • Page Count
    • If there is no end Bates value, Page Count can be used to determine the end of documents. In that case, select “#” next to this column. You can also select both page count and end Bates. If the uploader prompts you for an .OPT file, that is an indication that the two values do not match up.
  • Native Path
    • Select "N" next to the metadata field that will be used to identify the documents' native files.
    • This field will often be called "Native Link" or "Doc Link" and will contain text that looks like a file path and includes the word "Native", e.g., ENRON PROD009\ENRON PROD009\VOL0001\NATIVES
    • This field is necessary if you would like to upload native files, even if those files have identical names to the image.
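
The relationship between Begin Bates, End Bates, and Page Count described above can be checked before upload. The following is a minimal sketch, assuming that each Bates number is a prefix followed by a run of digits (for example ABCD001). The helper names `split_bates` and `implied_page_count` are hypothetical and are not part of Everlaw.

```python
import re

BATES_PATTERN = re.compile(r"^(.*?)(\d+)$")


def split_bates(bates):
    """Split a Bates number into (prefix, number), e.g. 'ABCD001' -> ('ABCD', 1)."""
    match = BATES_PATTERN.match(bates)
    if match is None:
        raise ValueError(f"no trailing digits in Bates number: {bates!r}")
    return match.group(1), int(match.group(2))


def implied_page_count(begin, end):
    """Number of pages implied by a Bates range, counting both endpoints."""
    begin_prefix, begin_number = split_bates(begin)
    end_prefix, end_number = split_bates(end)
    if begin_prefix != end_prefix or end_number < begin_number:
        raise ValueError(f"invalid Bates range: {begin}-{end}")
    return end_number - begin_number + 1
```

A record whose Page Count value differs from `implied_page_count(begin, end)` is the kind of mismatch that leads the uploader to prompt for an .OPT file.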

 

In this step, you can also select to ignore certain metadata fields, like “Page Count” as mentioned above. The uploader will automatically ignore all fields that have no values for any of the documents. To ignore a field, select “X.”

 

Make sure that you always ignore the text path field, which could be called “Text Link” or “OCR Path.” This is because Everlaw doesn’t use text path to upload text, and the value on its own has little purpose.

processed_uploads_2_pls.gif

If you are not able to assign the fields you want in this step, it is likely because something is wrong with the formatting of the loadfile. To find out what the problem is, hover over the header to see why the uploader does not think the value can be used as that special column.

Everlaw will report inconsistencies in your special columns, and you will have the option to proceed with your upload without uploading these problematic documents. Please note that selecting Continue anyway will NOT upload all of the documents in your processed upload, and will skip documents associated with the skipped lines.

1962.png

To ensure that all of your documents get uploaded, you can troubleshoot your load file using the error reports that can be downloaded by clicking Download errors.

uploaderErrors.gif

Again, you can proceed in one of two ways:

  • Download the error report by clicking Download errors, proceed with uploading the documents that could be processed, fix the “load-error-lines.DAT” load file found in the error report, and then upload the documents that were skipped during the first upload.
  • Download the error report, fix any errors in your original load file, click Try a different load file, and upload all of your documents in one upload.

 

If you would like information on troubleshooting malformed information errors in your load file, read this article about processed data errors.


Deleting field values

One operation under defining Special Columns is relevant only when performing an overlay: deleting metadata values. If you would like to remove a metadata value from all the documents listed in your loadfile, you can use a loadfile with no values in that field. Though the platform defaults to ignoring empty fields, you can tell the platform to delete the existing values instead by clicking the red X next to the associated field. The field will then show up in “To Delete” on the left-hand side of the uploader.

article_1_ss_1.png

article_1_ss_2.png

You will be asked to map the loadfile field to a metadata field in your database and to confirm the deletion at a later step.

article_1_ss_3.png

Please note that if you elect to delete metadata values, the previous entries will not be available on the site or recoverable by Everlaw. Only select this option if you are sure you would like to remove all metadata values in a given field for the documents in your loadfile.


 

Overlays and Mixed Uploads

After defining Special Columns, Everlaw will read your loadfile and determine the documents in your upload. If your upload contains only documents that already exist on the platform, you will be asked what formats of the documents you would like to overlay. Natives will only be an available option if a Native Path column was identified in the previous step.

mceclip0.png

If your upload contains both documents that already exist in your database and new documents, you will be prompted with some additional options. On this screen, you can choose to upload only the new documents found in the loadfile, to overlay only the existing documents found, or to do both.

mceclip1.png

If your choice will overlay documents, then you will also be asked to determine which formats of the documents you would like to overlay. Regardless of your choice, Everlaw will automatically split the new and existing documents from your loadfile into two new loadfiles which you can download from the ‘Loadfiles and Split Info’ button associated with the upload. These downloads will also be available under the three-dot icon once the upload is complete.

Note on overlapping documents

If you get an error after the previous step letting you know there are overlapping documents, that means the upload contains entries that would overwrite several documents. You can export a CSV of the conflicts by clicking on the red circle with an exclamation point in it.

Overlays can change the page length of documents, but they must continue to have a unique Bates number for each page. Please review the overlay to confirm that this is the case before attempting the upload again.
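
To see how overlapping entries arise, you can check the Bates ranges in a load file for shared page numbers yourself. The following is a minimal sketch and not Everlaw's actual check. It assumes every range uses the same Bates prefix, so that only the numeric parts need to be compared, and `find_overlaps` is a hypothetical name.

```python
def find_overlaps(ranges):
    """Return pairs of ranges that share at least one page number.

    Each range is a tuple of inclusive integers, e.g. (1, 10) for ABCD001-ABCD010.
    """
    ordered = sorted(ranges)
    overlaps = []
    for index, current in enumerate(ordered):
        for later in ordered[index + 1:]:
            if later[0] > current[1]:
                break  # sorted by start, so no later range can overlap current
            overlaps.append((current, later))
    return overlaps
```

An empty result means that every page in the checked ranges has a unique Bates number.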

ProcessedUpload_overlaperrordownload.gif


Metadata parsing

In this step, you determine what data type the uploader parses each metadata field as. For example, if there is a “File Size” field with contents that are just numbers, you can choose whether the uploader should understand it as a text string or a numerical value. If you select the latter, you will be able to do searches on number ranges (e.g., a search for file size larger than 100). The uploader will only allow you to select a field as numerical if there are only numbers and no letters in any of the values. For instance, a value of “100kb” would prevent the uploader from allowing you to select the field as numerical. Of particular importance in this step are:

  1. Making sure that the uploader recognizes date fields as dates, not as text. Otherwise, you won’t be able to do date range searches.
  2. Listing the To and CC fields as AddressList type and the From field as AddressFrom. This will enable you to use advanced conversational search functionality.
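
The numeric-type rule described above can be approximated before upload to find the values that would block a field from being treated as a number. This is a sketch only: the exact rules the uploader applies are not documented here, so the pattern below (optional sign, digits, optional decimal part) and the decision to skip empty values are assumptions, and `can_parse_as_number` is a hypothetical name.

```python
import re

# Assumed definition of "just numbers": optional sign, digits, optional decimals.
NUMBER_PATTERN = re.compile(r"[+-]?\d+(\.\d+)?")


def can_parse_as_number(values):
    """True if every non-empty value looks like a plain number with no letters or units."""
    for value in values:
        text = value.strip()
        if not text:
            continue  # assumption: empty values do not block the numeric type
        if NUMBER_PATTERN.fullmatch(text) is None:
            return False
    return True
```

For example, `can_parse_as_number(["100", "100kb"])` returns `False` because of the unit suffix on the second value.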

mceclip1.png

 

The different colors on each metadata field reflect whether the uploader is confident with the designation of field type.

  • Green: The uploader is confident that the field is parsed correctly; you usually do not need to click through these.
  • Yellow: This color specifically means that the field has more than one non-text format option, such as number and datetime. Usually these are correctly parsed, but it’s advisable to review them to make sure the data seems to be in the proper format. Once you have clicked on a field, its color will change to a more muted hue. Sometimes fields containing numerical values, e.g., "Attachment Count,” may need to be changed from text to number format.
  • Red: This means there has been an error. There is at least one value that cannot be parsed. To select this format, you will either have to create a new date field as described below, or mark the values that do not conform to the specified format as “Null.” This will ignore that particular value and will allow the uploader to move to the next step.

Parsing Datetime Fields

Everlaw will automatically detect and parse various datetime formats contained in your loadfile. Parsing the field takes the values and ensures that Everlaw correctly recognizes the date. For example, a date given as 2001/12/30 would be parsed such that the system recognizes the year as 2001, the month as December, and the day as the 30th. Since dates can be ambiguous, Everlaw will prefer either a Month-Day (US) or Day-Month (Non-US) ordering based on your login region. This preference can be changed for a given upload by using the ‘Change preference’ button on the left-hand side of the metadata type selector.
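
The ambiguity between Month-Day and Day-Month ordering can be illustrated with a short sketch. It is not Everlaw's parser: it handles only slash-separated dates with a four-digit year, and falling back to the other ordering when the preferred one fails is an assumption made for illustration. The names `FORMATS` and `parse_date` are hypothetical.

```python
from datetime import datetime

FORMATS = {
    "month_first": "%m/%d/%Y",  # US-style ordering
    "day_first": "%d/%m/%Y",    # non-US ordering
}


def parse_date(value, preference="month_first"):
    """Parse a slash-separated date, trying the preferred ordering first."""
    other = "day_first" if preference == "month_first" else "month_first"
    for name in (preference, other):
        try:
            return datetime.strptime(value, FORMATS[name]).date()
        except ValueError:
            continue  # this ordering does not fit the value; try the next one
    raise ValueError(f"unrecognized date: {value!r}")
```

A value such as 03/04/2001 parses successfully under either ordering but yields different dates, which is why the preference is worth confirming before the upload proceeds.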

mceclip2.png

 

 You can also view and edit the datetime formats Everlaw used to parse your datetime values by clicking on the “See/whitelist formats” button. This will present you with a dialog showing samples from your loadfile, the current format being used by Everlaw, and a sample result for what that returns. If you notice a date is incorrectly interpreted, you can select a new format in the ‘Choose format’ column. If you select a new format, clicking on the ‘Test format whitelist’ button will populate the ‘New result’ and ‘Error’ columns with the expected results. All samples must have valid results in order to confirm and reanalyze the entire loadfile. If the datetime format you would like to select does not appear in the list, please contact Everlaw Support to have it added to the list.

mceclip3.png


Combine 

In this step, you can combine date and time fields so that they are present as one field instead of two. For example, “Date Created” and “Time Created” could be combined into one field. The uploader generally does this automatically, in which case the combined fields are shown as green. If you would like to split them anyway, select “Split” after clicking on the values. If the uploader is not confident in a combination, the fields will appear yellow.
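
Conceptually, combining a date field with a time field produces a single datetime value. The sketch below illustrates the idea under assumed formats (year/month/day dates and 24-hour times); the formats the uploader actually accepts may differ, and `combine_date_time` is a hypothetical name.

```python
from datetime import datetime


def combine_date_time(date_text, time_text):
    """Merge separate date and time strings into one datetime value."""
    date_part = datetime.strptime(date_text, "%Y/%m/%d").date()
    time_part = datetime.strptime(time_text, "%H:%M:%S").time()
    return datetime.combine(date_part, time_part)
```

For instance, a “Date Created” value of 2001/12/30 and a “Time Created” value of 14:05:00 would become one value representing 30 December 2001 at 14:05.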

 

If the field cannot be combined with anything, it will be grey. If fields that seem like they should be combined are not, you may need to return to the previous section, create new formats, and merge those formats.

 


Normalize

In this step, you match the metadata fields in the loadfile (present on the right side of the table) with the metadata field that will be displayed in Everlaw. There are two types of metadata fields on the Everlaw platform: canonical and semantic fields.

Canonical fields are the curated list of fields we have enumerated to try to limit the proliferation of redundant metadata fields. For example, if multiple parties are producing documents that are present in your database, you could have fields called “Begin Family,” “BeginAttach,” “Begin Attachment,” and “BegFamily” that all actually reference the same underlying field. To facilitate searching all the fields at once, as well as making the data in your database easier to read, you should pick Everlaw’s canonical “Begin Family” field if any of those are present. You can tell if a field is canonical because it will say “Canonical” in blue on the left side.

Semantic fields are fields that are not present in Everlaw’s canonical list. In general, you want to map all fields to canonical fields to the degree possible. In other words, you want as many rows as possible to be blue in your table. This ensures that your database remains easy to search and metadata fields are easy to understand.

There are some fields that do not have a good canonical equivalent, however, such as “Has Redactions” or “Redacted,” though you will still want to preserve them in the database. In those cases, you can select a semantic field. To avoid all of the problems with using multiple semantic fields, you should make sure that you are using the same semantic field each time. This means using “Redacted” consistently instead of alternating between that and “Has Redactions.” This is aided by the fact that semantic fields you have previously used will say “Semantic” in grey on the left side, where new fields will say “New” in yellow.
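
The mapping logic described above can be sketched as a lookup from loadfile field names to canonical or previously used semantic fields. This is an illustration only: the alias table contains just the “Begin Family” example from this article rather than Everlaw's full canonical list, and `normalize_name` and `map_field` are hypothetical names.

```python
# Illustrative alias table; only the example from this article is included.
CANONICAL_ALIASES = {
    "Begin Family": {"beginfamily", "beginattach", "beginattachment", "begfamily"},
}


def normalize_name(name):
    """Lowercase a field name and drop spaces and punctuation for comparison."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def map_field(loadfile_field, known_semantic=()):
    """Return (target_field, label) where label is Canonical, Semantic, or New."""
    key = normalize_name(loadfile_field)
    for canonical, aliases in CANONICAL_ALIASES.items():
        if key in aliases:
            return canonical, "Canonical"
    for semantic in known_semantic:
        if normalize_name(semantic) == key:
            return semantic, "Semantic"  # reuse the field name used in earlier uploads
    return loadfile_field, "New"
```

Passing the semantic fields you have used before as `known_semantic` mirrors the advice to reuse “Redacted” consistently rather than introducing “Has Redactions” as a new field.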

normalization.gif

One important thing to remember about this section is that though it does display example metadata values in rows and columns, the documents aren’t necessarily the same across rows. That means that the entry in the third column of the “Custodian” field is not necessarily from the same document as the entry in the third column of the “Duplicate Custodian” field, even though one sits directly above the other. The best way to verify what metadata values correspond to a particular document is always to check the loadfile.

 


Resolve

In this step, you resolve inconsistencies among the metadata fields you mapped in the previous step. For example, if you have mapped multiple loadfile fields to the same metadata field in Everlaw, but a document in the loadfile does not have identical values in those fields, a conflict will appear and the box will be red. You can choose to go back to the previous step to map them to different metadata fields, save both values, or delete the conflicting values.

ProcessedUpload_resolveResolution.gif

Going back to previous step to map loadfile fields to different Everlaw metadata fields.

If you choose to save the conflicting values, the values on the left, represented by a green column, will be the ones shown on the document. The values with the tan background will be present when hovering your mouse over the field you ultimately choose to save in the metadata panel, and when doing searches on that value. If you choose to delete the conflicting value, it will not be present at all in the database.

Resolve_Save.png

Saving conflicting fields.

 

Resolve_Delete.png

Deleting conflicting fields.

 

If the fields have many of the same values, but some values are only present in one field but not the other, the field will be yellow. In many cases, this means you can combine them, as they reflect the same data. In other cases, it could represent that one field is a superset of another - for example, “Custodian” and “All Custodians” will have a good deal of overlap, but should not necessarily be combined.
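
The distinction between red and yellow fields can be sketched as a document-by-document comparison of two loadfile columns mapped to the same Everlaw field. This is an illustration of the idea rather than the uploader's actual logic; the function name `classify_mapping` and the returned labels are hypothetical.

```python
def classify_mapping(values_a, values_b):
    """Compare two columns mapped to the same field, one pair of values per document."""
    has_gap = False
    for a, b in zip(values_a, values_b):
        if a and b and a != b:
            return "conflict"  # both present but different (shown as red)
        if bool(a) != bool(b):
            has_gap = True     # value present in only one of the two columns
    return "partial overlap" if has_gap else "consistent"
```

A result of “partial overlap” corresponds to the yellow case, where you still need to decide whether the two columns really hold the same data or one is a superset of the other.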

For Overlays and Mixed Uploads

The uploader will provide you with the previous mapping of each field. Importantly, mapping a loadfile metadata field to a different metadata field on Everlaw will migrate the values over, deleting the values in the old field for the documents in the current upload. This is because the system does not want to store extraneous data, and assumes that if you have remapped a field, there is a good reason to have done so. If you do not want the old values to be removed, go to the previous step and re-map the metadata to the same field that it was previously mapped to.

The image below shows a conflict concerning the DOCDATE metadata field. The values under this field were previously mapped to a semantic field called “Doc Date.” The current upload maps them to a canonical date field, DOCDATE, instead. You can continue with the upload if you are sure that you would like to remap your data to DOCDATE and remove the values previously mapped under Doc Date. You can also go back to the Normalize step and change DOCDATE to Doc Date if you would not like to remove old values.

mceclip4.png

You will only be able to move on to the next step once you have resolved all issues.

 


Review

This step gives you a summary of your configuration. This includes any conflicts, new fields that are being added, or semantic (non-canonical) fields being added to the database. If you would like to change any of these fields, go back to the previous steps and adjust them.

Overlays and Mixed Uploads

For any upload that includes an overlay, you will see a section that asks “What should we do with existing metadata values?” and provides you with the option to replace existing values or clear all existing values before updating. If you would like to modify only the metadata fields that are listed in your new loadfile, choose 'Replace existing values.' This will only replace the metadata fields for which there are new values in the current overlay. If you would like to erase all existing document metadata for these documents, select 'Clear all existing values before updating.' Note that selecting to clear all existing values will delete all existing metadata for your documents, not just the fields that you are modifying. Clearing all existing metadata values can have unintended consequences and is usually unnecessary, so select this option only if you are sure that it is right for your situation.
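
The difference between the two options can be illustrated with a small model in which a document's metadata is a dictionary of field names to values. This is a sketch of the behavior described above, not Everlaw's implementation, and `apply_overlay` is a hypothetical name.

```python
def apply_overlay(existing, incoming, clear_first=False):
    """Return the metadata a document would have after an overlay.

    clear_first=False models 'Replace existing values';
    clear_first=True models 'Clear all existing values before updating'.
    """
    result = {} if clear_first else dict(existing)
    for field, value in incoming.items():
        if value:  # only fields with new values in the overlay are written
            result[field] = value
    return result
```

With existing metadata of Custodian and Title and an overlay that supplies only Title, replacing keeps Custodian intact, while clearing first leaves the document with Title alone.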

 mceclip5.png

 


Select folders

In this step, you select the folders that will be used to upload each type of file: image, text, and, if a native link is present, native. First, select the directory or directories where the images can be found, then hit “Next: Text.” Do the same for the text and native files. You can expand folders by clicking on them twice.

select.gif

Please note that the uploader will give you an error if you point it to two files with the same name. This could happen if there are native PDFs or other image files in the native directory and you pick a top level directory that includes both the images and the native directory. The solution to this is to make sure that images and native documents are in separate folders and to select only those folders during this step.
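
You can look for duplicate file names in the folders you plan to select before starting this step. The sketch below works on a list of relative file paths, such as one produced by a directory listing. Treating names as case-insensitive is an assumption, the example paths are made up, and `find_duplicate_names` is a hypothetical name.

```python
from collections import defaultdict


def find_duplicate_names(paths):
    """Map each file name that occurs more than once to the paths where it occurs."""
    seen = defaultdict(list)
    for path in paths:
        # Accept both Windows and POSIX separators, then keep only the file name.
        name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
        seen[name].append(path)
    return {name: found for name, found in seen.items() if len(found) > 1}
```

If the result is not empty, separate the images and natives into their own folders and select only those folders, as described above.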

 


Verify files

In this step, the uploader will first go through all of the files and verify that they are present. If any are missing, it will prompt you to continue anyway or stop the upload. If you continue anyway, you can download a list of missing files at the end. Then, the upload will occur, and you can watch as the system goes through all of the files. As the upload goes on, documents are indexed as they are ingested into the system. That means you will begin to see documents in the database before the upload is complete. While the upload continues, do not navigate away from the Processed Uploads page as that will interrupt the file transfers.

Once the upload has completed, you can open a search of the documents in the results table, download the loadfiles that were used to configure this upload, or delete the upload by clicking on the trashcan. Deleting the upload will remove the documents from the database and all its associated projects, not just the entry in the uploads table. Deleting a mixed upload will only delete the new documents that were created by that upload. Modifications to documents will remain.

mceclip0.png

 

 


Associate documents with source card

When users (typically non-Organization Admins) upload zip files of processed documents to the platform via the project upload page, an upload card is created for the processed source.

Screen_Shot_2018-10-12_at_7.39.06_PM.png

Organization admins can associate uploaded documents with their original processed source(s). To do this, choose the processed source from the dropdown under your completed processed upload. The selected processed source(s) will appear under “Sources to Assign.”

associate_with_source.gif

Associating uploads with their processed source will automatically mark the upload complete on the processed source card. Additionally, the card will have a document icon linking users to the uploaded documents.

Screen_Shot_2018-10-12_at_7.44.21_PM.png


OPT Files

An OPT file is an explicit listing of the name for each page of a document, including the filename of the image for that page. It is used to determine the names of image files when the name cannot be determined implicitly from the document’s Bates range and page count.

An OPT file is required whenever there is an inconsistency between the number of pages indicated by the Page Count column and the implicit number of pages described by the Bates range. For example, suppose loadfile record ABCD001-ABCD010 has a page count of 12, whereas the count implied by the Bates range is 10. An OPT file will be required to rectify the inconsistency. There are several possible reasons for the apparent or actual inconsistency:

  • The Page Count column is inaccurate (rather than provide an OPT file you could ignore the Page Count column).
  • Some pages have non-obvious names, e.g., the pages go ABCD001, ABCD002, ABCD002.2, ABCD002.3, ABCD003, ABCD004, … ABCD010. In this case the Bates range is correct and the page count of 12 is correct. The OPT file is needed to determine the name for each image. This problem may not be obvious without inspecting the images.

If the Page Count is correct but the Bates range is incorrect, an OPT file won’t fix the problem. You will need to correct the Bates range in the loadfile.

Another case requiring an OPT file is whenever the names of a document’s images cannot be inferred from the Bates range. For example:

  • ABCD001-ABCD004.4 (some pages use a page separator, others don’t)
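
To see why such ranges need an OPT file, consider how page names would be inferred from a Bates range alone. The sketch below is illustrative and not Everlaw's implementation. It assumes each Bates number is a prefix followed by zero-padded digits, and `implied_page_names` is a hypothetical name.

```python
import re


def implied_page_names(begin, end):
    """List the page names implied by a Bates range, keeping the zero padding."""
    begin_match = re.fullmatch(r"(.*?)(\d+)", begin)
    end_match = re.fullmatch(r"(.*?)(\d+)", end)
    if (
        begin_match is None
        or end_match is None
        or begin_match.group(1) != end_match.group(1)
    ):
        raise ValueError("page names cannot be inferred from this range")
    prefix, digits = begin_match.groups()
    first, last = int(digits), int(end_match.group(2))
    return [f"{prefix}{number:0{len(digits)}d}" for number in range(first, last + 1)]
```

For ABCD001-ABCD010 this yields ten names, none of which is ABCD002.2, and for ABCD001-ABCD004.4 it raises an error because the two ends do not share a prefix. In both situations, only an explicit per-page listing can supply the image names.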

OPT files are currently not supported for uploads using PDFs as the image format.

For common errors when uploading documents, please see our help documentation.
