Importing Documents Using a Document Import Processor Source Format
  • 12 Apr 2024

Slate supports the batch importing of documents using the industry-standard Document Import Processor (DIP) format.

Each batch should be structured as a compressed zip archive containing PDF documents and a single index file. A sample DIP file is attached to this article.

To import documents, create a new Source Format. Any number of DIP Source Formats can be created. If you are using the Import Path/Mask setting (described below), ensure that the masks do not overlap (just as with text-based imports).

Configure the Source Format

XML

Once a new custom Source Format has been created, use the following guidelines to define the XML configuration on the Format Definition tab of the Source Format:

  1. The XML configuration must specify the type as dip.

  2. The index file format must be specified.

  3. If the index file is not named index.txt, specify the index file's name. The file name for the index file supports wildcards.

    For example, the following XML configuration defines a DIP Source Format whose index file is tab-delimited, uses double quotes as text qualifiers, has a header row, and has a name that starts with "index_" and ends with ".txt":

    <layout b="&#x9;" h="1" t="&quot;" type="dip" index="index_*.txt" />
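A batch that fits the example layout above could be assembled along these lines. This is a minimal sketch rather than an official tool: the column names (filename, material_code, ref_id), the index file name, and the function name are hypothetical, and the only details taken from this article are the tab delimiter, the double-quote text qualifier, the header row, and the "index_*.txt" naming pattern.

```python
import csv
import io
import zipfile


def build_dip_batch(documents, index_name="index_20240412.txt"):
    """Return the bytes of a zip archive holding the PDFs plus one index file.

    `documents` is a list of dicts with hypothetical keys:
    'filename', 'material_code', 'ref_id', and 'content' (the PDF bytes).
    """
    index_buffer = io.StringIO()
    writer = csv.writer(
        index_buffer,
        delimiter="\t",         # corresponds to b="&#x9;" in the layout
        quotechar='"',          # corresponds to t="&quot;" in the layout
        quoting=csv.QUOTE_ALL,  # qualify every field, not only those that need it
        lineterminator="\r\n",
    )
    # Header row, corresponding to h="1" in the layout.
    writer.writerow(["filename", "material_code", "ref_id"])
    for doc in documents:
        writer.writerow([doc["filename"], doc["material_code"], doc["ref_id"]])

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # One index file per archive, named to match index="index_*.txt".
        zf.writestr(index_name, index_buffer.getvalue())
        for doc in documents:
            zf.writestr(doc["filename"], doc["content"])
    return archive.getvalue()
```

The returned bytes can be written to a .zip file and uploaded or placed on the SFTP server. Any columns beyond the file name are up to you, as long as they are mapped in the Source Format.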

Import Path/Mask

DIP Source Formats allow users to upload a zip file directly to Upload Dataset. A Source Format can also be configured to pick up zip files automatically from an SFTP directory using the Import Path/Mask setting on the Import Automation tab. For example, dip/*.zip picks up any zip file in the /incoming/dip/ directory, and documents_*.zip picks up any zip file in the /incoming/ directory whose filename begins with "documents_", such as "documents_07082016T083211.zip".
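To check a set of masks for overlap before saving them, the matching can be approximated locally. The sketch below assumes shell-style wildcards, case-sensitive comparison, and an exact match on the directory portion relative to /incoming/. Slate's actual matching rules are not documented in this article, so treat the result as a rough guide. The function names are hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase


def mask_matches(relative_path, mask):
    """Return True if a path (relative to /incoming/) fits an Import Path/Mask.

    Assumption: the directory part must be identical, and only the
    file name part is compared using shell-style wildcards.
    """
    path_dir, path_name = posixpath.split(relative_path)
    mask_dir, mask_name = posixpath.split(mask)
    return path_dir == mask_dir and fnmatchcase(path_name, mask_name)


def matching_masks(relative_path, masks):
    """List every mask that would claim the given file.

    More than one entry in the result indicates overlapping masks.
    """
    return [mask for mask in masks if mask_matches(relative_path, mask)]
```

For instance, calling matching_masks on "documents_1.zip" with the masks documents_*.zip and *.zip returns both masks, which signals an overlap that should be resolved before automating the import.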

Remap Active and Remap as of Date

The Remap settings for DIP Source Formats function the same as any other import. When creating the Source Format, leave the Remap Active setting configured as Inactive. You will change this later once you have finished mapping the index file. To enable mapping, enter the current date in the Remap As Of Date field.

Index File

The index file must contain a row for each document in the zip archive. It should contain a column that specifies the file names of the documents within the zip archive. If the Source Format is used to import multiple document types, there should also be a column that specifies the type of document for each row in the file.

Any other data can also be included and mapped in the index file, but at a minimum, it should include data that will allow Slate to match to existing records (or create new records, if that is desired).

If Slate can match an existing application or create a new application from this import, then the imported documents will be associated with that application. Otherwise, they will be imported to the Folio.
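The requirement that every document in the archive has an index row can be checked before upload. The following is a sketch under stated assumptions: the index is tab-delimited with double-quote qualifiers and a header row, its name matches "index_*.txt", and the file name column is called filename. The column name and the function name are hypothetical, so adjust them to your own layout.

```python
import csv
import io
import zipfile
from fnmatch import fnmatchcase


def check_batch(zip_bytes, index_mask="index_*.txt", filename_column="filename"):
    """Compare the file names listed in the index with the archive contents."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Ignore directory entries, which end with a slash.
        names = [n for n in zf.namelist() if not n.endswith("/")]
        index_names = [n for n in names if fnmatchcase(n, index_mask)]
        if len(index_names) != 1:
            raise ValueError(
                f"expected exactly one index file, found {len(index_names)}"
            )
        index_text = zf.read(index_names[0]).decode("utf-8")

    reader = csv.DictReader(
        io.StringIO(index_text, newline=""), delimiter="\t", quotechar='"'
    )
    # Rows with an empty or missing file name are skipped here.
    listed = {row[filename_column] for row in reader if row.get(filename_column)}
    documents = {n for n in names if n != index_names[0]}
    return {
        "missing_from_index": sorted(documents - listed),
        "missing_from_archive": sorted(listed - documents),
    }
```

An index entry that omits the file extension shows up under missing_from_archive, while the document it was meant to reference shows up under missing_from_index.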

Map the Index File

The Remap settings function the same as with any other import. The only additional requirements for the DIP Source Format type are:

  1. The column containing the file name must be mapped to Material - Material Filename.

  2. The column containing the material type (or a static value if the imported documents are all the same material type) must be mapped to Material - Material Code.

  3. The Material Codes should be mapped to the associated material keys in the Value Mappings.


    After the index file is mapped, set the Remap Active setting to Active to have Slate automatically process files uploaded on or after the Remap As Of Date value of the Source Format.

NOTE: If the material does not exist, it must be created. Custom materials can be created following the steps found under Creating a New Material in the Materials article.

 

Scopes

DIP-supported scopes:

  • School

  • Test

  • Gift

  • Opportunity

  • Dataset

  • Person

  • Application (if the import is associated with an application, the document is attached to that application; otherwise, it is saved to the Folio)

Special Materials

The DIP Source Format type supports importing school-scoped and test-scoped materials.

  • School-Scoped Materials: To import school-scoped materials (including school-specific transcripts), include (and map) school data (including School Key and School Name) in the index file that will either create a school record or match onto an existing school record. That school record will then be used as the record for the school-scoped material.

  • Test-Scoped Materials: To import test-scoped materials, include test score data in the index file that will either create a test score record or update an existing test score record. That test score record will then be used as the record for the test-scoped material.

    The article Matching Criteria for System Objects will be a helpful reference for matching on Schools or Tests.

    If school or test score data was not included in the index file, or not enough data was included in the index file to match onto an existing school/test score record or create a new school/test score record, the material will appear in Batch Acquire to allow manual assignment.

Unmatched Materials

Batch Acquire displays any materials from the zip archive that could not be associated with a record, so that they can be assigned manually.

Materials may appear here for reasons including:

  • The Source Format's Remap Active setting has not been set to Active. (If a file is imported while Remap Active is set to Inactive, materials will appear in Batch Acquire. Once the setting is changed to Active, all associated sources awaiting import will be processed. Documents will be imported to the appropriate person records and will no longer appear in Batch Acquire.)

  • A material type was not mapped.

  • The file name was omitted from the index file.

  • The document's filename does not match the filename in the index file. Ensure the file extension is included (e.g., ".pdf").

  • There was not enough data to match to a record or create a new record.

  • The material was mapped to a school/test-scoped material, but school/test score data was not included in the index file, or not enough data was included to match to a school/test score record or create a new school/test score record.

