Source Format Library and Creating a New Custom Source Format

A source format is a collection of standardized settings and criteria for a dataset uploaded frequently from the same source. The source format defines the destination fields and values in Slate and allows users to prescribe consistent import behavior such as new record creation, updates to existing applicant records, and de-duplication within a source file.  Importing data through a source format can be automated by configuring standard Slate SFTP options.

Slate's Source Format Library, accessed from the Database menu, offers numerous predefined formats for data sources common to many partner institutions.  See Source Format Descriptions for a list and descriptions of the source formats in the library.  Most existing source formats require little or no customization.  Source formats that import test score data require the corresponding test objects to be added from the Slate Template Library. 

Use existing source formats whenever possible for importing data. If a standard source format does not exist, then a custom source format must be created.

Important

Exercise extreme caution when uploading datasets of any kind, and in particular, datasets that come in a predefined format such as score report data files. Before uploading, always check to confirm that the dataset is consistent with the format type you have selected and with past datasets of this type. Slate is a transactional database, so data is constantly changing on a moment-by-moment basis.  As with any enterprise database, there is no way to undo the effects the uploaded dataset has already had on records in Slate as part of the import and merge process.

  Best Practice

Test data imports in the test environment before importing data in the production environment. Set up the import in the production environment, refresh your test environment, and then activate the import in the test environment first, to make sure the results are as expected. Once the desired results are achieved in the test environment, then run the import in the production environment.

Creating a Source Format

Source formats can be generated using settings from a previous upload, or they can be created from scratch.

Generating a Source Format from a Previous Upload

  1. Select Database on the Slate navigation bar and select Sources / Upload Dataset from the Import and Data Processing section.
  2. Select a previous ad hoc import to use as the foundation for the new source format.
  3. Select Generate Source Format.

    Select_Generate_Source_Format
  4. Follow the instructions below for configuring the Source Format.

Creating a New Source Format

  1. Select Database on the Slate navigation bar and select Source Formats from the Import section. The Source Formats page appears.
  2. Click New Source Format. The New Source Format popup appears.
  3. On the General Tab, fill in the information about your Source Format.  Each of the fields is explained below:
    • Status - Select Active to make the Source Format available for use in Upload Dataset.
    • Name - This is what you would like to call the Source Format
    • Format - This is where you are able specify which format the file will be: For example, Excel, CSV, Fixed Width, etc.  This setting is used solely for display purposes.  The actual formatting of the file is defined in the XML setting on the Format Definition tab.
    • Remap As Of Date - This is where you tell the Source Format how far back to look at data values to make them available in Remap.  For new Source Formats, you can simply put the current date.  A date is required once you are ready to Activate the Source Format. This setting is helpful if a file was ever inadvertently imported without headers. If this should happen, the file's first row, which likely includes data to be mapped, becomes the "headers." To fix this, simply adjust the remap as of date to a date after the errant file was imported. 
    • Remap Active -  When this is updated to Active, Slate will automatically process the uploaded file using the Remap settings, which you will define in a later step.  Leave this as Inactive until you are satisfied with your remap settings.  Prior to activating, the Remap As of Date must be populated or Slate can not reference the effective date.
    • Type - One-Time / Differential will retain existing source files alongside new source.  Cumulative / Replaceable will replace any existing source files with a new source file each time.  A common use for the Cumulative / Replaceable type the standard Checklist Import (Financial Aid) since every record is included every time.  Most imports will use the One-Time / Differential type.
    • Scope - Select Person/Dataset Records unless importing Slate users.
    • Dataset - This specifies if the dataset will be importing into Persons/Applications, or if it will go into any other datasets (such as Organizations, or Volunteers).
    • Unsafe - If a Source Format is Unsafe, it will be allowed to update person-scoped data on student records with an active application. A Source Format should be marked as Unsafe only if the source is trusted to contain accurate, and up-to-date information. You do not want an import to update information that the student recently provided you on an application with information that may be outdated or incorrect.

    An import into Slate from your student information system is an example of a Source Format that you might want to be Unsafe.

    Purchased lists of student information are examples of imports that should be Safe Source Formats.

    • Hide - The recommended setting is to Create source interactions unless the Source Format will be used for a daily feed from an SIS or other application.  Daily or frequent data updates/imports will contribute to the creation of many rows of data interactions on records.  Aside from the exceptions of high frequency imports, it is preferable to create source interactions to display activity history on a record's timeline.
    • Disable Update Queue - By default, records that are imported or updated from this Source Format will enter an update queue. This queue is used to run deferred rules and also adds records to queues for those data exports that are configured to only include records updated since the query last ran.

      If the Source Format is a backfeed that sets irrelevant information in the context of query update queues and rules, then this setting should be set to Prevent records from entering update queue upon import (disables rules from firing). Using this setting will prevent a feedback loop, where each updated record is exported from Slate into an external system, which then exports data for each of those updated records back to Slate. If the update queue were enabled, those records in that import would be included in the export again the next time it ran, even if the import was the only update on those records.
    • Update Only - This tells Slate how to handle records in the incoming dataset that do not match records within Slate.
      • Allow record creation means that Slate will create a new record for any records in the incoming dataset that do not match records in Slate.
      • Update only means Slate will not create new records, and the dataset will only update records on which it matches.  Note that when Update Only is selected for an application data import, an application will not be created even if the person record already exists.
    • Dedupe Records - By default, Slate will create a new record for every row when Allow record creation is selected and an existing match cannot be found.  By selecting Dedupe records, Slate will evaluate the source file and dedupe records in the source based on an exact match of First Name, Last Name, Email, and Birthdate.
    • Notification and Notification Email - To receive email notification if an imported file associated with the Source Format fails, select Failures only and enter the email address(es) for notification. Select Successes and Failures to be notified of both successful and failed imported files for the Source Format.
    • Read Permission and Upload/Build Permission - Permission to the Source Format can be limited to specific permissions. 
  4. The settings on the Format Definition provide Slate with specific instructions for interpreting the incoming data.
    • XML - Use XPath syntax to tell Slate how to parse the data source. For example, you can define whether the file is delimited, if there is a header row, and what file format is used. The Common XML Examples section of this article has specific examples that can be copied and pasted into this field.
    • SQL - For most data imports, the SQL text area should be left blank. Please submit a service request if you feel that a Source Format will require custom SQL.
    • PGP Private Key - Paste the private key from the PGP software.
    • PGP Password Each PGP private key is typically encrypted with a password.
  5. The settings on the Import Automation Tab are optional and will not apply to every Source Format.
    • Import Path/Mask - This is the field that tells Slate that you want to pull in files automatically. In this field, you should include the file name pattern that Slate should look for to know that the file should be imported using this Source Format. We recommend appending the file name with a date so you are not using the exact same name each time. Slate does keep the files for a period of time in a Processed directory within the /incoming folder - if the file names are exactly the same, each subsequent file load will delete the file from the Processed directory.

    An example Import from FTP Mask field would be: SIS_to_Slate_*.txt.

    Slate would automatically import any file in the /incoming directory that contained that file name pattern, including: SIS_to_Slate_09032013.txt.

          • Import Remote Server - Leave this field blank unless the data for this Source Format will be delivered via an API call from Slate or an external SFTP server. 

            - For API calls, include the full URL, including https:// and any necessary querystring parameters.
            - For remote SFTP servers, the syntax (including support for Remote Service Account users) is exactly the same as when configuring a Remote Server for data exports.
          • HTTP Headers - Add a list of header in the form, for example: <t name="HEADER">VALUE</t>
          • Import Order - If you specify an Import Path/Mask, you may optionally specify an import order if the order that Slate processes the file matters. For example, if you have an application file and a supplement file that will be delivered to the SFTP directory together, and you would want the application file to be processed prior to the supplement, you can enter 1 for the application's Source Format and 2 for the supplement's Source Format.
          • Import Frequency
            SFTP
              By default, Slate sweeps the SFTP directories for new files that match the specified Import Path/Mask every 15 minutes. If you will only be dropping off files once per day, you can choose to increase the frequency by specifying the number of minutes between file sweeps - e.g.,1440 minutes.

            Web Service
              Slate's remote import automation will by default poll the end point every 24 hours. The import frequency can be adjusted to support pulling data as frequently as once an hour by setting the frequency value to 60 (minutes) or greater. Any value less than 60 will poll the endpoint hourly; if no value has been entered, Slate will continue with the default setting of polling the endpoint every 24 hours.


    The Import Frequency setting is limited to only those Source Formats picking up from remote SFTP servers and web services—not the Technolutions SFTP. 

      1. Select Save when you are satisfied with the Source Format settings.

Format Definition XML

The "XML" setting on the Format Definition tab tells Slate how it should parse the incoming data. Each setting is listed here, followed by examples for common source data formats.

  Tip

We recommend finding one of our example format definitions for flat files or web services, copying the XML, and modifying as needed.

Format definition XML always begins with a <layout> node. This node can have several attributes:

  • The b attribute indicates the separator. For example, use b="," for a comma-delimited file.
  • The attribute indicates whether the data contains a header row. Use h="1" if your data has a header row. If there's no header row, this attribute can be set to 0 or simply excluded.
  • The attribute indicates the text qualifiers. For example, use t="&quot;" for files where each data element is contained in double quotes.
  • The type attribute is used when the incoming data is not a text file, and can be set to one of the following values:
    • type="convert" is used for Excel files.
    • type="xml" and type="json" are used for XML and JSON files, respectively. 
  • The width attribute is used in fixed-width files to define how many characters each row of the file has.

When the incoming data does not have a header row, <f> nodes contained within the <layout> node are required. Each <f> node represents a single field to be mapped in the source format, and can have the following attributes:

  • Required: The id attribute is what the ID of the field should be. It should be unique among the other fields in the Source Format. This is also what you will see as the "Source" field when you are mapping the values on the Remap page. Note: this ID should be no more than 64 characters in length.
  • Required: The s attribute defines where to find the field in the source data. The examples below demonstrate how to use this attribute for various format types. 
  • The attribute is used in fixed-width files to define the character length of the field.

Flat File Examples

The following format definintion examples are for flat files, like CSV, Excel, and fixed-width files.

CSV file with a header row and double quotes around each element:

<layout b="," h="1" t="&quot;" />

This format definition works if the source data looks like this:

"Joe","Brown","1745 E. New St., Apt. 22","Chicago","Illinois","46702"

Double-quotes are wrapped around each field, while a comma separates each field. By including the quotes, Slate interprets "1745 E. New St., Apt. 22" as one field rather than two.  

Tab-separated file with a header row and no text qualifiers:
<layout b="&#x9;" h="1" />
Pipe-separated file with a header row and no text qualifiers:
<layout b="|" h="1" />
Excel file with a header row:
<layout type="convert" h="1" />
CSV file without a header row:

<layout b="," h="0" t="&quot;">
  <f s="1" id="First Field Name" />
  <f s="2" id="Second Field Name" /> 
</layout>

The h="0" indicates that the file does not contain a header row. Since files without a header row need to have the fields defined, <f> nodes are used with the following attributes:

  • The attribute defines which column of data corresponds to each field. For example, s="1" is the first field in the file, up until the first delimiter (in this case, a comma).
  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.
A fixed-width file with no delimiters, where the beginning and end of each field is defined based on the specific location in the row:

<layout width="43">
    <f s="1" w="25" id="Student Last Name" />
    <f s="26" w="16" id="Student First Name" />
    <f s="42" w="1" id="Student Middle Initial" />
</layout>

The <layout> node's width attribute defines how many characters each row of the file has. In the above example, each row would be 43 characters long.

Fixed-width files require <f> nodes so that Slate can determine where each field begins and ends, using the following attributes:

  • The s attribute defines which character the field starts at. For example, s="1" means that this field starts with the first character in the file.
  • The w attribute defines the width of the field. For example, w="25" means that the field is 25 characters long.
  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.

Web Services Examples

The following examples are for XML and JSON formats, which are commonly imported via web services.

XML:

<layout type="xml" node="/persons/person">
  <f s="first" id="First Name" />
  <f s="last" id="Last Name" />
  <f s="email" id="Email" />
  <f s="birthdate" id="Birthdate" />
  <f s="schools/school[1]/ceeb" id="School 1 Code"/>
  <f s="schools/school[2]/ceeb" id="School 2 Code" />
</layout>

This format definition works if your incoming data is structured like this:

<persons>
<person>
<first>Joe</first>
<last>Brown</last>
<email>joebrown@example.com</email>
<birthdate>2000-01-01</birthdate>
<schools>
<school>
<ceeb>380880</ceeb>
</school>
...
</schools>
</person>
...
</persons>

In the <layout> node, the type attribute tells Slate which file type to expect - in this case, XML. The node attribute tells Slate the path to each "row" of the file; in this case, there is a <person> node for each row contained within an overall <persons> node.

Each <f> node in the format definition represents a field you will map.

  • The attribute uses XPath syntax to define where the data exists in the source. For example, s="email" means Slate will use the <email> node in the source data.
  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.

The last two <f> nodes demonstrate how to handle multirelational data - in this case, Schools:

  • The School 1 Code has a path of schools/school[1]/ceeb. Unlike the previous paths, this path contains square brackets with a number, [1], which indicates that we should pull the <ceeb> node from the first <school> node that we encounter under the <schools> node.
  • We get to the second <school> code by incrementing the number contained in the square brackets. You will need to anticipate how many nodes might appear and map them all separately, incrementing the bracketed number each time.
JSON:

<layout type="json" node="/students">
  <f s="first" id="First Name" />
  <f s="last" id="Last Name" />
  <f s="email" id="Email" />
  <f s="birthdate" id="Birthdate" />
  <f s="schools[1]/ceeb" id="School 1 Code"/>
  <f s="schools[2]/ceeb" id="School 2 Code" />
</layout>

This format definition works if your incoming data is structured like this:

{
students: [
{
first: "Joe",
last: "Brown",
email: "joebrown@example.com",
birthdate: "2000-01-01",
schools: [
{
ceeb: "380880"
},
...
]
},
...
]
}

Important

A named root node is required for JSON formats. This is because JSON is internally converted to XML during processing. JSON data that begins with an array (i.e. begins with a square bracket) cannot be imported.

In this format, the type attribute is set to JSON, and the node attribute tells Slate the name of the root node. If your data begins on a lower level of nesting, you can use slashes to indicate the path; for example: students/student.

Each <f> node in the format definition represents a field you will map.

  • The attribute uses XPath syntax to define where the data exists once the JSON has been converted to XML. For example, s="email" means Slate will use the email property in the source data.
  • The id attribute is the source field name that's shown in the Field Mappings stage of Upload Dataset. It should be a user-friendly name that's unique and no longer than 64 characters.

The last two <f> nodes demonstrate how to handle JSON arrays:

  • The School 1 Code has a path of schools[1]/ceeb. Unlike the previous paths, this path contains square brackets with a number, [1], which indicates that we should pull the ceeb property from the first object in the schools array.
  • We get to the second CEEB code by incrementing the number contained in the square brackets. You will need to anticipate how many items might be in the array and map them all separately, incrementing the bracketed number each time.

Upload Dataset

Now you will be able to upload a sample file using the Source Format that you created so you can define the Remap settings.

  1. Select Database on the top navigation bar and select Upload Dataset.
  2. Select the Source Format that you created from the File Format list.  
    • Source Formats will appear within the list of Predefined Formats
  3. Add a sample file, then click Upload.
    • The sample file should exactly match the format that you intend to send Slate.
    • Slate will process the file at this time.  When it has completed, the status on the Upload Dataset main page will change from Pending to Completed or Awaiting Activation.
  4. When the file has uploaded, clicking on the name of the Source Format will link to the Remap stage where source fields can be mapped to destination fields within Slate.
  5. When you are satisfied with how the file is mapped, go back to the new Source Format in the Admin Tool and set the Remap Active setting to Active.  Slate will then process any files that were uploaded on or after the Remap Effective date using the remap settings that you defined.  Any additional files that are uploaded to Slate using the new Source Format will also use the Source Format and remap settings that you configured using these steps.

Important

If you receive the status/error of "Awaiting import through Source Format remap", then you will want to be sure that your source format has the appropriate XML setting on the Format Definition tab. You can use one of the examples above to build out this XML.
Was this article helpful?
11 out of 22 found this helpful