Predictions

Predictions is an integrated predictive modeling tool that forecasts the probability of an outcome for each record. 

Predictions leverages the Query Builder to define a historical set of data, the desired measurable outcome, and the target set of data to which the prediction is applied.

For example, a prediction could be created for the likelihood of a person becoming an applicant, based on their test scores, location, and number of emails opened. Different predictions could be modeled for other points along the admissions funnel.

The historical set would identify records that were previously engaged but are no longer being actively worked. The measured outcome would be the existence of an application record.

The system will then construct a series of mathematical models based on this data and apply these to the target set in order to make an outcome prediction for each member.

The outcome predictions are stored and can be accessed individually or in batch as part of queries, exports, or rules. 

For each individual prediction, a historical set (including predictive points), target set, and outcome can be created or updated. The results of a prediction (i.e., the generated models) are displayed as a set of color charts, with the corresponding predictive power of each model displayed in percentages.

Predictions uses machine learning and employs the following algorithms:

  • Decision Tree
  • Naive Bayes
  • k-Nearest Neighbors (k-NN)
  • Logistic Regression
  • Linear Regression
  • Neural Network
  • Perceptron
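Slate does not expose how these algorithms are implemented internally, but a minimal sketch using scikit-learn stand-ins (an assumption for illustration only) shows how several of the listed model families can be fit to the same historical data and compared:

```python
# Illustration only: scikit-learn stand-ins for the algorithm families above.
# Slate's internal implementation is not exposed; this is not its actual code.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.neural_network import MLPClassifier

# Toy "historical set": 3 predictive data points, binary outcome (applied / not)
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(),
    "Perceptron": Perceptron(),
    "Neural Network": MLPClassifier(max_iter=2000, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out 20%
```

Each model family partitions the historical records differently, which is why, as noted later in this article, no single family is "best" for every institution's data.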

At what point is it best to start using Predictions?

Predictive modeling is more accurate when there is a sizable amount of historical data. If an institution is new to Slate, there is no data to use for the historical population. Institutions should have at least one year of historical data before starting to use Predict.

The data available, as well as its relevance and predictive power, differ for each institution. Accordingly, there is no single correct model, nor one best way to create a model. 

Predictions is used for creating predictive models using Slate data. Predictive modeling is a science unto itself, and developing models is a nuanced task that also requires an understanding of your institutional data and trends.

There are many resources available for getting started with the basics of predictive modeling and machine learning. For example, a number of free online courses and tutorials, such as those from Udacity, Coursera, and Udemy, cover machine learning, predictive modeling, and analytics. We also recommend working closely with other departments at your institution, such as the Office of Institutional Research, to determine which data points could be tested as predictive.

Finally, your peer institutions may be willing to share how they make use of Predict and the data it returns. The Slate Community forum is a great place to reach out to peer institutions directly. 

Create a Prediction
  1. Click Queries / Reports in the top navigation bar.
  2. From the navigation list on the right-hand side of the page, click Predictions.
  3. Click New Prediction. Give the prediction a name.
  4. Select a Population on which the prediction should be based.

Configuring the Prediction: The Historical Set

The historical set carries out two functions. The first is to identify, by way of filters, the older (historical) records on which the model should be based. For example, a filter on age, record creation date, or one or more past intended entry terms.

The second function is to define, by way of export parts, the data points that are potentially predictive. It is important to note that each export should return a limited number of possible values to facilitate the grouping of data, identification of patterns, and building of models. For example, numerical values such as High School GPA could be distributed into a number of bands/ranges:

  • 0.00 - 1.00
  • 1.01 - 2.00
  • 2.01 - 3.00
  • 3.01 - 4.00
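As a hedged sketch (only the band boundaries above come from this article; the helper function is invented for illustration), banding a continuous GPA might look like:

```python
# Hypothetical helper: map a continuous 0.00-4.00 GPA onto the four bands
# listed above, so a model sees four discrete values instead of hundreds.
def gpa_band(gpa):
    if gpa <= 1.00:
        return "0.00 - 1.00"
    elif gpa <= 2.00:
        return "1.01 - 2.00"
    elif gpa <= 3.00:
        return "2.01 - 3.00"
    else:
        return "3.01 - 4.00"

print(gpa_band(3.42))  # prints 3.01 - 4.00
```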
Historical Set: Exports

It is recommended to include about three to five different exports as the predictive data points. Fewer than three exports creates models that are too "simple" (i.e., the models cannot learn much from the historical set). Too many exports, on the other hand, can introduce too much "noise" into the historical set, leading to models that are too "tailored" to the noise, anomalies, and randomness of that particular historical set rather than generalizing across the population.

  • Existence export parts - can be used to define a number of binary (e.g., Yes/No) values across each record.

    An existence export can be added that returns "Y" if the record meets the display filters criteria, and "N" if it does not. In this example, "Event by Category" is added under display filters, which will return "Y" if the record has attended an event in the specified event folders.
  • Numerical export parts - can be configured with a format type of Distribution to create ranges. 

    In this example, instead of the algorithms trying to generate predictive models based on every possible value of "Converted 4.0 HS GPA," percentile ranges can be created using the "Distribution" format type. An interval of 4 arranges the GPA values into quartiles (25th, 50th, 75th, and 100th percentile bands). 
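Both export types can be pictured with a small plain-Python sketch (record IDs, field names, and cut-point logic are invented for illustration; Slate computes these exports internally):

```python
# Illustrative sketch, not Slate's implementation: two ways to reduce a data
# point to a small set of values, mirroring the export types described above.
from statistics import quantiles

# 1) Existence export: binary Y/N, e.g. "attended an event in these folders?"
events_attended = {"rec1": 3, "rec2": 0, "rec3": 1}
existence = {rec: ("Y" if n > 0 else "N") for rec, n in events_attended.items()}

# 2) Distribution export with an interval of 4: place each GPA in a quartile.
gpas = [2.1, 3.8, 3.0, 1.5, 3.9, 2.7, 3.3, 2.0]
q1, q2, q3 = quantiles(gpas, n=4)  # 25th, 50th, 75th percentile cut points

def quartile(gpa):
    if gpa <= q1:
        return "25th"
    elif gpa <= q2:
        return "50th"
    elif gpa <= q3:
        return "75th"
    return "100th"

print(existence["rec1"], quartile(3.8))  # prints Y 100th
```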

Be wary of including predictive data points that are "proxies" for the outcome. For example, for an outcome of likelihood to apply, a "Decision" export would be overwhelmingly predictive, as a person would have had to become an applicant before any decisions are made.   

Historical Set: Filters

Filters should select a population indicative of a past cohort: one that resembles the target set population but does not overlap with it.

For example, if the target set selects persons for whom you want to predict the likelihood of creating an application, the historical set should identify persons who previously had an opportunity to apply (regardless of whether they did).

Configuring the Prediction: The Target Set

The target set is the population of records for which the tool will attempt to predict an outcome. The filters used for this set should be similar to those used in the historical set, except that any "temporal" filters (such as application period or entry term) should select current records, whereas the historical set uses previous (older) records.

The records in the target set should be mutually exclusive from those in the historical set in order to prevent skewing the accuracy of the results.

To edit the target set, click Edit Target Set from the prediction page.

The query filters interface is used to define a target set. There is no need to select exports, as the predictive variables have been defined from the exports selected with the historical set.

Configuring the Prediction: Outcome

The outcome is the end result for the prediction.

  1. To define the outcome, click Edit Outcome from the prediction page.
  2. Add an Outcome Label.

The query filters interface is used to define the outcome. For example, for a "Will Apply" prediction outcome, filtering by "Has Application" is appropriate.

The filter(s) selected here are effectively applied to both the historical set and target set as an existence filter. That is, there are three outcome groups:

  1. Historical set for which the outcome filters evaluate to True.
  2. Historical set for which the outcome filters evaluate to False.
  3. Target set for which the outcome is unknown and to be determined.
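These three groups can be pictured with a minimal sketch (record shapes and field names are invented; this is not Slate's data model):

```python
# Hypothetical partition of records by an outcome filter like "Has Application".
historical = [
    {"id": 1, "has_application": True},
    {"id": 2, "has_application": False},
]
target = [{"id": 3}, {"id": 4}]  # outcome unknown, to be predicted

groups = {
    "historical_outcome_true":  [r for r in historical if r["has_application"]],
    "historical_outcome_false": [r for r in historical if not r["has_application"]],
    "target_unknown":           target,
}
```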

The Results

Important!

Clicking Generate Models queues model generation as a deferred process. Results will be available for viewing by the following day.

Predictions constructs the predictive models using the different machine learning algorithms listed earlier. The models are constructed based on 80% of the historical data defined. Then, the predictive ability of those models is tested on the remaining 20% of the data to produce an approximate accuracy for each model. 
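The 80/20 evaluation described above can be sketched in plain Python (the trivial majority-class "model" and field names are stand-ins, not Slate's actual algorithms):

```python
# Illustrative sketch of the 80/20 evaluation described above; Slate's actual
# split and modeling process is not exposed.
import random

random.seed(0)
historical = [{"features": (random.random(),), "outcome": random.random() < 0.5}
              for _ in range(100)]
random.shuffle(historical)

split = int(len(historical) * 0.8)
train, test = historical[:split], historical[split:]  # 80% train, 20% test

# A trivial stand-in "model" that always predicts the majority training outcome.
majority = sum(r["outcome"] for r in train) > len(train) / 2
correct = sum(r["outcome"] == majority for r in test)
accuracy = correct / len(test)  # this is the percentage shown per model
```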

  • Percentage: The percentage displayed per model reflects the accuracy of the model as compared to the actual outcome of the 20% test population. The best models will have a high percentage. However, a percentage that is very high, such as 99 or 100%, likely indicates that the historical set or chosen predictive data points include a proxy for the outcome, producing a biased prediction. 
  • Band/Chart: A visualization of the model applied to the target set, where each color band represents the proportion of the target set with a given predicted outcome. 

Using Predictions with the Query Tool

Standard filters and exports for Predict data are available from the Slate Template Library.

The Prediction Outcome (Application or Person scoped) configurable exports include:

  • Prediction Outcome - XML data about the outcome (applicable to certain models only).
  • Prediction Timestamp - the timestamp for when the prediction was generated for the record.
  • Prediction Probability - specific record's likelihood of meeting the desired outcome.
  • Prediction Model Accuracy - see "Percentage" in the previous section.
  • Prediction Model - the name of the model selected in the export configuration.

The Prediction Outcome By Model (Application or Person scoped) filters select records for which an outcome was or was not predicted, based on a prediction model.
