rAI-powered Predictive Modeling - Data Requirements – Resonate Analytics Knowledge Base

Two datasets are required: the raw CRM data and a manifest. You can transfer them to us as files via AWS S3, Box.com, or SFTP, or as tables via Snowflake. The associated knowledge base articles are linked below:

CRM Data Requirements

Basics

Format: A CSV file with a properly formatted header as the first row, or a Snowflake table
- The CRM data can be partitioned into multiple CSV files, but each file must begin with the same header and adhere to all other formatting requirements.
- For Snowflake, only one table for the CRM data is supported.
Identifier Column: The ID column must be a unique hashed email identifier field.
Rows: Each customer record must occupy exactly one row.
Null Values: Acceptable in every column except the Email/ID column, ID Type column, and the identifier column (if provided).
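
For partitioned data, the shared-header requirement can be checked before upload. The sketch below is illustrative only: headers_match is a hypothetical helper (not part of any Resonate tooling), and it assumes the parts are ordinary comma-delimited CSV readable by Python's csv module.

```python
import csv
import io

def headers_match(parts):
    """Return True if every CSV part starts with the same header row.

    `parts` is an iterable of open text streams (hypothetical helper).
    """
    first = None
    for stream in parts:
        header = next(csv.reader(stream), None)
        if header is None:
            return False  # an empty part has no header at all
        if first is None:
            first = header
        elif header != first:
            return False
    return first is not None  # no parts at all also counts as a failure

# In-memory stand-ins for two partition files.
part_a = io.StringIO("ID Type,ID,plan\nHEMSHA2,abc,gold\n")
part_b = io.StringIO("ID Type,ID,plan\nHEMSHA2,def,basic\n")
print(headers_match([part_a, part_b]))
```

With real files, the same function can be passed handles opened with open(path, newline="").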

Supported Data Types

Strings
- Can be categorical or ordinal data
Dates
- Expressed in ISO 8601 format (yyyy-MM-ddTHH:mm:ss.SSSZ) or in RFC 3339 format (yyyy-MM-dd HH:mm:ss.SSSZ).
Booleans
- Supported values: yes, no, y, n, 1, 0, true, false
Integers
Floats/Doubles
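
As a rough pre-upload check of the boolean and date rules above, a sketch along these lines may help. The function names are hypothetical, and two behaviors are assumptions rather than documented rules: boolean spellings are matched case-insensitively, and an empty cell is treated as null.

```python
from datetime import datetime

# Boolean spellings listed above; case-insensitive matching is an assumption.
_TRUE = {"yes", "y", "1", "true"}
_FALSE = {"no", "n", "0", "false"}

def parse_boolean(raw):
    """Return True/False for a supported spelling, None for an empty cell."""
    text = raw.strip().lower()
    if text == "":
        return None  # treated as null (assumption)
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError(f"unsupported boolean value: {raw!r}")

def is_supported_date(raw):
    """Check the two layouts above: 'T' or a space between date and time."""
    layouts = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%d %H:%M:%S.%f%z")
    for layout in layouts:
        try:
            datetime.strptime(raw, layout)
            return True
        except ValueError:
            continue
    return False
```

Python's %z directive accepts both a literal Z and numeric offsets such as +0000, so either spelling of the time zone passes this check.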

Raw Data File Formatting

Option 1 (recommended): File Formatting for Unhashed Emails:

Follow these formatting instructions if you have regular email addresses that you need to hash. Once formatted, you will use Resonate's hashing script to hash your emails. [Hashing Script Instructions]

1. Two column headers are needed within the file even if there is no data for a particular column.

Column 1 Header: ID Type
Column 2 Header: Email

2. After those two columns, the rest should contain the CRM data (if applicable)

Columns 3 to N: CRM data

3. Use this checklist to verify elements before uploading to Resonate.

File contains the 2 required column headers
- ID Type
- Email
File is in .csv format
ID Type column is HEMMD5, HEMSHA1, or HEMSHA2
Each customer record occupies only one row, i.e., there are no duplicate rows with the same email
No null values for ID Type, Email, or the identifier column
File contains at least the minimum number of records (100,000 entries)
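
Most of this checklist can be automated before upload. The following is a minimal sketch, not Resonate's own validation: check_crm_csv is a hypothetical helper, it reads the whole file from a string for simplicity, and it treats emails that differ only in letter case as duplicates (an assumption).

```python
import csv
import io

ALLOWED_ID_TYPES = {"HEMMD5", "HEMSHA1", "HEMSHA2"}

def check_crm_csv(text, id_header="Email", min_records=100_000):
    """Return a list of checklist violations found in the CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    if header[:2] != ["ID Type", id_header]:
        return [f"first two headers must be 'ID Type' and '{id_header}'"]
    problems = []
    seen = set()
    count = 0
    for count, row in enumerate(reader, start=1):
        id_type = row[0].strip() if len(row) > 0 else ""
        ident = row[1].strip() if len(row) > 1 else ""
        if id_type not in ALLOWED_ID_TYPES:
            problems.append(f"record {count}: unsupported ID Type {id_type!r}")
        if not ident:
            problems.append(f"record {count}: missing {id_header}")
        elif ident.lower() in seen:  # case-insensitive match is an assumption
            problems.append(f"record {count}: duplicate {id_header}")
        else:
            seen.add(ident.lower())
    if count < min_records:
        problems.append(f"only {count} records; at least {min_records} required")
    return problems
```

An empty list means no violations were found. For files that already contain hashed emails, the same check can be run with id_header="ID".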

4. Once your file is in the correct format, please save with the following name:

Name file(s) as follows: <client name>_<model_type>_<date>
Example: Resonate_Churned_20220419.csv
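
If you generate files programmatically, the name can be built as in the sketch below. The yyyyMMdd date layout is inferred from the example above rather than stated as a rule, and crm_file_name is a hypothetical helper.

```python
from datetime import date

def crm_file_name(client, model_type, day):
    """Build <client name>_<model_type>_<date>.csv (date layout assumed yyyyMMdd)."""
    return f"{client}_{model_type}_{day:%Y%m%d}.csv"

print(crm_file_name("Resonate", "Churned", date(2022, 4, 19)))
```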

5. Follow the instructions for your operating system to hash your emails using Resonate's hashing script: Hashing Instructions for Mac, Hashing Instructions for PC

Option 2: File Formatting for Existing Hashed Emails:

Follow these formatting instructions if you have one of the following IDs: MD5, SHA-1/SHA-128, or SHA-2/SHA-256.

1. Two column headers are needed within the file even if there is no data for a particular column.

Column 1 Header: ID Type
Column 2 Header: ID

2. After those two columns, the rest should contain the CRM data (if applicable)

Columns 3 to N: CRM data

3. Use this checklist to verify elements before uploading to Resonate.

File contains the 2 required column headers
- ID Type
- ID
File is in .csv format
ID Type column is HEMMD5, HEMSHA1, or HEMSHA2
Verify that the hashed email is in the proper format denoted by the ID Type
Each customer record occupies only one row, i.e., there are no duplicate rows with the same hashed email
No null values for ID Type, ID, or the identifier column
File contains at least the minimum number of records (100,000 entries)
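
The hash-format item in this checklist can be approximated by checking digest length. This sketch assumes the hashes are hex-encoded and that HEMSHA2 refers to SHA-256 (inferred from the ID list above); hash_matches_id_type is a hypothetical helper.

```python
import hashlib
import re

# Hex digest lengths: MD5 = 32, SHA-1 = 40, SHA-256 = 64 characters.
# Reading HEMSHA2 as SHA-256 is an assumption based on the ID list above.
HEX_LENGTH = {"HEMMD5": 32, "HEMSHA1": 40, "HEMSHA2": 64}

def hash_matches_id_type(value, id_type):
    """Return True if `value` looks like a hex digest of the declared type."""
    length = HEX_LENGTH.get(id_type)
    if length is None:
        return False  # unknown ID Type
    return re.fullmatch(r"[0-9a-fA-F]{%d}" % length, value) is not None

# Demo digest only; this is not how Resonate's hashing script prepares emails.
sample = hashlib.sha256(b"user@example.com").hexdigest()
print(hash_matches_id_type(sample, "HEMSHA2"))
```

A length check cannot confirm that a value was produced from a real email; it only catches a mismatch between the ID Type and the digest shape.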

4. Once your file is in the correct format, please save with the following name:

Name file(s) as follows: <client name>_<model_type>_<date>
Example: Resonate_Churned_20220419.csv

Manifest Data Requirements

The manifest file or table defines the CRM data included in your data file. Every column except the ID Type and ID columns must be defined in it. The manifest is necessary for proper modeling because it describes how the CRM data should be used.

Schema

column_name: Exact name from the header in the data file.
dtype: Data type. Supported values are:
- string/varchar - A text value
- int/integer - A number without decimals
- timestamp - An ISO 8601 formatted field
- double/float - A decimal value; use double where high precision is needed (more than 2 decimal places)
variable_type: Data properties. Supported values are:
- boolean - Represents a true or false value
- categorical - Represents a set of possible strings
- ordinal - Represents a set of possible numbers
- numeric - A numeric value
- date/datetime - Represents a timestamp
usage: The purpose of the field. Supported values are:
- identifier - A column that identifies the data (for example, account ID or user ID)
- feature - Used to train the model
- label - The column to predict on. For churn, this would be the column specifying whether someone has churned in the past or not. For next best, this would signify your best customers amongst the dataset.
- passthrough - Not used to train the model
allowable_values: Possible values, including nulls. The rules are:
- integer, float, or double - Provide the list of possible values. If the values fall within a range, provide the range as first value, last value. Include null if values can be null, and use infinity if the range is unbounded.
  - Ex 1: 1, infinity → possible values between 1 and infinity
  - Ex 2: -10, 10 → possible values between -10 and 10
  - Ex 3: 4, 5, 6, 7 → possible values are these 4 values
  - Ex 4: 1, infinity, null → possible values between 1 and infinity and nulls
- timestamp - Leave blank
- string - Leave blank
- boolean - List the possible boolean values (1, 0, y, n, etc.)
column_description: Plain text description of what the column is.
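
To make the allowable_values rules for numeric columns concrete, here is one possible reading of them in code. The article does not say how a two-value list is told apart from a range, so this sketch assumes that exactly two numbers always mean an inclusive range and any other count means an explicit list; both function names are hypothetical.

```python
def parse_allowable_values(spec):
    """Split a numeric allowable_values string into (kind, numbers, nullable)."""
    tokens = [t.strip().lower() for t in spec.split(",") if t.strip()]
    nullable = "null" in tokens
    # float() already understands "infinity" and "-infinity".
    numbers = [float(t) for t in tokens if t != "null"]
    # Assumption: two numbers form a range, anything else is a list.
    kind = "range" if len(numbers) == 2 else "list"
    return kind, numbers, nullable

def is_allowed(value, spec):
    """Check one value (or None for null) against an allowable_values string."""
    kind, numbers, nullable = parse_allowable_values(spec)
    if value is None:
        return nullable
    if kind == "range":
        return numbers[0] <= value <= numbers[1]  # inclusive bounds assumed
    return value in numbers

print(is_allowed(6, "4, 5, 6, 7"), is_allowed(11, "-10, 10"))
```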

Manifest File Format

The screenshot below shows an example of a manifest matching the sample raw file above:

[Image: example manifest screenshot]

Naming

Please name your file manifest.csv and upload it to S3, in standard CSV format, at the same path as the raw data files.
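
A manifest in this shape can be produced with Python's csv module, as sketched below. The column order follows the Schema section but is an assumption, and the three rows (tenure_months, plan_tier, churned) are hypothetical examples, not columns Resonate requires.

```python
import csv
import io

# Field names from the Schema section; their order here is an assumption.
FIELDS = ["column_name", "dtype", "variable_type", "usage",
          "allowable_values", "column_description"]

# Hypothetical rows for illustration only.
EXAMPLE_ROWS = [
    {"column_name": "tenure_months", "dtype": "integer",
     "variable_type": "numeric", "usage": "feature",
     "allowable_values": "0, infinity",
     "column_description": "Months since the customer signed up"},
    {"column_name": "plan_tier", "dtype": "string",
     "variable_type": "categorical", "usage": "feature",
     "allowable_values": "",
     "column_description": "Subscription plan name"},
    {"column_name": "churned", "dtype": "string",
     "variable_type": "boolean", "usage": "label",
     "allowable_values": "y, n",
     "column_description": "Whether the customer has churned"},
]

def manifest_csv(rows):
    """Render manifest rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(manifest_csv(EXAMPLE_ROWS))
```

The csv module quotes values that contain commas, so an entry such as 0, infinity stays in a single cell. Writing the returned text to a file named manifest.csv gives the file to upload.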
