Two datasets are required: the raw CRM data and a manifest. You can transfer them to us as files via AWS S3, Box.com, or SFTP, or as tables via Snowflake. The associated knowledge base articles are linked below:
CRM Data Requirements
Basics
- Format: CSV file with a properly formatted header as the first row, or a Snowflake table
- The CRM data can be partitioned into multiple CSV files, but every file must begin with the same header and adhere to all other formatting requirements.
- For Snowflake, only one table for the CRM data is supported.
- Identifier Column: The ID column must be a unique hashed email identifier field.
- Rows: Each customer record must occupy exactly one row.
- Null Values: Acceptable in every column except the Email/ID column, ID Type column, and the identifier column (if provided).
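The partitioning rule above can be checked before upload. The following is a minimal sketch, not Resonate's own validation: the function name `check_partition_headers` is hypothetical, and it takes each partition as CSV text so the example stays self-contained.

```python
import csv
import io


def check_partition_headers(partitions):
    """Return a list of problems found across CSV partitions (given as text).

    Checks that every partition starts with the same header row and that
    every record has as many fields as its header.
    """
    problems = []
    expected = None  # header of the first non-empty partition
    for index, text in enumerate(partitions):
        rows = csv.reader(io.StringIO(text))
        header = next(rows, None)
        if header is None:
            problems.append(f"partition {index}: no header row")
            continue
        if expected is None:
            expected = header
        elif header != expected:
            problems.append(
                f"partition {index}: header does not match the first header seen"
            )
        # Record numbers count from 2 because the header is record 1.
        for record_number, row in enumerate(rows, start=2):
            if len(row) != len(header):
                problems.append(
                    f"partition {index} record {record_number}: "
                    f"expected {len(header)} fields, found {len(row)}"
                )
    return problems
```

An empty list means the partitions agree on their header and every record has the expected number of fields.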
Supported Data Types
- Strings
- Can be categorical or ordinal data
- Dates
- Expressed in ISO 8601 format (yyyy-MM-ddTHH:mm:ss.SSSZ) or in RFC 3339 format (yyyy-MM-dd HH:mm:ss.SSSZ).
- Booleans
- Supported values: yes, no, y, n, 1, 0, true, false
- Integers
- Floats/Doubles
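As an illustration of the boolean and date rules above, here is a minimal sketch of how values could be checked locally. It rests on assumptions not stated in the article: boolean values are matched case-insensitively, and the trailing `Z` in the date patterns is treated as either a literal `Z` or a numeric UTC offset such as `+0000`. The function names are hypothetical.

```python
from datetime import datetime

# The eight supported boolean spellings, split by meaning.
TRUE_VALUES = {"yes", "y", "1", "true"}
FALSE_VALUES = {"no", "n", "0", "false"}

# "T"-separated and space-separated variants of the same timestamp layout.
TIMESTAMP_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%d %H:%M:%S.%f%z")


def parse_boolean(text):
    """Map a supported boolean spelling to True/False (case-insensitive: assumption)."""
    value = text.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"unsupported boolean value: {text!r}")


def parse_timestamp(text):
    """Parse a timestamp written with either a 'T' or a space separator."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp format: {text!r}")
```

For example, `parse_timestamp("2022-04-19T13:45:30.123Z")` and `parse_timestamp("2022-04-19 13:45:30.123+0000")` describe the same instant.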
Raw Data File Formatting
Option 1 (recommended): File Formatting for Unhashed Emails:
Follow these formatting instructions if you have regular email addresses that you need to hash. Once formatted, you will use Resonate's hashing script to hash your emails. [Hashing Script Instructions]
1. Two column headers are needed within the file even if there is no data for a particular column.
- Column 1 Header: ID Type
- Column 2 Header: Email
2. After those two columns, the rest should contain the CRM data (if applicable)
- Columns 3 to N: CRM data
3. Use this checklist to verify elements before uploading to Resonate.
- File contains the 2 required column headers: ID Type and Email
- File is in .csv format
- ID Type column is HEMMD5, HEMSHA1, or HEMSHA2
- A customer record occupies only one row, i.e., there should be no duplicate rows with the same email
- No null values for ID Type, Email, or for the identifier column
- File contains at least the minimum number of records (100,000 entries)
4. Once your file is in the correct format, please save with the following name:
- Name file(s) as follows: <client name>_<model_type>_<date>
- Example: Resonate_Churned_20220419.csv
5. Hash your emails with Resonate's hashing script by following the instructions for your operating system: Hashing Instructions for Mac, Hashing Instructions for PC
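The Option 1 checklist can be run as a quick local pre-check before hashing. This is a minimal sketch and not part of Resonate's tooling: `check_unhashed_file` is a hypothetical name, the file is passed as CSV text, and emails are compared case-insensitively when looking for duplicates (an assumption).

```python
import csv
import io

ALLOWED_ID_TYPES = {"HEMMD5", "HEMSHA1", "HEMSHA2"}


def check_unhashed_file(text, min_records=100_000):
    """Return a list of checklist violations for an unhashed CRM file."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if header[:2] != ["ID Type", "Email"]:
        return ["first two headers must be 'ID Type' and 'Email'"]

    seen_emails = set()
    count = 0
    for record_number, row in enumerate(reader, start=1):
        count += 1
        id_type = (row["ID Type"] or "").strip()
        # Lowercasing before comparison is an assumption, not a stated rule.
        email = (row["Email"] or "").strip().lower()
        if id_type not in ALLOWED_ID_TYPES:
            problems.append(f"record {record_number}: unsupported ID Type {id_type!r}")
        if not email:
            problems.append(f"record {record_number}: missing Email")
        elif email in seen_emails:
            problems.append(f"record {record_number}: duplicate Email")
        else:
            seen_emails.add(email)

    if count < min_records:
        problems.append(f"only {count} records; at least {min_records} required")
    return problems
```

The `min_records` parameter defaults to the 100,000 minimum from the checklist and can be lowered when testing on a small sample.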
Option 2: File Formatting for Existing Hashed Emails:
Follow these formatting instructions if your emails are already hashed with one of the following: MD5, SHA-1, or SHA-256 (SHA-2).
1. Two column headers are needed within the file even if there is no data for a particular column.
- Column 1 Header: ID Type
- Column 2 Header: ID
2. After those two columns, the rest should contain the CRM data (if applicable)
- Columns 3 to N: CRM data
3. Use this checklist to verify elements before uploading to Resonate.
- File contains the 2 required column headers: ID Type and ID
- File is in .csv format
- ID Type column is HEMMD5, HEMSHA1, or HEMSHA2
- Verify that the hashed email is in the proper format denoted by the ID Type
- A customer record occupies only one row, i.e., there should be no duplicate rows with the same hashed email
- No null values for ID Type, ID, or for the identifier column
- File contains at least the minimum number of records (100,000 entries)
4. Once your file is in the correct format, please save with the following name:
- Name file(s) as follows: <client name>_<model_type>_<date>
- Example: Resonate_Churned_20220419.csv
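Two items from the steps above lend themselves to a quick script: checking that a hashed email matches its ID Type, and building the file name. The sketch below makes several assumptions that the article does not state explicitly: hashes are hex-encoded, HEMSHA2 corresponds to SHA-256, and the date part of the file name is YYYYMMDD (inferred from the example name). Both function names are hypothetical.

```python
import re
from datetime import date

# Hex digest lengths: MD5 = 128 bits, SHA-1 = 160 bits, SHA-256 = 256 bits.
HEX_LENGTHS = {"HEMMD5": 32, "HEMSHA1": 40, "HEMSHA2": 64}


def hash_matches_id_type(id_type, hashed_email):
    """Return True if the value looks like a hex digest of the expected length."""
    length = HEX_LENGTHS.get(id_type)
    if length is None:
        return False
    pattern = rf"[0-9a-fA-F]{{{length}}}"
    return re.fullmatch(pattern, hashed_email) is not None


def build_file_name(client, model_type, file_date):
    """Build <client name>_<model_type>_<date>.csv with a YYYYMMDD date (assumption)."""
    return f"{client}_{model_type}_{file_date:%Y%m%d}.csv"
```

For instance, `build_file_name("Resonate", "Churned", date(2022, 4, 19))` reproduces the example name given above.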
Manifest Data Requirements
The manifest file or table defines the CRM data included in your data file. Every column except the ID Type and ID columns must be defined in it. The manifest is necessary for proper modeling because it describes how to use the CRM data.
Schema
- column_name: Exact name from the header in the data file.
- dtype: Data type. Supported values are:
- string/varchar - A text value
- int/integer - A number without decimals
- timestamp - An ISO 8601 formatted field
- double/float - A decimal value; use double where high precision is needed (more than 2 decimal places).
- variable_type: Data properties. Supported values are:
- boolean - Represents a true or false value
- categorical - Represents a set of possible strings
- ordinal - Represents a set of possible numbers
- numeric - A numeric value
- date/datetime - Represents a timestamp
- usage: The purpose of the field. Supported values are:
- identifier - A column that identifies the data (for example, account ID or user ID)
- feature - Used to train the model
- label - The column to predict. For churn, this would be the column specifying whether someone has churned in the past or not. For next best, this would flag your best customers in the dataset.
- passthrough - Not used to train the model
- allowable_values: Possible values, including nulls. The rules are:
- integer, float, or double - Provide the list of possible values. If the values fall within a range, provide the range as first value, second value. Include null if values can be null, and infinity if the range is unbounded.
- Ex 1: 1, infinity → possible values between 1 and infinity
- Ex 2: -10, 10 → possible values between -10 and 10
- Ex 3: 4, 5, 6, 7 → possible values are these 4 numbers
- Ex 4: 1, infinity, null → possible values between 1 and infinity, plus nulls
- timestamp - Leave blank
- string - Leave blank
- boolean - List the possible boolean values (1, 0, y, n, etc.)
- column_description: Plain text description of what the column is.
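The numeric allowable_values rules can be read as a small grammar. The sketch below shows one possible interpretation, and the interpretation itself is an assumption: exactly two numeric entries are treated as an inclusive range (as in Ex 1 and Ex 2), any other count as an explicit list (as in Ex 3), and `null` as a flag permitting missing values. The function names are hypothetical.

```python
def parse_allowable_values(spec):
    """Split a numeric allowable_values entry into (numbers, allows_null)."""
    tokens = [token.strip().lower() for token in spec.split(",") if token.strip()]
    allows_null = "null" in tokens
    # float() understands "infinity" and "-infinity" directly.
    numbers = [float(token) for token in tokens if token != "null"]
    return numbers, allows_null


def is_allowed(value, spec):
    """Check a value (a number, or None for null) against an allowable_values entry."""
    numbers, allows_null = parse_allowable_values(spec)
    if value is None:
        return allows_null
    if len(numbers) == 2:
        # Assumption: two numbers denote an inclusive range, not a two-item list.
        low, high = numbers
        return low <= value <= high
    return value in numbers
```

Under this reading, `is_allowed(5, "1, infinity")` holds while `is_allowed(8, "4, 5, 6, 7")` does not. A column whose only legal values are two specific numbers would be ambiguous under these rules and would need clarification from Resonate.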
Manifest File Format
Below is an example of a manifest matching the sample raw file above:
Naming
Please name your file manifest.csv and upload it to S3 in standard CSV format, at the same path as the raw data files.
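For illustration, a manifest in this shape could be generated with a short script. This is a minimal sketch with assumptions: the header uses the six schema field names in the order listed in the Schema section, and the three columns described (`account_id`, `tenure_months`, `churned`) are hypothetical rather than taken from any real data file.

```python
import csv
import io

# Assumption: the manifest header uses the schema field names in this order.
MANIFEST_FIELDS = [
    "column_name",
    "dtype",
    "variable_type",
    "usage",
    "allowable_values",
    "column_description",
]


def build_manifest_csv(rows):
    """Render manifest rows (dicts keyed by MANIFEST_FIELDS) as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=MANIFEST_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical columns, for illustration only.
example_rows = [
    {"column_name": "account_id", "dtype": "string", "variable_type": "categorical",
     "usage": "identifier", "allowable_values": "",
     "column_description": "Internal account ID"},
    {"column_name": "tenure_months", "dtype": "integer", "variable_type": "numeric",
     "usage": "feature", "allowable_values": "0, infinity, null",
     "column_description": "Months since signup"},
    {"column_name": "churned", "dtype": "integer", "variable_type": "boolean",
     "usage": "label", "allowable_values": "1, 0",
     "column_description": "1 if the customer has churned"},
]

manifest_text = build_manifest_csv(example_rows)
```

Writing `manifest_text` to a file named manifest.csv (opened with `newline=""`) gives a file ready to upload alongside the raw data. Values that contain commas, such as `0, infinity, null`, are quoted automatically by the `csv` module.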