Hashed Emails (HEMs)
HEMs are recommended because an email address is a stable ID that represents a known customer and is unique to an individual. When sending HEMs, we recommend MD5 because MD5 hashes are the longest-standing and most continuously available form of the ID.
HEM is short for Hashed Email. Unlike a third-party cookie, an email address remains persistent across all devices, apps, and browsers.
Through hashing, consumer email addresses are transformed into anonymized identifiers that obfuscate personally identifiable information, resulting in a useful people-based anonymous identifier.
“Hashing” is simply a process of encoding email addresses using a cryptographic hashing function. This process creates an unrecognizable string of characters, or hash, to represent the email. Each hash has a fixed number of characters, depending on the type of hash function used.
Resonate accepts the following hashed emails:
- MD5 (preferred): MD5 (Message-Digest algorithm 5) is a widely used cryptographic hash function that creates a 32-character hexadecimal number. For example, ec55d3e698d289f2afd663725127
- SHA-1: SHA-1 (Secure Hash Algorithm 1) generates a 40-character hexadecimal number. For example, A94A8FE5CCB19BA61C4C0873D391E987982FBBD3
- SHA-256: SHA-256 is part of the SHA-2 family of algorithms and generates a 64-character hexadecimal number. For example, 051f26bd6cde782239bf52e56854d3feeca75ae5a84508d1ff9a1868ba167ee
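As a rough illustration of the three hash types listed above, the following Python sketch hashes one address with each algorithm and prints the digest length. The function name hash_email and the sample address are hypothetical, and trimming and lowercasing the address before hashing is an assumption rather than a documented Resonate requirement; Resonate's own hashing script remains the reference.

```python
import hashlib

SUPPORTED_ALGORITHMS = ("md5", "sha1", "sha256")


def hash_email(email: str, algorithm: str = "md5") -> str:
    """Return the hexadecimal digest of an email address.

    Assumption: the address is trimmed and lowercased before hashing.
    """
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    normalized = email.strip().lower()
    return hashlib.new(algorithm, normalized.encode("utf-8")).hexdigest()


# Print the digest length for each algorithm (32, 40, and 64 hex characters).
for name in SUPPORTED_ALGORITHMS:
    digest = hash_email("jane.doe@example.com", name)
    print(name, len(digest), digest)
```

The fixed digest length is what lets a recipient tell the hash types apart: 32 characters for MD5, 40 for SHA-1, and 64 for SHA-256.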
Connecting to AWS
Resonate AWS Set Up Overview
AWS is a secure method of sending your data for Predictive Modeling with Resonate. Your Resonate Client Success Manager will work closely with you throughout the process.
To get started with AWS for Predictive Modeling, you will need to provide your Resonate Client Success Manager with your AWS ARN and Account ID. Those two pieces of information will be used to create a user role on an S3 bucket that Resonate engineers will create for you.
Once the S3 bucket, role, and policy have been created, your Resonate Client Success Manager will provide you with a document that includes all the information your team will need to share files. Below you will find detailed instructions on how to set up the account. Please keep in mind that these steps cannot be completed before you receive your credentials from Resonate.
File Upload Steps
The user uploading to Resonate’s S3 path will need to follow these steps:
- Download the AWS CLI if it is not already installed. Instructions are located here: AWS CLI Installation Guide
- Set up Assume Role by following the documentation: Assume Role Guide
  - In your AWS credentials file (~/.aws/credentials), add a new source profile with the following:
    - aws_access_key_id (AWS access key ID for the uploading user)
    - aws_secret_access_key (AWS secret access key for the uploading user)
  - In your AWS config file (~/.aws/config), add a new profile with the following:
    - role_arn (provided by Resonate)
    - source_profile (the name of the new source profile created above)
    - external_id (provided by Resonate)
  - Once that setup is complete, you can assume the role by adding --profile <your profile> to the end of your aws command.
    - example: aws s3 ls s3://resonate-integrations/companyName/data-disrupt/feedName_10101/ --profile resonate
- Upload the file using the aws s3 cp command
  - Once you have set your local AWS credentials, you can use the AWS command line to upload to the path provided by your account manager.
  - More information is located here: S3 Copy Guide
  - example: aws s3 cp file.csv s3://resonate-integrations/companyName/disrupt/feedName_10101/ --profile <your profile>
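The credentials and config entries described in the Assume Role step might look like the text printed by the sketch below, which uses Python's configparser to render both fragments. The profile names resonate-source and resonate are hypothetical, and every value in angle brackets is a placeholder to be replaced with your own keys or the values Resonate provides.

```python
import configparser
import io


def render_ini(section: str, values: dict) -> str:
    """Render one INI section as text, in the style used by AWS CLI files."""
    parser = configparser.ConfigParser()
    parser[section] = values
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Fragment for ~/.aws/credentials: keys of the uploading user (placeholders).
credentials_text = render_ini(
    "resonate-source",  # hypothetical source profile name
    {
        "aws_access_key_id": "<access key id of the uploading user>",
        "aws_secret_access_key": "<secret access key of the uploading user>",
    },
)

# Fragment for ~/.aws/config: the role to assume (placeholders).
config_text = render_ini(
    "profile resonate",  # hypothetical profile name, used as --profile resonate
    {
        "role_arn": "<role ARN provided by Resonate>",
        "source_profile": "resonate-source",
        "external_id": "<external ID provided by Resonate>",
    },
)

print(credentials_text)
print(config_text)
```

Note that the config file prefixes named profiles with the word profile in the section header, while the credentials file uses the bare name.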
File Format
Two files are required: the raw data file and a manifest.
Basic Raw Data File Requirements
- Format: CSV format with a properly formatted header as the first row.
- Partitioned CSVs: Acceptable, but each partition must begin with the same header and adhere to all other formatting requirements.
- Identifier Column: The first column must be a unique hashed email identifier field.
- Rows: Each customer record must occupy exactly one row.
- Null Values: Acceptable in every column except the Email/ID column, ID Type column, and the identifier column.
- Date Columns: Expressed in ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss.SSS'Z').
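To illustrate the date pattern above, this Python sketch formats a datetime as yyyy-MM-dd'T'HH:mm:ss.SSS'Z'. The function name is hypothetical, and treating timezone-naive values as already being in UTC is an assumption.

```python
from datetime import datetime, timezone


def format_iso8601_millis(moment: datetime) -> str:
    """Format a datetime as yyyy-MM-dd'T'HH:mm:ss.SSS'Z' in UTC."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # truncate microseconds to milliseconds
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


print(format_iso8601_millis(datetime(2022, 4, 19, 13, 5, 9, 123456)))
# prints 2022-04-19T13:05:09.123Z
```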
Raw Data File CRM Data Types
- Identifier and date columns: String values only.
- Other columns: Boolean or Numeric
- Boolean column supported values: yes, no, y, n, 1, 0, true, false
- Categorical and ordinal columns: Strings or 0-indexed sequential integers.
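A quick way to check a Boolean column before upload is to map each value onto the supported set listed above. The sketch below is one possible check; the function name is hypothetical, and both the case-insensitive matching and the treatment of an empty cell as a null are assumptions.

```python
TRUE_VALUES = {"yes", "y", "1", "true"}
FALSE_VALUES = {"no", "n", "0", "false"}


def parse_boolean(raw: str):
    """Return True, False, or None (null) for a Boolean column value."""
    text = raw.strip().lower()  # assumption: matching ignores case and padding
    if text == "":
        return None  # assumption: an empty cell represents a null
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"unsupported Boolean value: {raw!r}")


print([parse_boolean(value) for value in ["yes", "N", "1", "false", ""]])
# prints [True, False, True, False, None]
```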
Raw Data File Formatting
Option 1 (recommended): File Formatting for Unhashed Emails:
Follow these formatting instructions if you have regular email addresses that you need to hash. Once formatted, you will use Resonate's hashing script to hash your emails. [Hashing Script Instructions]
1. Two column headers are needed within the file, even if there is no data for a particular column.
   - Column 1 Header: ID Type
   - Column 2 Header: Email
2. After those two columns, the remaining columns should contain the CRM data (if applicable).
   - Columns 3 to N: CRM data
3. Use this checklist to verify the file before uploading to Resonate.
   - File contains the 2 required column headers
     - ID Type
     - Email
   - File is in .csv format
   - ID Type column is HEMMD5, HEMSHA1, or HEMSHA2
   - A customer record occupies only one row, i.e. there should be no duplicate rows with the same email
   - No null values for ID Type, Email, or the identifier column
   - File contains the minimum number of records per segment: 6400
4. Once your file is in the correct format, save it with the following name:
   - Name file(s) as follows: <client name>_<model_type>_<date>
   - Example: Resonate_Churned_20220419.csv
   Important: File cannot be compressed.
5. Follow these instructions to hash your emails using Resonate's hashing script.
6. Upload to S3 using the File Upload Steps section above.
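The layout and naming steps for Option 1 can be sketched in Python as follows. The function names, the sample records, and the churned column are hypothetical. Writing the date as yyyymmdd is inferred from the example file name, and silently dropping blank or repeated emails is only one possible way to satisfy the one-row-per-customer rule.

```python
import csv
import io
from datetime import date


def build_file_name(client_name: str, model_type: str, file_date: date) -> str:
    # Assumption: the date is written as yyyymmdd, as in Resonate_Churned_20220419.csv.
    return f"{client_name}_{model_type}_{file_date:%Y%m%d}.csv"


def build_unhashed_csv(id_type: str, records, crm_headers) -> str:
    """Build CSV text with ID Type and Email first, then the CRM columns.

    records is a list of (email, crm_values) pairs.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ID Type", "Email", *crm_headers])
    seen = set()
    for email, crm_values in records:
        key = email.strip().lower()
        if not key or key in seen:
            continue  # skip blank emails and repeated customers
        seen.add(key)
        writer.writerow([id_type, email.strip(), *crm_values])
    return buffer.getvalue()


records = [("a@example.com", ["1"]), ("A@example.com", ["0"]), ("b@example.com", ["0"])]
print(build_file_name("Resonate", "Churned", date(2022, 4, 19)))
print(build_unhashed_csv("HEMMD5", records, ["churned"]))
```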
Option 2: File Formatting for Existing Hashed Emails:
Follow these formatting instructions if your emails are already hashed with one of the following: MD5, SHA-1, or SHA-256 (SHA-2).
1. Two column headers are needed within the file, even if there is no data for a particular column.
   - Column 1 Header: ID Type
   - Column 2 Header: ID
2. After those two columns, the remaining columns should contain the CRM data (if applicable).
   - Columns 3 to N: CRM data
3. Use this checklist to verify the file before uploading to Resonate.
   - File contains the 2 required column headers
     - ID Type
     - ID
   - File is in .csv format
   - ID Type column is HEMMD5, HEMSHA1, or HEMSHA2
   - Verify that the hashed email is in the proper format denoted by the ID Type
   - A customer record occupies only one row, i.e. there should be no duplicate rows with the same hashed email
   - No null values for ID Type, ID, or the identifier column
   - File contains the minimum number of records per segment: 6400
4. Once your file is in the correct format, save it with the following name:
   - Name file(s) as follows: <client name>_<model_type>_<date>
   - Example: Resonate_Churned_20220419.csv
   Important: File cannot be compressed.
5. Upload to S3 using the File Upload Steps section above.
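The Option 2 checklist lends itself to an automated pre-upload check. The Python sketch below is one way to approach it, not Resonate's own validation: the function name and messages are hypothetical, and mapping HEMSHA2 to a 64-character SHA-256 digest is an assumption based on the hash lengths described earlier.

```python
import csv
import io
import re

# Expected hex digest length per ID Type (HEMSHA2 as SHA-256 is an assumption).
HEX_LENGTHS = {"HEMMD5": 32, "HEMSHA1": 40, "HEMSHA2": 64}
MIN_RECORDS = 6400


def validate_hashed_file(csv_text: str, min_records: int = MIN_RECORDS) -> list:
    """Return a list of checklist problems found in the CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None or header[:2] != ["ID Type", "ID"]:
        return ['the first two headers must be "ID Type" and "ID"']
    problems = []
    seen = set()
    count = 0
    for row_number, row in enumerate(reader, start=2):
        count += 1
        id_type = row[0].strip() if len(row) > 0 else ""
        hashed = row[1].strip() if len(row) > 1 else ""
        if not id_type or not hashed:
            problems.append(f"row {row_number}: ID Type and ID must not be empty")
            continue
        expected = HEX_LENGTHS.get(id_type)
        if expected is None:
            problems.append(f"row {row_number}: unknown ID Type {id_type!r}")
        elif not re.fullmatch(rf"[0-9a-fA-F]{{{expected}}}", hashed):
            problems.append(f"row {row_number}: ID is not a {expected}-character hex value")
        if hashed.lower() in seen:
            problems.append(f"row {row_number}: duplicate ID")
        seen.add(hashed.lower())
    if count < min_records:
        problems.append(f"file has {count} records; at least {min_records} are required")
    return problems


sample = "ID Type,ID,plan\nHEMMD5," + "a" * 32 + ",gold\nHEMSHA1," + "b" * 32 + ",silver\n"
for problem in validate_hashed_file(sample, min_records=2):
    print(problem)
```

An empty result means the file passed every check in this sketch; it does not guarantee that Resonate will accept the file.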
Manifest File Requirements
The manifest file describes the CRM data included in your data file. Every column excluding the ID Type and ID columns needs to be defined in this file. This file is necessary for proper modeling as it describes to rAI how to utilize the CRM data properly.
Schema
- column_name: Exact name from the header in the data file.
- dtype: Data type. Supported values are:
  - string/varchar - A standard word
  - int/integer - A number without decimals
  - timestamp - An ISO 8601 formatted field
  - double/float - A decimal value. Use double where high precision is needed (more than 2 decimal places).
- variable_type: Data properties. Supported values are:
  - boolean - Represents a true or false value
  - categorical - Represents a set of possible strings
  - ordinal - Represents a set of possible numbers
  - numeric - A numeric value
  - date/datetime - Represents a timestamp
- usage: The purpose of the field. Supported values are:
  - identifier - A column that identifies the data (for example, account ID or user ID)
  - feature - Used to train the model
  - label - The column to predict on. For churn, this would be the column specifying whether someone has churned in the past or not. For next best, this would signify your best customers amongst the dataset.
  - passthrough - Not used to train the model
- allowable_values: Possible values, including nulls. Leave this blank if any values are possible. This is important for categorical, ordinal, and boolean columns so that rAI knows all of their possible values.
- column_description: Plain text description of what the column is.
Manifest File Format
The example below shows a manifest that matches the sample raw file above:
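As a rough illustration of the schema, the following Python sketch writes a manifest for a hypothetical data file with three CRM columns. The column names, descriptions, and values are invented for illustration. Using the schema field names as the manifest's header row and separating allowable values with a pipe character are both assumptions, so confirm the expected layout with your Resonate Client Success Manager.

```python
import csv
import io

# Assumption: the manifest header row uses the schema field names verbatim.
MANIFEST_FIELDS = [
    "column_name",
    "dtype",
    "variable_type",
    "usage",
    "allowable_values",
    "column_description",
]

# Hypothetical CRM columns; the pipe separator in allowable_values is an assumption.
EXAMPLE_ROWS = [
    ["tenure_months", "integer", "numeric", "feature", "", "Months since the customer signed up"],
    ["plan_tier", "string", "categorical", "feature", "basic|plus|premium", "Subscription tier"],
    ["churned", "string", "boolean", "label", "yes|no", "Whether the customer has churned"],
]


def build_manifest(rows) -> str:
    """Return manifest CSV text with one row per CRM column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(MANIFEST_FIELDS)
    writer.writerows(rows)
    return buffer.getvalue()


manifest_text = build_manifest(EXAMPLE_ROWS)
print(manifest_text)
```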
Naming
Please name your file manifest.csv and upload it to S3 in standard CSV format at the same path as the raw data files.