How to Append Resonate Attributes to Unmatched Records

Resonate’s Data Science team has provided two options below to increase your match rate by extrapolating from your matched records to the unmatched records in your file. After you complete the Resonate Data Append process, your Data Science team can follow these instructions to append Resonate attributes to any unmatched records.

1. The Univariate Approach

For this method, you will take your private data and train a separate model to predict each of our appended attributes. Given the size of these datasets, we recommend using the Pipeline API from Apache Spark with a random forest model. This Medium post has a good overview and useful code:

https://medium.com/rahasak/random-forest-classifier-with-apache-spark-c63b4a23a7cc

Alternatively, you could use XGBoost, or even a logistic regression.

The general framework here is:

Vectorize and standardize your features for the whole dataset
Encode the target Resonate attribute as a 1.0 or 0.0
In the case of extremely imbalanced classes, you will want to use class weighting, up/down sampling to balance class samples, or SMOTE if your data allows it.
Train a regularized classification algorithm with some flavor of cross validation on the hyperparameters (k-fold, hyperopt, or train-test splits)
Apply the trained model to your unmatched records to predict the attribute
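
The steps above can be sketched in PySpark roughly as follows. This is a minimal illustration, not Resonate's implementation: the column names ("label", "weight"), the grid values, and both function names are hypothetical, and the weightCol option on the random forest assumes Spark 3.0 or later. The first helper computes the usual "balanced" class weights (total rows divided by the number of classes times the class count), which is one way to handle imbalanced classes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def build_cross_validated_forest(feature_cols, label_col="label", weight_col="weight"):
    """Assemble and standardize features, then tune a random forest with 3-fold CV.

    Column names and grid values are placeholders; adjust them to your data.
    Requires an active Spark session.
    """
    # Imported lazily so the weighting helper above works without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import StandardScaler, VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Vectorize the raw feature columns, then standardize them.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="raw_features")
    scaler = StandardScaler(inputCol="raw_features", outputCol="features",
                            withMean=True, withStd=True)
    # The label column holds the Resonate attribute encoded as 1.0 or 0.0.
    forest = RandomForestClassifier(featuresCol="features", labelCol=label_col,
                                    weightCol=weight_col, seed=42)
    pipeline = Pipeline(stages=[assembler, scaler, forest])

    # Small illustrative hyperparameter grid for k-fold cross validation.
    grid = (ParamGridBuilder()
            .addGrid(forest.numTrees, [50, 100])
            .addGrid(forest.maxDepth, [5, 10])
            .build())
    evaluator = BinaryClassificationEvaluator(labelCol=label_col,
                                              metricName="areaUnderROC")
    return CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                          evaluator=evaluator, numFolds=3)
```

In use, you would fit the returned CrossValidator on a DataFrame of matched records (with the weight column populated from balanced_class_weights) and then call transform on the unmatched records, repeating the process for each Resonate attribute.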

2. Deep Learning Route

The more challenging path is the deep learning route, which can learn to impute the entire set of Resonate attributes at once. The TensorFlow user guide has a useful tutorial:

https://www.tensorflow.org/tutorials/generative/autoencoder#first_example_basic_autoencoder

For this approach:

Vectorize and standardize your features for the whole dataset
Vectorize Resonate attributes as binary features (0.0 or 1.0)
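Train a neural network with sigmoid outputs, one per Resonate attribute. Follow the basic autoencoder framework, but do not worry about bottleneck layers.
Apply the trained network to your unmatched records to predict all attributes at once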
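
A minimal sketch of this approach with Keras might look like the following. The layer sizes, the attribute names in the usage note, and both function names are assumptions for illustration only. The first helper shows the binary (multi-hot) encoding of Resonate attributes; the second builds a network with one sigmoid output per attribute and no bottleneck layer.

```python
def multi_hot(record_attributes, vocabulary):
    """Encode each record's attribute set as 0.0/1.0 flags, ordered by vocabulary."""
    return [[1.0 if attr in attrs else 0.0 for attr in vocabulary]
            for attrs in record_attributes]


def build_imputer(n_features, n_attributes, hidden_units=128):
    """Build a dense network that maps private features to attribute probabilities.

    hidden_units is an arbitrary placeholder; tune it for your data.
    """
    # Imported lazily so the encoding helper above works without TensorFlow installed.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(hidden_units, activation="relu"),
        tf.keras.layers.Dense(hidden_units, activation="relu"),
        # One sigmoid unit per Resonate attribute, so each output is an
        # independent probability between 0 and 1.
        tf.keras.layers.Dense(n_attributes, activation="sigmoid"),
    ])
    # Binary cross-entropy treats every attribute as its own yes/no target.
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

For example, multi_hot([{"attr_a", "attr_c"}], ["attr_a", "attr_b", "attr_c"]) yields [[1.0, 0.0, 1.0]]. You would fit the model on the standardized features and encoded attributes of your matched records, then call predict on the unmatched records and threshold the probabilities (for instance at 0.5) to obtain the appended attributes.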
