At Resonate, we are obsessed with data quality. Each research wave, we exhaustively examine all possible survey response patterns to identify people who may be answering inattentively. Eliminating these records from our survey data enhances the clarity and stability of our insights, as well as the precision of our predictive models.
Researchers take various approaches to the problem of inattentive or careless respondents generating poor-quality data. Some, like GfK, focus on ranking survey providers. GfK's emphasis on sample quality is certainly part of the solution, and measuring sample bias against a benchmark to identify good-quality sample providers is prudent. However, the critical challenge Resonate has observed is that survey response quality varies substantially within the sample from each provider.
For any one of the top-tier sample providers used by Resonate, there is an observed distribution of respondents along a continuum of data quality – with many respondents completing our U.S. Consumer Study survey attentively and carefully, and some failing to do so. Thus, the emphasis should not be primarily on contrasting and ranking sample providers, but on the more impactful issue of assessing data quality at the level of the individual respondent so that those consumers who are most egregiously inattentive or careless can be identified and excluded from the survey dataset.
Traditionally, this goal has been pursued through the incorporation of “attention check” or similar questions into a survey, each of which is designed to explicitly detect incorrect responses. The difficulty is that professional survey takers are aware of these techniques. Additionally, their inclusion in a survey can negatively impact respondent engagement because their presence may be interpreted as a sign that the company doesn’t trust the respondent. For these and additional reasons (e.g., consuming survey real estate), Resonate has developed a more sophisticated approach.
We define a respondent exhibiting low data quality as an individual for whom a comparatively large number of survey responses fail to reflect the closest approximation to fact. A tell-tale sign of such respondents is the frequent occurrence of rare response patterns – i.e., combinations of answer choices across questions that occur for few, if any, other respondents. It is the prevalence of these rare response patterns (calculated by examining every possible pairwise combination of survey responses across all questions) that is used to flag individual respondents with highly suspicious responses.
For example, a rare response pattern might be selecting a rating of “2” to “How satisfied are you with your smartphone?” on a 0-10 satisfaction scale, while also selecting a rating of “9” to “How likely are you to recommend your smartphone to a friend or colleague?” on a 0-10 recommendation scale.
While any one such response pattern is possibly legitimate, a preponderance of rare response patterns, relative to all other respondents in the same survey, suggests that the respondent’s survey is of low quality and should be discarded.
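The idea described above can be sketched in code. The following is a minimal, hypothetical illustration – not Resonate's actual implementation – in which each respondent is scored by the fraction of their pairwise answer combinations that are rare across the whole sample. The `rare_count` threshold and the function name are assumptions chosen for the example.

```python
# Hypothetical sketch of rare-response-pattern scoring.
# The threshold and scoring rule are illustrative assumptions,
# not Resonate's production methodology.
from collections import Counter
from itertools import combinations

def rarity_scores(responses, rare_count=1):
    """responses: list of answer tuples, one tuple per respondent,
    with one answer per question.

    A pairwise pattern (question i, question j, answer_i, answer_j)
    is 'rare' if it occurs for at most `rare_count` respondents.
    Returns, for each respondent, the fraction of their pairwise
    answer combinations that are rare."""
    n_questions = len(responses[0])
    question_pairs = list(combinations(range(n_questions), 2))

    # Count how often each pairwise answer combination occurs.
    pair_counts = Counter()
    for answers in responses:
        for i, j in question_pairs:
            pair_counts[(i, j, answers[i], answers[j])] += 1

    # Score each respondent: share of their pairwise patterns that are rare.
    scores = []
    for answers in responses:
        rare = sum(
            1
            for i, j in question_pairs
            if pair_counts[(i, j, answers[i], answers[j])] <= rare_count
        )
        scores.append(rare / len(question_pairs))
    return scores
```

For the smartphone example above, a respondent answering (satisfaction=2, recommendation=9) in a sample where everyone else answers consistently would receive a high rarity score, while typical respondents would score near zero. Respondents whose scores stand out from the rest of the sample are candidates for exclusion.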
In summary, comparing quality across sample providers is a meaningful but fundamentally limited contribution to the overall quality of a survey dataset. Addressing the problem at the more granular level of individual respondents is, in the view of Resonate, the most impactful path forward.
Tom Lacki, Ph.D.
Senior Fellow, Research