The Data Problem
Cleaning free-form text entry data poses significant challenges for omni-channel merchants, across a variety of data collection strategies. In the case of using HDYHAU (“How did you hear about us?”) surveys to improve attribution, merchants often face major headaches around user-entered data.
Customers’ inputs can be highly variable, so brands must account for common misspellings, uncover related keywords, and manage page-long CASE statements. Yet despite best efforts, the end result tends to be the same: unusable or incomplete data as well as flawed attribution.
The Data Solution
We created a solution that cleans up user-entered survey data in the post-purchase survey platform Fairing. The resulting data can be used for marketing attribution, customer cohort analysis, and more.
We built a mapping document and transformation code to help you make sense of your customer responses in Fairing or from other platforms that involve free-form text entry data. You’ll be able to answer questions like:
- How can I better manage complex CASE statements?
- How can I easily map customer typos to the correct value?
- How are customers spelling Tim Ferriss’ name? …Can I spend less time cleaning customer responses and start enjoying my own “4-Hour Work Week”? :)
Expectations, Requirements, and Table Construction
Estimated build time: 5 to 20 Hours. The build time depends on the platform. If this is for Fairing, there will be a faster build, but if criteria changes need to be made, it will take longer. This estimate is based on having a flat datasource that contains one or two columns with any combination of free-form text, numbers, or strings.
- Google Sheets and Daasity’s Google Sheet Integration
- Basic familiarity with CASE statements
Table construction and notes:
- The table drp_staging.mapping_raw_data should contain a dataset with a primary key as well as two criteria columns that need to be mapped.
- A Google Sheet integration must be created that follows the mapping format, or is customized to a new use case. (see example below).
- This gets transformed into a table that is joined on the criteria specified in the Google Sheet drop-down menu for criteria.
Relevant Data and Considerations for This Analysis
For this transformation, you will need access to your Google Sheet, your source systems data tables (in this example we are using Fairing), and the drp_staging schema.
- Google Sheet Integration to program the mapped values and mapping criteria
- Source System data
- DRP_STAGING schema to create intermediary tables during the transformation
Google Sheet Integration
Create a Google Sheet Integration to specify the mapping criteria for the integration. A template example is shown in the screenshot below.
- In the example, the data entry on row 2 allows mapping of customer surveys which have a response exactly equal to newsletter, podcast, influencer, online article, or online ad AND any combination of a free-form customer entered data field with common misspellings of the author’s name “Tim Ferris.”
Source System Data (Fairing)
We then access data from the source system. We are able to map all misspellings and variants of Tim Ferriss’s name with this Google Sheet and the various combinations of the primary criteria to one Response Group and Sub Group.
Note: this solution prioritizes primary and secondary mapping, but this solution allows mapping of only a primary or only a secondary criteria.
The Transformation Code
Our templated solution allows quick implementation of the solutions described. Here is a quick overview of what the code does.
–Assign a unique identifier to each row in the Google Sheet:
–Split each row into multiple rows, allowing the multivalue inputs specified in the Google Sheet to be joined to the source system later in the transformation process.
–Combine subgroups into one column:
–Map responses to the source system based on the criteria, while prioritizing mappings with multiple criteria:
The ‘Missing from BSD’ record needs to be added to the Google Sheet to update the mapping. Thankfully, this is as simple as adding this value to the list that already exists: