Problems Caused By Enhanced Marketing Database Content

Marketing Database Content should be enhanced whenever possible. For example, the July 2 e-Letter discussed how re-engineered operational systems can be leveraged to make important improvements to Content. However, many a data management professional has implemented an improvement to a Marketing Database, only to destroy the effectiveness of existing data mining routines such as predictive models and customer clusters.

At a minimum, willy-nilly improvements in Content result in last-minute fire drills as the data mining team scrambles to update its models and segments to meet a production deadline. Sometimes, the problems in the routines are not discovered until after the promotional campaign has been executed, and senior management starts asking pointed questions about alarmingly poor results. Although the following example is taken from Retail, the lessons are universal:

The distance a customer lives from a store generally is predictive of future purchase behavior. Typically, close proximity is indicative of higher revenue. Distant customers who are loyal, however, often are "stocker-uppers"; that is, they shop less frequently than other loyal customers but purchase more per visit.

Distance can be calculated many ways, and accuracy varies by methodology. The most accurate are "rooftop-to-rooftop" calculations, some of which take into account roadways and natural barriers such as lakes and rivers. Somewhat less accurate are methods based on small-area geographic units such as ZIP+4s, Census Block Groups and Carrier Routes. Specifically, what is calculated is the distance between the centers ("Centroids") of the geographic units that contain the customer address and the store address.
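As a minimal sketch of a Centroid-based calculation, the following computes the straight-line ("great-circle") distance between two Centroids using the haversine formula. The function name, the coordinates and the choice of haversine are assumptions for illustration only; a production routine might use a different formula, and a rooftop method that accounts for roadways would require routing data that is not shown here.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius of the Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical Centroids (latitude, longitude) for the geographic units
# containing the customer address and the store address.
customer_centroid = (41.60, -93.60)
store_centroid = (41.70, -93.60)

distance = haversine_miles(*customer_centroid, *store_centroid)
print(f"Centroid-to-Centroid distance: {distance:.1f} miles")
```

The smaller the unit of geography, the closer each Centroid sits to the actual address, which is why ZIP+4 results tend to be more accurate than those for larger units.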

The least accurate, and all-too-common, way to calculate distance is based on ZIP Code. This is problematic in many ways, such as in rural areas where individual ZIP Codes span many miles. Frequently, for example, a customer will live within the same ZIP Code as the store. In these instances, the ZIP-level distance-to-store will be 0, which of course has no relation to reality.

Now, consider what will happen if the distance field in a Marketing Database is changed from ZIP to rooftop. This will be a significant improvement in accuracy. However, the distance value for essentially all customers will change, and often quite dramatically. For example, a rural customer within the same ZIP as the store might have his or her field value increased from 0 to 15 miles.
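The size of the shift can be illustrated with a short sketch. Everything here is hypothetical: the ZIP label, the coordinates, the lookup table and the function names are invented for the example, and straight-line distance stands in for a true rooftop method.

```python
import math


def miles_between(p, q):
    """Haversine distance in miles between two (lat, lon) pairs in degrees."""
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    d_phi = phi2 - phi1
    d_lambda = math.radians(q[1] - p[1])
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(a))


# Hypothetical ZIP Centroid lookup; "RURAL_ZIP" is a made-up label.
ZIP_CENTROIDS = {"RURAL_ZIP": (41.360, -93.430)}


def zip_level_distance(customer_zip, store_zip):
    """Old method: distance between the two ZIP Code Centroids."""
    return miles_between(ZIP_CENTROIDS[customer_zip], ZIP_CENTROIDS[store_zip])


# Store and customer share a ZIP Code but sit far apart within it.
store = {"zip": "RURAL_ZIP", "rooftop": (41.360, -93.430)}
customer = {"zip": "RURAL_ZIP", "rooftop": (41.577, -93.430)}

old_value = zip_level_distance(customer["zip"], store["zip"])
new_value = miles_between(customer["rooftop"], store["rooftop"])

print(f"ZIP-level distance: {old_value:.1f} miles")
print(f"Rooftop distance:   {new_value:.1f} miles")
```

With these made-up coordinates, the field value for the same customer moves from 0 to roughly 15 miles, even though nothing about the customer has changed.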

Now, assume the existence of a predictive model that includes distance-to-store as one of its predictor variables. The model's weights were estimated from the old, ZIP-level values. Because the field values for the scored customers will have changed, the predictive model will be compromised. In fact, the targeting accuracy of the model might very well be destroyed. Deploying such a model on an improved Marketing Database will result in a serious financial setback!
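To make the mechanism concrete, here is a minimal sketch of a logistic scoring routine. The coefficients, the second predictor and the function name are all invented for illustration; they do not come from any real model. The point is only that the weights stay fixed while the meaning of the distance field changes underneath them.

```python
import math

# Hypothetical coefficients, as if estimated when distance was ZIP-level.
INTERCEPT = -1.0
COEF_DISTANCE = -0.15      # made-up: response odds fall as distance grows
COEF_PRIOR_ORDERS = 0.40   # made-up: response odds rise with prior orders


def score(distance_miles, prior_orders):
    """Predicted response probability from the fixed, previously fit weights."""
    z = (INTERCEPT
         + COEF_DISTANCE * distance_miles
         + COEF_PRIOR_ORDERS * prior_orders)
    return 1.0 / (1.0 + math.exp(-z))


# Same customer, same purchase history; only the distance field differs.
score_with_zip_distance = score(distance_miles=0.0, prior_orders=3)
score_with_rooftop_distance = score(distance_miles=15.0, prior_orders=3)

print(f"Score using ZIP-level distance: {score_with_zip_distance:.3f}")
print(f"Score using rooftop distance:   {score_with_rooftop_distance:.3f}")
```

In this sketch the customer's score falls sharply, so a loyal rural customer could drop out of the mailing even though his or her actual behavior is unchanged. The remedy is to refit or recalibrate the model on the new field values before it is used for scoring.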

The specific antidote is a combination of careful, up-front coordination with the data mining team, rigorous quality assurance procedures, and, importantly, those rare data management professionals who are more than technicians; that is, who have developed deep direct and database marketing expertise.