How Not to Do Predictive Modeling

Articles and speeches on predictive modeling invariably focus on successful case studies. However, sometimes even more can be learned from mistakes. In my travels as Vice President of Research and Consulting Services for Neodata, I certainly have run into my fair share of mistakes. For example:

No Net/Net List Deals

A prospecting model was built for a fundraiser using individual/household overlay demographics. Back tests of the model on live mailings showed impressive segmentation power. Unfortunately, the fundraiser did not have access to net/net list rental arrangements, which meant that the names eliminated by the model would have to be paid for (as would not be the case with a ZIP model).

Consider, for example, a hypothetical list with a published cost of $100/M, for which the model eliminated the bottom 8 deciles. Because all ten deciles must be paid for but only two are mailed, the actual, in-the-mail cost would be $500/M, which clearly is not cost effective.
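The arithmetic behind that figure can be sketched in a few lines. This is only an illustration: the function name and its parameters are hypothetical, and it assumes equal-sized deciles and no net/net arrangement of any kind.

```python
def in_the_mail_cost_per_m(published_cost_per_m, deciles_mailed, total_deciles=10):
    """Cost per thousand names actually mailed when every rented name is paid for.

    Assumes deciles are equal-sized and no net/net arrangement exists, so the
    full rental fee is spread over only the names that survive the model.
    """
    if not 0 < deciles_mailed <= total_deciles:
        raise ValueError("deciles_mailed must be between 1 and total_deciles")
    return published_cost_per_m * total_deciles / deciles_mailed


# The example from the text: $100/M published, bottom 8 of 10 deciles eliminated.
cost = in_the_mail_cost_per_m(100.0, deciles_mailed=2)
print(f"${cost:.0f}/M")  # $500/M
```

The fewer deciles the model keeps, the more the rental cost per mailed name is inflated, which is why deep cuts are the hardest to justify without net/nets.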

Fortunately, our research group performed financial analysis on the model before it was used in live mailings, and proved that under no realistic circumstances would the model ever be cost effective without net/net rental arrangements. (In fact, the analysis suggested that there exists no realistic circumstance in which any individual/household model will ever work for any direct marketer without the existence of net/nets.)

Modeled Out of Business

A predictive model was built to rank-order existing customers in terms of their probability of repurchasing in the future. Unfortunately, this model drove every single-buyer into deciles that were below the mail/no mail cutoff recommended by the research company. Fortunately, the client quickly realized that this strategy, although effective for maximizing short-term profits, would in the long term drive the business into bankruptcy. (After all, the only way for single-buyers to become multi-buyers is to mail them!)

Decile vs. Decile

A predictive model was built by a "10 to 1 Decile shop"; that is, by a research company that labeled its best buyers "Decile 10" and its worst buyers "Decile 1." Unfortunately, the rest of the direct marketing world rank-orders from Decile 1 (best) to Decile 10 (worst). The model was forwarded to a service bureau that had no previous relationship to the research company, with written instructions to "pull off the top four Deciles." The service bureau, mindful as it was of industry standards, proceeded to select Deciles 1 to 4, which resulted in the worst 40% of the file being mailed!
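One way to guard against this kind of misreading is to make the labeling convention an explicit part of any selection instruction. The sketch below is illustrative only; the function name and the `best_label` parameter are hypothetical and are not drawn from any particular service bureau's tooling.

```python
def labels_for_best_deciles(n_best, best_label, total_deciles=10):
    """Return the decile labels covering the best n_best deciles.

    best_label must be stated explicitly: 1 if Decile 1 holds the best names,
    or total_deciles if the highest-numbered decile does.
    """
    if best_label == 1:
        return list(range(1, n_best + 1))
    if best_label == total_deciles:
        return list(range(total_deciles, total_deciles - n_best, -1))
    raise ValueError("best_label must be 1 or total_deciles")


# Same instruction ("top four deciles"), two conventions, two different selections.
research_company = labels_for_best_deciles(4, best_label=10)
service_bureau = labels_for_best_deciles(4, best_label=1)
print(research_company)  # [10, 9, 8, 7]
print(service_bureau)    # [1, 2, 3, 4]
```

Because the caller cannot omit `best_label`, the ambiguity in "top four Deciles" has to be resolved in writing before any names are pulled.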

Where's the News?

A research company built a predictive model for a cataloger in which everyone was eligible to be scored: multi-buyers, single-buyers, inactives, inquiries, and cross-sell candidates from other catalog titles within the overall corporate umbrella. Unfortunately, regression models (as is true with all other predictive statistical techniques) take the path of least resistance when attempting to segment by the probability of future response. Therefore, the result was a "sediment model," in which multi-buyers were the primary residents of the top couple of deciles, single-buyers the residents of the next two, followed, sequentially, by inquiries, inactives, and cross-sell candidates.

Because the direct marketer already knew that multi-buyers generally perform better than single-buyers, who in turn generally perform better than inactives, and so on, the model essentially was worthless.
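A quick diagnostic for a sediment model is to tabulate the segment mix within each decile. The sketch below is a hypothetical illustration, not a procedure described in the original project: the function names, segment labels, and toy records are all invented. If nearly every decile is dominated by a single segment, the model is largely restating what is already known.

```python
from collections import Counter, defaultdict


def segment_mix_by_decile(records):
    """records: iterable of (decile, segment) pairs.

    Returns {decile: {segment: share of that decile's records}}.
    """
    counts = defaultdict(Counter)
    for decile, segment in records:
        counts[decile][segment] += 1
    mix = {}
    for decile, counter in counts.items():
        total = sum(counter.values())
        mix[decile] = {seg: n / total for seg, n in counter.items()}
    return mix


def dominant_segment(mix):
    """For each decile, return the (segment, share) pair with the largest share."""
    return {
        decile: max(shares.items(), key=lambda kv: kv[1])
        for decile, shares in sorted(mix.items())
    }


# Toy scored file (invented data): three deciles, three segments.
records = (
    [(1, "multi-buyer")] * 3 + [(1, "single-buyer")]
    + [(2, "single-buyer")] * 4
    + [(3, "inquiry")] * 4
)
for decile, (segment, share) in dominant_segment(segment_mix_by_decile(records)).items():
    print(decile, segment, f"{share:.0%}")
```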

ZIP-Less Lift

A research company built a ZIP Code model to segment outside list rental prospects for a very targeted cataloger. Unfortunately, ZIP Code prospecting models generally do not display "lift," top 10% to average, of more than 140 (i.e., with an overall response rate of 1%, Decile 1 will not pull more than 1.40%). Because of the very circumscribed audience for this cataloger's product, only a handful of affordable rental lists were available, and all of those had response rates that were several times higher than average. As a result:

  • For the handful of affordable rental lists, even Decile 10 (the worst) names performed above the mail/no mail cutoff.
  • For all other rental lists, even Decile 1 (the best) names performed below the mail/no mail cutoff.

Therefore, the ZIP model, although statistically successful at differentiating responders from non-responders, was worthless from a business point of view.
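The situation can be illustrated with a small calculation. Only the top-decile lift of 140 comes from the text; every other number below (the remaining lift indices, the two list response rates, and the cutoff) is a hypothetical value chosen for illustration, as are the function and variable names.

```python
def deciles_above_cutoff(list_response_rate, lift_by_decile, cutoff):
    """Return the deciles whose expected response rate meets the cutoff.

    Lift is an index where 100 equals the list's average response rate.
    """
    return [
        decile
        for decile, lift in sorted(lift_by_decile.items())
        if list_response_rate * lift / 100 >= cutoff
    ]


# Hypothetical ZIP model lift indices; only the 140 for Decile 1 is from the text.
LIFT = {1: 140, 2: 125, 3: 115, 4: 108, 5: 102, 6: 97, 7: 92, 8: 86, 9: 75, 10: 60}
CUTOFF = 0.015  # hypothetical mail/no mail cutoff (1.5% response)

strong_list = deciles_above_cutoff(0.030, LIFT, CUTOFF)  # hypothetical niche list
weak_list = deciles_above_cutoff(0.008, LIFT, CUTOFF)    # hypothetical general list
print(strong_list)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(weak_list)    # []
```

With lift this modest, the model changes a mailing decision only for lists whose average response rate sits close to the cutoff, and in this case none did.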

Skeletons From My Closet

Anyone who has built a large number of predictive models will have made some mistakes. This is inevitable, given the complex processes that are involved in a successful model build and implementation. Because I am no exception, I'll conclude by confessing to some skeletons in my own closet:

First Skeleton

As part of a large database deal with a cataloger, we agreed to put predictive models into production immediately after the completion of the database. Therefore, "time 0" analysis files were available only for mailings that were done off the previous service bureau's database structure. This meant that the models had to be constructed off the previous structure and then converted to the existing database structure.

Unfortunately, the previous database had many significant data anomalies. This resulted in a large percentage of the records in the new database not having a one-to-one correspondence in the values for many of the key fields (e.g., the previous database showed a net dollar amount of $300 for a given order, but warehoused raw transaction information showed a gross dollar amount of $300 and a returns dollar amount of $250).

As a result, the models were compromised and did not provide the expected segmentation power.
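A reconciliation check run before the model build could have flagged discrepancies of this kind. The following is a minimal sketch under stated assumptions: the field names (`order_id`, `legacy_net`, `gross`, `returns`), the order IDs, and the tolerance are all hypothetical, and it assumes that net should equal gross minus returns.

```python
def find_net_mismatches(orders, tolerance=0.005):
    """Flag orders whose legacy net amount disagrees with gross minus returns.

    orders: iterable of dicts with the hypothetical keys
    'order_id', 'legacy_net', 'gross', and 'returns'.
    Returns a list of (order_id, legacy_net, expected_net) tuples.
    """
    mismatches = []
    for order in orders:
        expected_net = order["gross"] - order["returns"]
        if abs(order["legacy_net"] - expected_net) > tolerance:
            mismatches.append((order["order_id"], order["legacy_net"], expected_net))
    return mismatches


orders = [
    # The case from the text: legacy net of $300, but gross $300 less returns $250.
    {"order_id": "A1", "legacy_net": 300.0, "gross": 300.0, "returns": 250.0},
    # An invented order that reconciles cleanly.
    {"order_id": "A2", "legacy_net": 120.0, "gross": 150.0, "returns": 30.0},
]
print(find_net_mismatches(orders))  # [('A1', 300.0, 50.0)]
```

The share of orders that fail such a check gives a rough sense of how far a model built on the old structure will drift once it is scored on the new one.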

During this same project, it was noticed that an unusually large percentage of customers had only one order. The client's explanation was that the catalog had been growing rapidly over the past year. The real reason was discovered only later: about $80 million of transactions, representing about 2.5 years of history, was missing from the database.

Needless to say, the models were compromised even further when what appeared to be the single-buyer inhabitants of the bottom deciles (who were in fact multi-buyers) ordered merchandise with a vengeance!

Second Skeleton

A model built for a retail client to predict future purchase behavior did not perform well in the mail. A post mortem turned up nothing unusual until a serendipitous conversation with a programmer who had participated in bringing up this client's database several years earlier:

Unfortunately, although transaction information had been warehoused for several years, large-scale gaps had existed in the data (e.g., many transactions had no dollar amount). The client contact, who had subsequently left the company, had a solution for this: simply plug in artificial information that would correspond to the average for all records that contained the data in question.

My response was that this was impossible because the Exploratory Data Analysis that was performed as part of the modeling project would have picked this up. The programmer's reply was that the client was particularly clever, and had written a program to generate a random "plus or minus" factor around the average that would correspond to the distribution of values for all records which contained the data in question!

Jim Wheaton is a Principal at Wheaton Group, and can be reached at 919-969-8859 or jim.wheaton@wheatongroup.com. The firm specializes in direct marketing consulting and data mining, data quality assessment and assurance, and the delivery of cost-effective data warehouses and marts. Jim is also a Co-Founder of Data University (www.datauniversity.org).