Fallacy of Automated Modeling

Regression and other traditional segmentation approaches have been under attack for quite some time. The battlegrounds are trade journals and conferences, and the opposition comes from the neural network companies.

This article does not explore the merits of neural networks versus traditional techniques. Nor does it take issue with every proponent of the neural approach. Rather, the target is those who claim that neural networks allow the modeling process to be automated.

I frequently am frustrated when talking with modeling prospects. Many have read the neural network hype, and are predisposed to believe in the "push button" approach to segmentation. After all, automation will eliminate the need to interact with statisticians who ask difficult questions and speak in strange tongues.

It is my strong opinion that there never will be a substitute for the seasoned human analyst. This article will explain why, with ten examples (seven involving Exploratory Data Analysis and three involving Research Design) that are understandable to clients and prospects. (After all, the layman will never be convinced by mathematical equations!)

The following summarizes my message to all who ask about predictive modeling:

Whether the technique of choice is regression or neural nets (or, for that matter, tree analysis or genetic algorithms), what really separates the good models from the bad is the up-front work that must be done before the formal modeling process. (Also important is the back-end process of correctly implementing the model, time and time again, in a production environment.)

Before proceeding with the main body of this article, let me rephrase the previous paragraph in a way that I am sure will be controversial with certain factions of the neural network community:

If you want to get famous, promote the "push button" approach. If you want to build a great model, concentrate on the unglamorous up-front work, as well as the back-end implementation.

Seven Examples: Exploratory Data Analysis

Example #1

For a retailer with a $70 average order size, a handful of customers on the analysis file had lifetime totals between ten and twenty thousand dollars.

In this instance, a well-designed "push button" approach will receive a passing grade. After all, automating the detection of outliers is a simple task. Each of the unusual customers in our example can be eliminated from the analysis, and the modeling process can proceed.
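To show just how simple that automated screen is, here is a minimal sketch of the sort of outlier check a "push button" system might apply. The column name, the toy data, and the Tukey-fence cutoff are my own illustrative assumptions, not details from the engagement itself.

```python
import pandas as pd

def flag_outliers(customers: pd.DataFrame,
                  col: str = "lifetime_dollars",
                  k: float = 1.5) -> pd.DataFrame:
    """Flag extreme values with a simple Tukey fence (anything above Q3 + k * IQR)."""
    q1, q3 = customers[col].quantile([0.25, 0.75])
    fence = q3 + k * (q3 - q1)
    out = customers.copy()
    out["is_outlier"] = out[col] > fence
    return out

# Hypothetical analysis file: mostly modest lifetime totals, plus one $15,000 customer.
df = pd.DataFrame({"lifetime_dollars":
                   [60, 65, 70, 75, 80, 90, 110, 140, 180, 210, 250, 15000]})
print(flag_outliers(df))
```

The screen flags the $15,000 customer and nothing else, which is exactly what any competent software should do.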

Even in this case, however, human intervention is ideal. Although the deletion of outliers is a good start towards a robust predictive model, the optimal approach is to determine the root cause of these outliers. And, root causes always fall into one of two categories, each of which calls for a different response:

Category #1

Especially within the world of retail, extreme outliers generally represent bad data rather than true purchase behavior. Human investigation in such instances often results in refinements to the data capture process. This will enhance the long-term quality of the database and, in turn, all subsequent models. Two examples:

  • A sixty-six thousand dollar order consisted of 111 $600 items. This was a keying error: the true number of items was one, not 111.

The analysis file record showed one additional order of this same item, one day later, reflecting the previous day's true purchase activity. Unfortunately, and in accordance with existing procedure, the corresponding "negative transaction" to cancel the previous 111 items had not been forwarded to the database.

This inspired a rethinking of existing procedure!

  • A customer record showed hundreds of transactions within a six-month period, averaging only about $70 but totaling about thirty thousand dollars. This was an intentional keying error (if such a thing is not oxymoronic). A lazy clerk realized that the Point of Sale data capture system was driven by reverse phone number lookup, and that the continuous entering of his own phone number would eliminate the need to question customers.

This resulted in the creation and dissemination of a formal disciplinary procedure for all employees who continued to engage in such behavior.

Category #2

Sometimes, however, extreme outliers represent true purchase behavior. In such instances, extremely loyal buyers have been identified whose behavior should be rewarded and encouraged.

Example #2

This example, as well as all subsequent ones, provides quite a bit more of a challenge for an automated modeling system:

A weak positive relationship was found between response and customer distance from the nearest store. In other words, the greater the distance, the higher the response.

This was counterintuitive, because good retail customers generally do not live far from a store. The distance variable represented the straight-line distance between each customer's ZIP Centroid and the nearest store's ZIP Centroid. It was theorized that, for many customers, this calculation was not sufficiently refined. When distance was recalculated by Carrier-Route Centroid, the relationship to response turned negative.
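For readers curious about the mechanics, the straight-line ("as the crow flies") distance between any two centroids can be computed with the standard haversine formula. The sketch below is illustrative only; the coordinates are made up, whereas the actual analysis used ZIP and Carrier-Route Centroid coordinates for the client's file.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance, in miles, between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3956 * asin(sqrt(a))  # 3956 = Earth's radius in miles

# Illustrative centroids: a customer's carrier-route centroid vs. the nearest store's.
customer = (40.7357, -74.1724)
store = (40.7128, -74.0060)
print(round(haversine_miles(*customer, *store), 1), "miles")
```

The formula is the easy part; the insight that the ZIP-level centroid was too coarse for this client's trade areas came from a human being.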

Example #3

A strong positive relationship was found between response and customer ownership of the private-label credit card.

The variable was left out of the model! At the time of the analysis file mailings, the credit card had just been introduced. Therefore, the small number of card owners were generally the client's most fervent buyers. However, by the time the model was to be put into production, the card ownership universe had expanded significantly. Therefore, the relationship of card ownership to response would have changed dramatically.

Example #4

A large number of historical orders on a "time 0" file for a September mailing were nine months old.

"Lumpy" order patterns generally are no big deal. After all, many businesses are seasonal. Christmas is a make-or-break period for just about everyone. So, our nine-month old orders were no big deal, right?

Wrong! The database was maintained by a direct marketer whose business was driven by syndication arrangements with several outside companies. The idea was to merge the customer history for each of these companies, and to use the resulting database to drive up-sell as well as cross-sell efforts with predictive models.

Unfortunately, one of the outside companies was not sensitive to the needs of sophisticated database marketing, and warehoused all of its order transactions for nine months before forwarding them to the syndicator. You can imagine the segmentation chaos caused by the many customers whose records reflected these compromised orders!

Example #5

The relationship of historical average order size to the dependent variable was slightly unusual.

The database had been built several years earlier, and had been "primed" by a large number of "warehoused" historical orders. Unfortunately, because of the lack of systematic data capture during this formative period, many of these historical orders did not contain a dollar amount. The manager of the database build, who had since left the company, had "plugged" these missing dollar amounts with the mean, along with a normally distributed "plus or minus factor." In other words, he had created artificial data (a time bomb) that would be a challenge for any future analyst to detect.

The point here is that the unusual pattern in the relationship of historical average order size to response was quite subtle. After all, several subsequent years of systematic data capture had populated the analysis file with many legitimate order sizes. Would "push button" software ask the questions necessary to uncover the subset of bogus orders on the analysis file? I think not.
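A human analyst who suspects this kind of "plugging" can probe for it directly. The sketch below is one way to do so, using field names and toy data of my own invention: compare the spread of order amounts from the "warehoused" era against later, systematically captured orders. An era whose amounts cluster suspiciously tightly around the overall mean is the telltale sign.

```python
import pandas as pd

def spread_by_era(orders: pd.DataFrame, cutoff: str) -> pd.DataFrame:
    """Compare the spread of order amounts before and after a capture-system cutoff.

    Values "plugged" with the mean plus a small random factor show up as an era
    whose order amounts cluster suspiciously tightly around the overall mean.
    """
    era = orders["order_date"].lt(pd.Timestamp(cutoff)).map(
        {True: "warehoused", False: "systematic"})
    return orders.groupby(era.rename("era"))["order_amount"].agg(["count", "mean", "std"])

# Toy data: early "warehoused" orders hover near $70; later orders vary naturally.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["1991-03-01", "1991-06-15", "1991-11-30",
                                  "1994-02-10", "1994-08-22", "1995-01-05"]),
    "order_amount": [69.50, 70.25, 70.10, 31.00, 142.75, 55.20],
})
print(spread_by_era(orders, cutoff="1993-01-01"))
```

The numbers alone prove nothing; it takes an analyst to ask why one era's orders behave so unnaturally, and then to track down the long-departed database manager's handiwork.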

As an aside, neural network hype often supplements the "push button" pitch with the claim that such systems can detect subtle data patterns that are invisible to regression. The challenge, however, is how to determine whether a subtle pattern is an anomaly-driven mirage or the reflection of true, long-term buyer behavior. Here's where the science of predictive modeling gives way to the art of predictive modeling. And, within the realm of art, the judgment and experience of the human analyst is paramount.

Example #6

Here is an additional example that required a human being to ask some questions:

A customer model was built from an analysis file consisting of four mailings. For each mailing, the analysis file was interrogated for basic reasonableness (mail quantity, response rate, dollars per piece mailed, etc.). It was immediately apparent that something was wrong. Additional investigation revealed that the response information had been appended to the incorrect mailings.

The point here is that even automated modeling systems require accurate analysis files. And, it's very easy to generate an inaccurate file. The process of appending response information to the mail history ("time 0") file(s) often involves a number of complex steps that invite error. The only way to uncover analysis file problems is with the inquiring mind of an analyst.
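As an illustration of what that interrogation might look like, here is a minimal reasonableness report. The column names and toy data are assumptions for the sake of the sketch; each mailing's quantity, response rate, and dollars per piece mailed would then be compared against what the client knows to be true of each campaign.

```python
import pandas as pd

def mailing_sanity_report(analysis: pd.DataFrame) -> pd.DataFrame:
    """Summarize each mailing on the analysis file for a basic reasonableness check.

    Mail quantity, response rate, and dollars per piece mailed should line up with
    what the client knows about each campaign; wild deviations suggest that the
    response information was appended to the wrong mailing.
    """
    grouped = analysis.groupby("mailing_id")
    return pd.DataFrame({
        "mail_quantity": grouped.size(),
        "response_rate": grouped["responded"].mean(),
        "dollars_per_piece": grouped["order_dollars"].sum() / grouped.size(),
    })

# Hypothetical analysis file: one row per piece mailed, with assumed column names.
analysis = pd.DataFrame({
    "mailing_id": ["1995-09", "1995-09", "1996-01", "1996-01", "1996-01"],
    "responded": [1, 0, 0, 1, 0],
    "order_dollars": [85.0, 0.0, 0.0, 120.0, 0.0],
})
print(mailing_sanity_report(analysis))
```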

Example #7

The following example required such an inquiring analyst. Although the client did not come forward with the correct answer, at least the correct questions were asked. Ask whether an automated modeling system would have done the same:

For a customer model, interrogation of the analysis file revealed an unusually large percentage of individuals with only one order at the time of each mailing (single-buyers). The client's answer appeared reasonable: the business had enjoyed rapid and recent growth.

The real reason was discovered only later: about $80 million of transactions, representing about two and a half years of history, had been unavailable when the database was constructed. The "live" model results suffered when the apparent single-buyers inhabiting the bottom deciles, who were in fact multi-buyers, ordered merchandise with a vengeance.

Three Examples: Research Design

The aforementioned seven examples illustrate the need for high-quality Exploratory Data Analysis, which can only be provided by a human being. The astute reader will notice, however, that none of these seven examples requires an analyst with an advanced degree in statistics. Instead, what is needed is a skilled "data detective." Theoretically, any astute database marketer qualifies as a data detective.

This brings us to the common but more modest contention that neural networks, although not the ticket to automated modeling, eliminate the need for a skilled analyst. You've heard the pitch: "Marketers, build your own models!"

The problem with this argument is that "practice makes perfect" in Exploratory Data Analysis, as in just about everything else in life. Many marketers are smart, but few can match the experience gained by a seasoned analyst who has built dozens, sometimes hundreds, of models.

Modeling experience encompasses not only Exploratory Data Analysis but also Research Design. Consider the following examples that could "trip up" the smartest marketer who "charges in" with neural network technology but no design experience:

Example #1

A predictive model was built for a cataloger in which everyone was eligible to be scored: multi-buyers, single-buyers, inactives, inquiries, and cross-sell candidates from other catalog titles within the overall corporate umbrella. Unfortunately, statistical models, whether traditional or neural networks, take the path of least resistance when segmenting by the probability of future response. Therefore, the result was a "sediment model," in which multi-buyers were the primary residents of the top couple of deciles, single-buyers the residents of the next two, followed, sequentially, by inquiries, inactives, and cross-sell candidates.

Because the direct marketer already knew that multi-buyers generally perform better than single-buyers, who in turn generally perform better than inactives, and so on, the model essentially was worthless.

Example #2

A prospecting model was built for a fundraiser using individual/household-level overlay demographics. External validations of the model showed impressive segmentation power. Unfortunately, the fundraiser did not have access to net/net list rental arrangements, which meant that the names eliminated by the model would have to be paid for (as would not be the case with a ZIP-level model).

Consider, for example, a hypothetical list with a published cost of $100/M, for which the model eliminated the bottom eight deciles. Because all ten deciles must be paid for but only the top two are mailed, the actual, in-the-mail cost would be $500/M, which clearly is not cost effective.
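The arithmetic behind that $500/M figure is simple but easy to overlook, so here is a small sketch of it, using the hypothetical numbers above.

```python
def in_the_mail_cost(published_cost_per_m: float, deciles_mailed: int) -> float:
    """Effective rental cost per thousand names actually mailed, without net/nets.

    All ten deciles must be paid for at the published rate, but only the retained
    deciles are mailed, so the full cost is spread over that fraction of the names.
    """
    return published_cost_per_m / (deciles_mailed / 10)

# Hypothetical list above: $100/M published cost, bottom eight deciles eliminated.
print(in_the_mail_cost(100, deciles_mailed=2))  # 500.0 dollars per thousand mailed
```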

Fortunately, financial analysis was performed on the model before it was used in live mailings, and proved that under no realistic circumstances would the model ever be cost effective without net/net rental arrangements. In fact, the analysis suggested that there exists no realistic circumstance in which any individual/household-level prospecting model will ever work for any direct marketer without the existence of net/nets.

Example #3

A ZIP Code model was built to segment outside list rental prospects for a very targeted cataloger. Unfortunately, ZIP Code prospecting models generally do not display "lift," top 10% to average, of more than 140 (i.e., with an overall response rate of 1%, Decile 1 will not pull more than 1.4%). Because of the very circumscribed audience for this cataloger's product, only a handful of affordable rental lists were available, and all of those had response rates that were several times higher than average. As a result:

  • For the handful of affordable rental lists, even Decile 10 (the worst) names performed above the mail/no mail cutoff.
  • For all other rental lists, even Decile 1 (the best) names performed below the mail/no mail cutoff.

Therefore, the ZIP model, although statistically successful at differentiating responders from non-responders, was worthless from a business point of view.

Summary

There is no magical shortcut when building a predictive model. If you want good results, concentrate on the up-front work (sound Research Design and meticulous Exploratory Data Analysis) that must be done before undertaking the formal modeling process. Also, invest in the services of a seasoned analyst rather than a "push button" system.

I'll close with a final thought: Neural network proponents claim that their software is superior in recognizing underlying patterns within the data. If this is true, then the unglamorous, analyst-driven, up-front process will be even more critical. This is because neural networks will, by definition, do a superior job of identifying the spurious patterns that are inherent in bad data. Therefore, the resulting scoring algorithm will point us even farther away from our true target market!

In other words, there is no substitute in our business for a hard-boiled data detective!