Machine Unlearning: The Value of Imperfect Models
A project manager once told me that “any job worth doing is worth doing poorly.” She prefaced the remark with these words: “I wouldn’t say this to everyone, but I know you will understand what I mean.” I understood exactly what she meant, because I was a perfectionist (and hopefully I have learned over the years to be less of one, thanks to her wise counsel). As a perfectionist, I would strive for 100% completion and perfection on every project, every analysis, and every report. It took me longer than most people to finish an analysis and its report, and my manager understood why. Her solution was to give me “permission” to do the job poorly, which (to me) meant doing the job “only” at the 98% or 99% level. With those freeing words, I was able to finish my big report (from three years of work) within days instead of weeks, and I was able to move on to bigger and better project responsibilities. I soon became a project manager myself and promoted the oft-quoted wisdom that “perfect is the enemy of good enough.”
The same advice applies to the machine learning models that we train and deploy for big data analytics. If we strive for perfection, we run several risks, as listed below:
Overfitting
By attempting to build a model that follows every little nuance, deviation, and variation in our data set, we almost certainly end up fitting the natural variance in the data, which will never go away. After building such a model, we may find that it has nearly 100% accuracy on the training data, but significantly lower accuracy on the test data set. A gap like that is the classic sign that we have overfit our model. Of course, we don’t want a trivial model (an underfit model) either. To paraphrase Albert Einstein: “models should be as simple as possible, but no simpler.”
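The gap between training and test accuracy can be reproduced with a toy experiment. What follows is a minimal sketch and not something from the original analysis: the synthetic data, the 20% label-noise rate, and the two models (a memorizing 1-nearest-neighbor classifier and a single-cut rule) are all assumptions chosen for illustration.

```python
import random

def make_data(n, noise, rng):
    """Sample n points x in [0, 1). The true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # natural variance: no model can predict these flips
        data.append((x, label))
    return data

def accuracy(predict, data):
    """Fraction of points whose predicted label matches the recorded label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def fit_memorizer(train):
    """'Perfectionist' model (1-nearest neighbor): reproduces every training label, noise included."""
    def predict(x):
        nearest = min(train, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict

def fit_threshold(train):
    """'Good enough' model: one cut point, chosen to maximize training accuracy."""
    cut = max((x for x, _ in train), key=lambda t: accuracy(lambda x: x > t, train))
    return lambda x: x > cut

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(500, noise=0.2, rng=rng)
test = make_data(2000, noise=0.2, rng=rng)

memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

for name, model in [("memorizer", memorizer), ("threshold", threshold)]:
    print(f"{name}: train accuracy = {accuracy(model, train):.2f}, "
          f"test accuracy = {accuracy(model, test):.2f}")
```

The memorizer scores 100% on the training set by construction, because each training point is its own nearest neighbor, yet it has also memorized the flipped labels and pays for that on new data. The single-cut model never reaches 100% on either set, but its training and test accuracy typically stay close to each other.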
Time Dilation
Okay, that is an expression that gives another nod to Einstein and his theory of special relativity (how time seems to stretch out in a fast-moving reference frame). In this context, what I really mean to say is what my project manager was conveying to me: “get the project finished in a timely manner, please.” That means accepting a less-than-perfect model when a “good enough” model can be had more quickly.
Missing the Low-hanging Fruit
Many big data practitioners and consultants these days say that the best way for businesses (especially small-to-midsize firms) to reap the benefits of big data analytics quickly is to go after the low-hanging fruit. That means that we should first find a fast, meaningful, effective model for our business case, and start using it on some of the easy and obvious data-oriented problems in our organization. For example, a financial services business was attempting to build a complex model of customer retention, to keep its customers from defecting to a competing firm. After some relatively simple queries of its web analytics metrics, the business found that customers who did defect had spent a lot more time in their online accounts (obtaining information about their personal accounts) just prior to transferring their funds to another firm. Given this new “single bit of information”, the business made one small but important change: whenever its web analytics dashboard signals that a customer is spending a lot of time exploring their online account, balances, histories, etc., a call center customer service rep gives that customer a brief courtesy call, to see if the firm can help in any way, answer questions, or provide some information. The business’s customer defection rate has gone nearly to zero since that discovery and new intervention, which was based on some fairly simple “low-hanging fruit”.
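A rule of that kind needs very little machinery. The sketch below is purely illustrative: the source does not say how “a lot of time” was defined, so the comparison against each customer’s own baseline, the `ratio` cutoff of 3.0, the function name `customers_to_call`, and the sample data are all hypothetical.

```python
from statistics import mean

def customers_to_call(recent_minutes, baseline_minutes, ratio=3.0):
    """Return IDs of customers whose recent account-browsing time is at least
    `ratio` times their own usual time (both the rule and the cutoff are assumptions)."""
    flagged = []
    for customer_id, recent in recent_minutes.items():
        usual = mean(baseline_minutes[customer_id])
        if usual > 0 and recent >= ratio * usual:
            flagged.append(customer_id)
    return sorted(flagged)

# Hypothetical data: minutes spent in the online account per visit.
baseline = {"c1": [4, 6, 5], "c2": [10, 12, 8], "c3": [3, 2, 4]}
recent = {"c1": 21, "c2": 11, "c3": 5}

to_call = customers_to_call(recent, baseline)
print(to_call)
```

Here only customer `c1` is flagged: 21 minutes against a usual 5. A single comparison like this can be deployed long before a full retention model is finished, and it can be replaced later if a richer model proves worth the effort.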
Bias: Putting Demographics ahead of Diversity
As indicated in the overfitting paragraph above, nearly all data (or maybe all data) have some natural variance. In discovering what that variance is, and then embracing that diversity in our data collection, we have great power in our analytics hands. For example, we can use that diversity of customer behaviors, interests, or intents to develop segments and personas within our big data collection. Not everyone is the same, and the more we know about the diversity in our individual customers’ interests and behaviors, the greater the value we will achieve from our descriptive and predictive analytics models. Some folks say that the big data era represents the “end of demographics.” I agree with that! The age of personalization is here. We miss that power of personal predictive analytics when we strive to build one model that explains everything. It is not surprising that (in statistics and machine learning) underfitting is associated with high bias in model-building: we are basing a decision (building a model) on a very limited (insufficient) set of features and attributes, which is indeed the very essence of bias!
Therefore, when building machine learning models for big data analytics projects, it is useful to pay attention to a little machine unlearning. Allow your models to have a little less perfection than your intuition might prefer, and allow your models to have a natural failure rate that is consistent with the natural variance in the data collection. In doing this, you pay respect to the good old-fashioned ROC (Receiver Operating Characteristic) curve in statistics, which plots the true positive rate against the false positive rate. When building, testing, and validating a suite of models, the optimal model (i.e., the optimal point on the ROC curve) is the spot where the overall accuracy of the model is essentially unchanged as we make small adjustments in the choice of model. That is to say, the change (from one model to the next) in the model’s specificity (true negative rate) is offset by the change in the model’s sensitivity (recall, or true positive rate), once each is weighted by how common its class is. Consequently, allowing for some imperfection, where the false positive rate and the false negative rate are in proper balance (so that neither one has too great an impact on the overall accuracy and effectiveness of the model), is good fruit from your big data labors.
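To make the idea of an optimal operating point concrete, here is a minimal sketch that sweeps a decision threshold over a handful of scored cases and reports the true positive rate, false positive rate, and accuracy at each step. The scores and labels are made up, `roc_table` is a hypothetical helper, and picking the point of highest accuracy is only one of several reasonable selection criteria.

```python
def roc_table(scores, labels):
    """Return one (threshold, TPR, FPR, accuracy) row per candidate threshold.

    A case is predicted positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    rows = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        tn = negatives - fp
        rows.append((threshold, tp / positives, fp / negatives, (tp + tn) / len(labels)))
    return rows

# Hypothetical model scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 1, 0, 0, 0, 0]

table = roc_table(scores, labels)
for threshold, tpr, fpr, acc in table:
    print(f"threshold={threshold:.2f}  TPR={tpr:.1f}  FPR={fpr:.1f}  accuracy={acc:.1f}")

best = max(table, key=lambda row: row[3])  # operating point with the highest accuracy
```

With these made-up scores, accuracy peaks at a threshold of 0.55 (TPR 1.0, FPR 0.2, accuracy 0.9). One step to either side, accuracy drops to 0.8: raising the threshold to 0.60 loses a true positive, and lowering it to 0.40 adds a false positive. Even the best point still misclassifies one case, which is the natural failure rate this data allows.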
As we move into the era of The Internet of Things (IoT, and the IoE: Internet of Everything), in which we will have vast opportunities to develop and deploy analytics models on streaming data from all types of devices, machines, and sensors, it is good to know that we can reap great benefits from fast, simple, slightly imperfect machine learning. It might be the most efficient and effective way to achieve MapR’s promise of “processing IoT and IoE data at scale and in real time.”