Data analysis has become super easy.
But has it? I think people want it to be, because they have understood what data analysis can do for them, but there is a real shortage of people who are good at it. So the usual technological solution is to write tools which empower more people to do it. And for many problems, I agree that this is how it works. You don’t need to know TCP/IP to fetch some data from the Internet because there are libraries for that, right?
For a number of reasons, I don’t think that you can “toolify” data analysis that easily. I wish it were that easy, but from my hard-won experience with my own work and with teaching people this stuff, I’d say it takes a lot of experience to do it properly, and you need to know what you’re doing. Otherwise you will do stuff which breaks horribly once put into action on real data.
Data Analysis is so easy to get wrong
You always need to be aware of what you are doing and have to mentally trace the steps so that you have a well-informed expectation of what should come out. It’s like debugging without an error message: often you just have a gut feeling that something is quite wrong.
It’s too easy to lie to yourself about it working
The goal in data analysis is always good performance on future, unseen data. This is quite a challenge. Usually you start working from collected data, which you hope is representative of the future data. But it is so easy to fool yourself into thinking it works.
It’s very hard to tell whether it could work if it doesn’t
A different problem is that it is fundamentally difficult to know whether you can get better if your current approach doesn’t work well. The first thing you try will most likely not work, as will probably the next thing, and then you need someone with experience to tell you whether there is a chance or not.
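To make the "fooling yourself" point concrete, here is a minimal sketch. Everything in it is an assumption chosen for illustration: the data is synthetic, the labels are coin flips (so there is nothing to learn), and the "model" is a toy nearest-neighbour lookup with hypothetical helper names. Scored on the data it has already seen, it looks perfect. Scored on held-out data, it should do no better than guessing.

```python
import random


def nearest_label(train, x):
    """Predict by copying the label of the closest training point (1-NN)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def accuracy(train, rows):
    """Fraction of rows whose label the 1-NN lookup gets right."""
    hits = sum(nearest_label(train, x) == y for x, y in rows)
    return hits / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data (an assumption): one random feature, coin-flip label.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(400)]
train, held_out = data[:200], data[200:]

# Each training point is its own nearest neighbour, so this looks great.
train_acc = accuracy(train, train)
# Held-out points were never seen; the labels are noise, so expect chance level.
held_out_acc = accuracy(train, held_out)

print(f"accuracy on data the model has seen: {train_acc:.2f}")
print(f"accuracy on held-out data:           {held_out_acc:.2f}")
```

The gap between the two numbers is the lie you tell yourself if you only ever look at the first one. Holding data back does not guarantee that it resembles future data, but it is the cheapest check available.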
No way around learning data analysis skills
So in essence, there is no way around properly learning data analysis skills. Just as you wouldn’t hand a blowtorch to anyone, you need proper training so that you know what you’re doing and produce robust and reliable results which deliver in the real world. Unfortunately, this training is hard, as it requires familiarity with at least linear algebra and concepts of statistics and probability theory, stuff which classical coders are not that well trained in.
Data analysis is SO easy to get wrong. Sadly.
Yes, we always recommend a process for data mining that is a cross between waterfall and agile. We ask analysts to follow a scientific method: define the problem, create a hypothesis, build solutions and test them until they break. But we also subject everything to both peer review and client review, continually pushing it back to the business to get their take on things, which also smooths the final implementation.