Five Data Mining Techniques That Help Create Business Value
Mo Data stashed this in Analysis Tips and Tricks
Data mining can help organisations and scientists find and select the most important and relevant information. This information can be used to create models that predict how people or systems will behave, so that you can anticipate that behaviour. In general, the more data you have, the better the models you can build with data mining techniques tend to become, resulting in more business value for your organisation.
The term data mining first appeared in the 1990s; before that, statisticians used the terms “Data Fishing” or “Data Dredging” to refer to analysing data without an a priori hypothesis. The most important objective of any data mining process is to find useful, easily understood information in large data sets. Data mining involves a few important classes of tasks:
Anomaly or Outlier detection
Anomaly detection refers to the search for data items in a dataset that do not match a projected pattern or expected behaviour. Anomalies are also called outliers, exceptions, surprises or contaminants, and they often provide critical and actionable information. An outlier is an object that deviates significantly from the general average within a dataset or a combination of data. Because it is numerically distant from the rest of the data, an outlier indicates that something is out of the ordinary and requires additional analysis.
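As a rough illustration of "numerically distant from the rest of the data", the sketch below flags values whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The function name `find_outliers`, the sample figures and the threshold of 2.0 are assumptions chosen for this example, not part of any particular tool; a z-score test is only one of several ways to detect anomalies.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    `threshold` is an illustrative default; a suitable cut-off depends
    on the data and on how many false alarms are acceptable.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily order counts with one unusual day.
daily_orders = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(find_outliers(daily_orders))
```

Note that in small samples a single extreme value inflates the standard deviation itself, which is why a fairly low threshold is used here.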
Association rule learning
Association rule learning enables the discovery of interesting relations (interdependencies) between different variables in large databases. It uncovers hidden patterns in the data, in particular which variables occur together most frequently. A typical example is market basket analysis, which looks for products that are often bought in the same transaction.
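The idea of frequent co-occurrence can be sketched with two common measures: support (the share of transactions containing both items) and confidence (how often the second item appears when the first one does). The function `pair_rules`, the `min_support` default and the shopping baskets below are hypothetical and only cover rules between pairs of items; production algorithms such as Apriori handle larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) tuples
    for every item pair that meets `min_support`."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates, fix the order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields one rule per direction.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "jam"},
]
for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with milk also contains bread, so the rule "milk -> bread" has a confidence of 1.0, while the reverse rule is weaker.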
Clustering analysis
Clustering analysis is the process of identifying groups of data items that are similar to each other, in order to understand both the differences and the similarities within the data. Members of a cluster have certain traits in common that can be used to improve targeting algorithms. For example, clusters of customers with similar buying behaviour can be targeted with similar products and services in order to increase the conversion rate. One result of a clustering analysis can be the creation of personas: fictional characters created to represent the different user types within a targeted demographic, attitude and/or behaviour set that might use a site, brand or product in a similar way. The programming language R has a large variety of functions for cluster analysis and is therefore well suited to this task.
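To make the grouping of similar customers concrete, here is a minimal k-means sketch. It is written in Python rather than R purely for illustration. The function `k_means`, the customer figures and the choice of starting centroids are all assumptions for this example; k-means is one clustering method among many.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def k_means(points, centroids, max_iterations=10):
    """Group `points` around `centroids`, refining the centroids
    until they stop moving or `max_iterations` is reached."""
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == list(centroids):
            break  # converged
        centroids = updated
    return centroids, clusters


# Hypothetical customers: (orders per month, average basket value).
customers = [(1, 20), (2, 25), (1, 22), (9, 80), (10, 85), (8, 78)]
centroids, clusters = k_means(customers, centroids=[(1, 20), (9, 80)])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Each resulting cluster could then serve as the starting point for a persona, for example "occasional small-basket shopper" versus "frequent large-basket shopper".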
Classification analysis
Classification analysis is a systematic process for obtaining important and relevant information about data and metadata (data about data). It helps identify which of a set of categories different types of data belong to. Classification analysis is closely linked to cluster analysis, as the classification can be used to cluster data.
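Assigning a new item to one of a set of known categories can be sketched with a k-nearest-neighbours vote: the new item receives the label that is most common among the labelled examples closest to it. The function `knn_classify`, the labels and the training data are hypothetical, and k-nearest-neighbours is used here only as one simple example of a classifier.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_classify(training, point, k=3):
    """Return the majority label among the `k` training examples
    closest to `point`. `training` is a list of (features, label)."""
    neighbours = sorted(training, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled customers: (orders per month, average basket value).
training = [
    ((1, 20), "occasional"),
    ((2, 25), "occasional"),
    ((1, 22), "occasional"),
    ((9, 80), "frequent"),
    ((10, 85), "frequent"),
    ((8, 78), "frequent"),
]
print(knn_classify(training, (7, 70)))
print(knn_classify(training, (2, 21)))
```

Unlike clustering, the categories here are known in advance; the labelled examples could themselves come from an earlier clustering step.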
Regression analysis
Regression analysis tries to define the dependency between variables. It assumes a one-way effect from one variable (the independent variable) on the response of another (the dependent variable). Independent variables can be affected by each other, but this does not mean that the dependency runs both ways, as is the case with correlation analysis. A regression analysis can show that one variable is dependent on another but not vice versa.
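The simplest case, one independent and one dependent variable, can be sketched as an ordinary least-squares line fit. The function `fit_line` and the advertising and sales figures are assumptions made up for this example; the data was chosen to lie exactly on a line so the result is easy to check by hand.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(ad_spend, sales)
print(f"sales = {slope:.1f} * ad_spend + {intercept:.1f}")
```

The roles of the two variables are fixed when the function is called: the first argument is treated as the explanatory variable and the second as the response, which reflects the one-way dependency described above.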