DATA MINING

Data mining, also known as knowledge discovery in data (KDD), is the process of sorting through large data sets to uncover patterns and relationships that can help solve business problems through data analysis. The concept predates the digital age: the idea of applying data to knowledge discovery has been around for centuries, starting with manual formulas for statistical modeling and regression analysis. With the evolution of data warehousing technology and the growth of big data, adoption of data mining techniques has accelerated rapidly over the last couple of decades, helping companies transform their raw data into useful knowledge. Data has become a part of every facet of business and life.

Companies today can harness data mining applications and machine learning for everything from improving their sales processes to interpreting financials for investment purposes. However, even though technology continuously evolves to handle data at large scale, leaders still face challenges with scalability and automation. Data mining techniques and tools enable enterprises to predict future trends and make better-informed business decisions. Data mining is used in many areas of business and research, including sales and marketing, product development, healthcare, and education. When used correctly, data mining can give you an advantage over competitors by making it possible to learn more about customers, develop effective marketing strategies, increase revenue, and decrease costs.


 

Data mining process

Data mining is typically done by data scientists and other skilled BI and analytics professionals. But it can also be performed by data-savvy business analysts, executives and workers who function as citizen data scientists in an organization. Whoever does the work, it should start from a clearly defined business problem: without a clear focus on a meaningful business outcome, you could find yourself poring over the same set of data over and over without turning up any useful information at all. Once you have clarity on the problem you are trying to solve, it’s time to collect the right data to answer it, usually by ingesting data from multiple sources into a central data lake or data warehouse, and to prepare that data for analysis.

                    The data mining process can be broken down into these four primary stages:

 Data gathering. Relevant data for an analytics application is identified and assembled. The data may be located in different source systems, a data warehouse or a data lake, an increasingly common repository in big data environments that contains a mix of structured and unstructured data. External data sources may also be used. Wherever the data comes from, a data scientist often moves it to a data lake for the remaining steps in the process.

 Data preparation. Get the data ready for analysis. This includes ensuring that the data is in the appropriate format to answer the business question, and fixing any data quality problems such as missing or duplicate data. Data transformation is also done to make data sets consistent, unless a data scientist is looking to analyze unfiltered raw data for a particular application.

 Mining the data. Once the data is prepared, a data scientist chooses the appropriate data mining technique and then implements one or more algorithms to do the mining. In machine learning applications, the algorithms typically must be trained on sample data sets to look for the information being sought before they're run against the full set of data.

 Evaluation of results and implementation of knowledge. Once the data is aggregated, the results need to be evaluated and interpreted. Final results should be valid, novel, useful, and understandable. When these criteria are met, organizations can use the knowledge to implement new strategies and achieve their intended objectives.
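To make the data preparation stage more concrete, here is a minimal sketch in Python of the two fixes mentioned above: removing duplicate records and filling in missing values. The field names (`customer_id`, `amount`), the sample records, and the choice to fill gaps with the mean are all assumptions made for illustration; a real project would pick a strategy that suits its own data.

```python
from statistics import mean


def prepare(records):
    """Drop exact duplicate rows, then fill missing amounts with the mean.

    The field names are hypothetical and exist only for this example.
    """
    seen = set()
    unique = []
    for row in records:
        key = (row["customer_id"], row["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input is not modified

    # Mean of the values that are present; fall back to 0.0 if none are.
    known = [r["amount"] for r in unique if r["amount"] is not None]
    fill = mean(known) if known else 0.0

    for r in unique:
        if r["amount"] is None:
            r["amount"] = fill
    return unique


raw = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 10.0},  # duplicate record
    {"customer_id": 2, "amount": None},  # missing value
    {"customer_id": 3, "amount": 30.0},
]
clean = prepare(raw)
print(clean)
```

Mean imputation is only one option. Depending on the business question, dropping incomplete rows or filling with a median may be a better fit.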

Data mining techniques

                Data mining works by using various algorithms and techniques to turn large volumes of data into useful information. Here are some of the most common ones:

 Association rule learning: Also known as market basket analysis, association rule learning looks for interesting relationships between variables in a dataset that might not be immediately apparent, such as determining which products are typically purchased together. This can be incredibly valuable for long-term planning.

 Decision tree: This data mining technique uses classification or regression methods to classify or predict potential outcomes based on a set of decisions. As the name suggests, it uses a tree-like visualization to represent the potential outcomes of these decisions.

 Clustering: To help users understand the natural groupings or structure within the data, you can apply the process of partitioning a dataset into a set of meaningful sub-classes called clusters. This process looks at all the objects in the dataset and groups them together based on similarity to each other, rather than on predetermined features.

 K-nearest neighbor (KNN): K-nearest neighbor, also known as the KNN algorithm, is a non-parametric algorithm that classifies data points based on their proximity and association to other available data. This algorithm assumes that similar data points can be found near each other. As a result, it calculates the distance between data points, usually the Euclidean distance, and then assigns a category based on the most frequent category among the k nearest points (or, when predicting a numeric value, on their average).
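As an illustration of the last technique, the following is a small sketch of KNN classification in plain Python. It is not a production implementation: the function name `knn_classify`, the sample points, and the labels are invented for the example, and it assumes Euclidean distance with a simple majority vote.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; points are tuples of numbers.
    """
    # Sort the training points by Euclidean distance to the query, keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbors and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data set with two categories.
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.5), "low"),
    ((6.0, 6.0), "high"),
    ((6.5, 7.0), "high"),
    ((7.0, 6.5), "high"),
]
prediction = knn_classify(train, (1.8, 1.7), k=3)
print(prediction)
```

Sorting every training point for each query keeps the sketch short but scales poorly. Libraries typically use spatial indexes to speed up the neighbor search.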

 

Data mining applications

Education

Educational institutions have started to collect data to understand their student populations and to learn which environments are conducive to success. As courses continue to move to online platforms, institutions can use a variety of dimensions and metrics to observe and evaluate performance, such as keystrokes, student profiles, classes, universities, and time spent.

 Financial services

  Banks and credit card companies use data mining tools to build financial risk models, detect fraudulent transactions and vet loan and credit applications. Data mining also plays a key role in marketing and in identifying potential upselling opportunities with existing customers.

Fraud detection

While frequently occurring patterns in data can provide teams with valuable insight, observing data anomalies is also beneficial, assisting companies in detecting fraud. While this is a well-known use case within banking and other financial institutions, SaaS-based companies have also started to adopt these practices to eliminate fake user accounts from their datasets.
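One simple way to surface such anomalies is to flag values that sit far from the typical value in a data set. The sketch below uses a modified z-score based on the median and the median absolute deviation (MAD). The transaction amounts are made up, and the 0.6745 scaling constant and 3.5 threshold are conventional choices for this score, not values taken from this article.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    med = median(values)
    # Median absolute deviation: the typical distance from the median.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against; nothing is flagged
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
flagged = flag_anomalies(amounts)
print(flagged)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value inflates the latter two and can hide the very outlier being sought. Real fraud detection systems combine many more signals than a single amount.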

 Healthcare

Data mining helps doctors diagnose medical conditions, treat patients and analyze X-rays and other medical imaging results. Medical research also depends heavily on data mining, machine learning and other forms of analytics.
