Data mining is the process of discovering patterns in data sets using methods from machine learning, statistics, and database systems. It can also be defined as an interdisciplinary subfield of computer science whose main goal is to extract information from a data set using intelligent methods and transform that information into a consolidated structure for later use. Many people treat data mining as a synonym for the Knowledge Discovery in Databases (KDD) process, while others view it as the analysis step of KDD.
Knowledge Discovery in Databases (KDD)
KDD is a multi-step process of finding knowledge in large volumes of data, knowledge that stays hidden when the data are searched with common techniques. Before applying KDD, one needs to be technically capable of generating and storing the data. Raw collected data is simply a collection of elements and provides very little knowledge on its own. The value of this data improves significantly when knowledge discovery techniques are applied.
A variety of methods are available to assist in extracting patterns that, when interpreted, provide valuable, possibly previously unknown, insight into the stored data. This information can be predictive or descriptive in nature. Data mining, the pattern extraction phase of KDD, can take many forms, and the choice depends on the desired results. Taken as a whole, KDD facilitates the conversion of data into useful information.
The KDD process is an iterative sequence of the following steps:
1. Data Cleaning: Noise and inconsistent records are removed from the data.
2. Data Integration: Multiple data sources are combined. In many tech companies, data cleaning and data integration are done as a preprocessing step, after which the resulting data is stored in a data warehouse.
3. Data Selection: Data relevant to the analysis are retrieved from the database or other sources.
4. Data Transformation: Data are transformed and consolidated into formats suitable for the data mining step. In some companies, transformation and consolidation are performed before data selection, for example when a data warehouse is built with Hive or with ETL tools.
5. Data Mining: Various intelligent methods are applied to extract patterns from the data.
6. Pattern Evaluation: Interesting patterns that represent knowledge are identified.
7. Knowledge Presentation: Visualization and knowledge representation techniques are used to present the mined knowledge to users.
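The seven steps above can be sketched as a small pipeline. The following is a minimal illustration in plain Python, not a production design: the two data sources (`store_a`, `store_b`), their records, and every function name are hypothetical, and the "mining" step is reduced to counting item pairs that appear together in a basket.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources (illustrative data only).
store_a = [
    {"id": 1, "items": "Bread, Milk"},
    {"id": 2, "items": "bread, butter, milk"},
    {"id": 3, "items": None},  # noisy record: the item list is missing
]
store_b = [
    {"id": 4, "items": "milk, bread, jam"},
    {"id": 5, "items": "butter, jam"},
]

def clean(records):
    """Step 1: drop records whose item list is missing."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records):
    """Step 3: keep only the field relevant to the analysis."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Step 4: turn each string into a set of lowercase item names."""
    return [{i.strip().lower() for i in s.split(",")} for s in item_strings]

def mine(baskets):
    """Step 5: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, min_count=2):
    """Step 6: keep only pairs seen in at least min_count baskets."""
    return {p: c for p, c in pair_counts.items() if c >= min_count}

def present(patterns):
    """Step 7: render the surviving patterns as readable lines."""
    return [f"{a} + {b}: {c} baskets" for (a, b), c in sorted(patterns.items())]

def run_kdd():
    records = integrate(clean(store_a), clean(store_b))
    baskets = transform(select(records))
    return present(evaluate(mine(baskets)))

if __name__ == "__main__":
    for line in run_kdd():
        print(line)
```

In practice the process is iterative: a weak result at the evaluation step usually sends the analyst back to selection or transformation with different choices.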
Applications of Data Mining
Data mining has a wide and diverse range of uses across different areas:
- Fraud detection in the finance and banking sector (credit cards)
- Financial forecasting
- Analysis of geospatial and satellite imagery
- Addressable, data-driven, and targeted marketing
- Weather forecasting
- Prediction of television audience viewership
- Gene sequencing
Functionalities of Data Mining
Data mining functionalities specify the kinds of patterns or knowledge that data mining tasks can find.
Some of these functionalities are listed below.
- Characterization and discrimination
- Mining of frequent patterns
- Outlier detection
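As a rough illustration of two of these functionalities, the sketch below summarises one class of data (characterization), compares it with a contrasting class (discrimination), and flags unusual values (outlier detection). It uses only the Python standard library. The segment names, the amounts, and the z-score threshold of 2.0 are assumptions made for the example, and a z-score rule is only one of many ways to define an outlier.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts per customer segment (illustrative only).
amounts = {
    "students": [12.0, 15.0, 11.0, 14.0, 13.0],
    "professionals": [40.0, 42.0, 38.0, 41.0, 39.0, 200.0],
}

def characterize(values):
    """Characterization: summarise the general features of one class."""
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }

def discriminate(target, contrast):
    """Discrimination: compare a target class with a contrasting class.

    Here the comparison is simply the difference between the class means.
    """
    return mean(target) - mean(contrast)

def outliers(values, threshold=2.0):
    """Outlier detection: flag values whose z-score exceeds the threshold.

    The z-score uses the population standard deviation of the same values.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

if __name__ == "__main__":
    print(characterize(amounts["students"]))
    print(discriminate(amounts["students"], amounts["professionals"]))
    print(outliers(amounts["professionals"]))
```

On small samples a single extreme value inflates the standard deviation it is measured against, so robust alternatives such as median-based rules are often preferred in practice.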
Data Types used in Data Mining
Data mining can be applied to a variety of data, depending on the needs of the target application. The data types used in data mining can be categorized as structured (traditional) or unstructured.
Structured data includes data from databases, data warehouses, and transactional data.
Unstructured data can include the following data types:
- Time Series Data
- Sequence/Binary Data
- Data Streams
- Spatial, spatiotemporal, and geospatial data sets
- Text and Media data sets
- Graph data
- Data from Networks
- Web Data (Clickstream Logs)