What is Data Mining?
Data mining is the extraction of useful and relevant information from the large volumes of data available, often with the aim of increasing profit. It has gained importance in recent years because companies and individuals have recognized how much power data holds for analysis. It remains a tedious task: the volume of data available today is so large that processing it with current technology is difficult. The lack of specialized tools and software for data mining has led to it being seen as a new branch of the IT sector.
History of Data Mining
The history of data mining is tied to the evolution of computers and storage devices. Computers made data processing possible and steadily faster, and storage made it possible to access data later, as and when required. Together, processing and storage led to the accumulation of significant amounts of data over time, ultimately resulting in terabytes and petabytes of data that need to be put to use.
Further, with the invention of the internet and the World Wide Web and the expansion of the service sector, the need to collect and store data increased manyfold, and that data now has to be sorted before it can be put to use. The data generated and stored today is far more than can be processed, presented and analyzed, which makes data mining both essential and difficult.
Fundamentals of Data Mining
Data mining deals with predicting the future. Almost all business owners are interested in knowing the future so that they can make the choices that will increase their profit. Data mining does not predict the future exactly; it produces a forecast. This requires making estimates and identifying, analyzing and studying past trends. This is done with models and simulations, which are not new and have been used for decades. The methods and technology have evolved over time, and the models used today provide much more accurate information. The task has become more complex as the number of parameters that need to be studied keeps increasing.
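The idea of identifying a past trend and projecting it forward can be sketched in Python as follows. This is only a minimal illustration, not a method prescribed here: it assumes the trend is a straight line fitted by ordinary least squares, and the function names and sales figures are hypothetical.

```python
def fit_linear_trend(values):
    """Fit a least-squares line through the points (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Assumes the observations are evenly spaced in time.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, periods_ahead=1):
    """Extend the fitted line `periods_ahead` steps past the last observation."""
    slope, intercept = fit_linear_trend(values)
    future_x = len(values) - 1 + periods_ahead
    return intercept + slope * future_x


# Hypothetical yearly sales figures, invented purely for illustration.
past_sales = [100.0, 110.0, 120.0, 130.0]
print(forecast(past_sales, periods_ahead=1))
```

A straight line is the simplest possible model; real data with seasonality or many interacting parameters would need a richer one.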
Data mining helps identify and add the parameters that need to be studied to reach a better decision. The result is more reliable because a number of studies are conducted and alternatives are considered before the final action is taken.
Parameters of Data Mining
- Linking or Association: This is one of the most essential and fundamental parts of data mining. It involves establishing relationships between different events and finding patterns.
- Sequence or path analysis: This helps in understanding the effect of different steps on one another and, ultimately, in finding the interrelationships between events.
- Classification: This can be thought of as sorting, but in data mining new categories may be formed in the process, which can make the data easier to understand.
- Clustering: This is the grouping of events, steps or data with similar characteristics under the same head. It also makes the data easier to understand.
- Forecasting: This is the ultimate need and result of data mining. Once a number of analyses have been performed, the end result is information about the future, which is generally quantitative. The numbers thus derived influence decisions and indicate the likely outcome of the steps taken.
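As an illustration of the first parameter, association, the sketch below measures how strongly two items are linked in a set of purchase records. Support and confidence are the usual measures in association analysis, but they are not named in the list above, and the function names and basket data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical purchase records, invented purely for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print("support(bread, butter):", support(baskets, {"bread", "butter"}))
print("confidence(bread -> butter):", confidence(baskets, {"bread"}, {"butter"}))
```

A high confidence for "bread -> butter" would suggest that the two purchases tend to occur together, which is the kind of pattern association analysis looks for.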
Importance/Need of Data Mining
Data has the power to provide the user with information if it is analyzed properly. Information is power in today’s digital world, where everything is being automated, which is possible only because digital data can be processed by machines. Companies want to maximize their profits, for which they need to conduct market research and surveys and then analyze them to find their strengths and weaknesses. Data mining enables them to understand their growth and trends, which helps them understand their customers and improve their service to meet users’ needs.
Data mining helps not only the commercial sector but all sectors, and it plays a vital role in urban planning. Governments use data to prepare budget reports and allocate money by analyzing past trends and identifying priority areas. It also helps in understanding climate change and reducing damage in the event of a disaster.
Examples of Applications of Data Mining
Example 1. Suppose an international car manufacturer wants to expand its sales.
In this example, the company needs information about the fastest-growing car markets where it can sell its cars. This requires data on past sales and on the competitors already present in each market. Data mining is needed to reach a conclusion after a number of analyses covering annual sales, growth rate, preferred segment, people’s preferences in cars, affordability, and so on.
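One of these analyses, comparing growth rates across markets, can be sketched in Python as follows. This is a hedged illustration only: it assumes growth is measured as a compound annual growth rate computed from the first and last year of sales, and the market names and figures are hypothetical.

```python
def compound_annual_growth_rate(first, last, years):
    """Average yearly growth rate that takes `first` to `last` over `years` years."""
    if first <= 0 or years <= 0:
        raise ValueError("first value and number of years must be positive")
    return (last / first) ** (1 / years) - 1


def rank_markets_by_growth(sales_by_market):
    """Return (market, growth_rate) pairs, fastest-growing market first.

    `sales_by_market` maps a market name to its annual sales, oldest year first.
    """
    growth = {
        market: compound_annual_growth_rate(sales[0], sales[-1], len(sales) - 1)
        for market, sales in sales_by_market.items()
    }
    return sorted(growth.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical annual car sales per market, invented purely for illustration.
annual_sales = {
    "Market A": [100, 110, 121],
    "Market B": [200, 210, 220.5],
    "Market C": [50, 75, 112.5],
}

for market, rate in rank_markets_by_growth(annual_sales):
    print(f"{market}: {rate:.1%}")
```

Growth rate is only one input; a real decision would weigh it against the other factors listed, such as preferred segment and affordability.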
Example 2. Suppose an insurance company wants to introduce a policy that pays an amount to those affected by a disaster in a particular area.
In this example, the company needs information about the disasters that have occurred in the past and those that might occur in the future. All possible disasters must be identified and a probability assigned to each, and the final policy is framed on that basis. Since future disasters are unknown, a number of studies need to be carried out and historical data collected and analyzed. The data will come from a number of sources and will then be collected, stored, sorted and analyzed.
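The step of assigning a probability to each disaster and using it to frame the policy can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each probability is estimated as the share of observed years in which the disaster occurred, the policy pays a fixed amount per disaster type, and all names, years and amounts are hypothetical.

```python
def annual_probability(event_years, years_observed):
    """Estimate the yearly probability as the share of observed years with at least one event."""
    if years_observed <= 0:
        raise ValueError("years_observed must be positive")
    return len(set(event_years)) / years_observed


def expected_annual_payout(disasters):
    """Sum of probability times payout over all disaster types."""
    return sum(d["probability"] * d["payout"] for d in disasters)


# Hypothetical historical record and payout amounts, invented purely for illustration.
YEARS_OBSERVED = 20
history = {
    "flood": [2001, 2005, 2010, 2015, 2019],
    "earthquake": [2008],
}
payout_per_event = {"flood": 1000.0, "earthquake": 5000.0}

disasters = [
    {
        "name": name,
        "probability": annual_probability(years, YEARS_OBSERVED),
        "payout": payout_per_event[name],
    }
    for name, years in history.items()
]

for d in disasters:
    print(f"{d['name']}: estimated yearly probability {d['probability']:.2f}")
print("expected annual payout:", expected_annual_payout(disasters))
```

The expected payout gives a rough lower bound for what the policy would need to collect each year; historical frequency is a crude estimator, and rare disasters in particular would need more careful study.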