What is Data Mining?
Data mining is the extraction of useful and relevant information from the very large amount of data available, typically with the aim of using it to increase profit. It has gained importance in the recent past because companies and individuals have realized how much power data holds for analysis. It remains a tedious task: the effort required is considerable, and the amount of data available today is so large that it often exceeds what currently available technology can process. Because specialized software for data mining is still scarce, it is coming to be seen as a new branch of the IT sector.
History of Data Mining
The history of data mining is closely tied to the evolution and advancement of computers and storage devices. Computers made processing possible and steadily faster, while storage made it possible to keep data and access it later, as and when required. Together, processing and storage led to the collection of large amounts of data over time, ultimately resulting in terabytes and petabytes of data that need to be put to use.
Further, with the invention of the internet and the World Wide Web and the expansion of the service sector, the need to collect and store data increased manyfold, and that data now has to be sorted before it can be put to use. The data generated and stored today is much more than can be processed, which makes data mining both essential and difficult.
Fundamentals of Data Mining
Data mining deals with predicting the future. Almost all business owners are interested in knowing the future so that they can make the right choices and increase their profit. Data mining does not predict the future exactly; it produces a forecast. This requires estimates to be made and past trends to be identified, analysed and studied. This is done with different models and simulations, which are not new and have been in use for decades. The methods and technology have evolved over time, and the models used today provide much more accurate information. The task has also become more complex, because the number of parameters that need to be studied keeps increasing.
Data mining helps in identifying and adding the parameters that need to be studied to reach a better decision. The result is more reliable because a number of studies are conducted and alternatives are considered before the final action is taken.
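The idea of forecasting from past trends can be illustrated with a very small sketch. It fits a straight line to past values by ordinary least squares and extends it forward. This is only one of many possible models, and the names fit_linear_trend and forecast_next, as well as the sales figures, are hypothetical and chosen purely for illustration.

```python
def fit_linear_trend(values):
    """Fit a least-squares line through the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next(values, steps=1):
    """Extend the fitted line by `steps` periods beyond the last observation."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical yearly sales figures, invented for this example.
past_sales = [100.0, 110.0, 120.0, 130.0]
print(forecast_next(past_sales, steps=2))
```

A straight line is a deliberate simplification. As more parameters are added, real forecasts would typically rely on richer models, but the workflow of fitting to the past and then extrapolating stays the same.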
Parameters of Data Mining
Linking or Association: This is one of the most essential and fundamental parts of data mining. It involves establishing relationships between different events and finding patterns.
Sequence or path analysis: This helps in understanding the effect of different steps on one another and, ultimately, in finding the interrelationships between events.
Classification: This can be thought of as sorting, but in data mining new categories may be formed along the way, which can make the data easier to understand.
Clustering: This can be thought of as grouping events, steps or data with similar characteristics under the same head. It also makes the data easier to understand.
Forecasting: This is the ultimate need and result of data mining. Once a number of analyses have been performed, the end result is information about the future, which is generally quantitative. The numbers thus derived influence decisions and indicate the likely outcome of the steps taken.
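To make the first of these parameters, linking or association, more concrete, the sketch below computes two commonly used measures for a rule of the form "customers who buy X also buy Y": support (how often the items occur together) and confidence (how often Y appears when X does). The function names and the shopping baskets are assumptions made up for this example, not part of any particular tool.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated chance of seeing `consequent` given that `antecedent` is present."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical shopping baskets, invented for this example.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(baskets, {"bread", "butter"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"butter"}))  # share of bread baskets that also have butter
```

Rules with high support and high confidence are the patterns an analyst would usually examine first.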
Importance/Need of Data Mining
Data has the power to provide the user with information if it is analysed properly. Information can be considered power in today’s digital world, where everything is being automated; this is possible only because digital data can be processed by machines. Companies want to maximize their profits, for which they need to conduct market research and surveys and then analyse them to find their strengths and weaknesses. Data mining enables them to understand their growth and trends, which helps them understand their customers and improve their services to meet users’ needs.
Data mining helps not only the commercial sector but all sectors, and it plays a vital role in urban planning. Governments use data to prepare budget reports and allocate money by analysing past trends and identifying priority areas. It also helps in understanding climate change and in reducing damage when disasters occur.
Examples of Applications of Data Mining
Example 1. Suppose an international car manufacturer wants to expand its sales.
In this case the company needs information about the fastest-growing car markets where it can sell its cars. This requires data about past sales and about the companies already present in each market, with which the newcomer will have to compete. Data mining is needed to reach a conclusion after a number of analyses covering, for example, annual sales data, growth rate, preferred segment, people’s preferences for cars and affordability.
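One of the analyses mentioned here, the growth rate, can be sketched as follows. The code ranks markets by compound annual growth rate (CAGR) computed from yearly sales. The market names and sales numbers are invented for illustration, and a real study would combine this with the other factors listed above, such as competition, segment and affordability.

```python
def cagr(first, last, years):
    """Compound annual growth rate between a first and a last yearly value."""
    if first <= 0 or years <= 0:
        raise ValueError("first value and number of years must be positive")
    return (last / first) ** (1 / years) - 1


def rank_markets(sales_by_market):
    """Return (market, growth) pairs sorted from fastest to slowest growing."""
    growth = {
        market: cagr(sales[0], sales[-1], len(sales) - 1)
        for market, sales in sales_by_market.items()
    }
    return sorted(growth.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical yearly unit sales per market, invented for this example.
sales = {
    "Market A": [100, 110, 121],
    "Market B": [200, 210, 220.5],
    "Market C": [50, 60, 72],
}

for market, rate in rank_markets(sales):
    print(f"{market}: {rate:.1%} per year")
```

Note that the smallest market in this made-up data grows the fastest, which is why growth rate alone would not settle the decision.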
Example 2. Insurance company. An insurance company wants to start a policy that pays out to those affected by a disaster in a particular area.
In this example the company needs information about the disasters that have occurred in the past and those that might occur in the future. All possible disasters have to be identified and a probability assigned to each, on the basis of which the final policy is framed. Since disasters cannot be known in advance, a number of studies need to be carried out and historical data needs to be collected and analysed. The data will come from a number of sources and will then be collected, stored, sorted and analysed.
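The step of assigning a probability to each disaster can be sketched in a simplified form. Here the yearly probability is estimated as the share of observed years in which the disaster occurred, and the expected yearly payout is the sum of probability times payout over all disasters. All figures and names are hypothetical, and actual insurance pricing involves far more than this calculation.

```python
def annual_probability(years_with_event, years_observed):
    """Estimate yearly probability as the share of observed years with the event."""
    if years_observed <= 0 or not 0 <= years_with_event <= years_observed:
        raise ValueError("invalid observation counts")
    return years_with_event / years_observed


def expected_annual_payout(scenarios):
    """Sum of probability * payout over all disaster scenarios."""
    return sum(prob * payout for prob, payout in scenarios.values())


# Hypothetical historical counts and payouts, invented for this example.
scenarios = {
    "flood": (annual_probability(5, 100), 1_000_000),
    "earthquake": (annual_probability(1, 100), 5_000_000),
    "cyclone": (annual_probability(2, 100), 2_000_000),
}

print(expected_annual_payout(scenarios))
```

The expected payout gives a rough lower bound on what the premiums collected for the policy would need to cover in an average year.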