  • Minseo Kim

How Big Data Knows Us Better Than We Do: Data Mining and Its Workings

By Tanvi Singh


Have you ever wondered how ads that seem tailored specifically to you show up on your explore page? It's as if Instagram or Facebook knew exactly what you were thinking. It isn't mind-reading or some strange form of espionage; it's data mining. In the era of digitisation, few things are not left up to a computer, and believe it or not, much about you is digitised too, including your decision making and consumer habits. Data mining is what puts all of it to use.


 

Data Mining


What is data mining? Data mining is the analysis of large sets of raw data to recognise patterns and similar structures in the data and to predict future outcomes. It relies mainly on statistics and machine learning to detect relationships and trends. Applied to users, it is the science of learning each user’s habits so that services can be optimised for them.

The history of data mining goes back to before the first programmable computers were introduced: Bayes’ theorem was already being used to calculate the probability of an occurrence based on prior knowledge and information. This goes to show that statistics plays a big role in modern data mining, as most models are based on statistical analysis.
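
To make the idea of updating a probability with prior knowledge concrete, here is a minimal sketch of Bayes' theorem in Python. The function name `posterior` and the fraud-alert numbers are hypothetical and chosen only for illustration; they do not come from any real data set.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' theorem.

    prior               -- P(H), belief in the hypothesis before seeing evidence
    likelihood          -- P(E | H), chance of the evidence if H is true
    false_positive_rate -- P(E | not H), chance of the evidence if H is false
    """
    # Total probability of observing the evidence at all: P(E)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of transactions are fraudulent, an alert fires on
# 90% of fraudulent transactions and on 5% of legitimate ones.
p_fraud_given_alert = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(fraud | alert) = {p_fraud_given_alert:.3f}")
```

Under these assumed numbers, most alerts are still false alarms, because fraud is rare to begin with. That is exactly the kind of prior knowledge the theorem accounts for.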


There is a plethora of accepted applications of data mining, mainly in business, for marketing or actuarial purposes. The targeted ads mentioned earlier use clustering and association analysis to ensure that they are personalised. These techniques often use demographic data as a base, since users are easiest to classify by age, gender, and location. Data mining is also used by credit card companies and tax services, which look for anomalies in spending to detect fraud or tax evasion.
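
Association analysis, mentioned above, can be sketched in a few lines. The example below computes the two standard measures of an association rule, support and confidence, over a handful of made-up browsing sessions. The helper names and the data are assumptions for illustration, not how any particular ad platform works.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from co-occurrence counts.

    Assumes the antecedent appears in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical browsing sessions: the topics each user interacted with.
sessions = [
    {"running shoes", "fitness tracker"},
    {"running shoes", "fitness tracker", "protein bars"},
    {"running shoes", "headphones"},
    {"headphones", "phone case"},
    {"fitness tracker", "protein bars"},
]

rule_support = support(sessions, {"running shoes", "fitness tracker"})
rule_confidence = confidence(sessions, {"running shoes"}, {"fitness tracker"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule with high confidence, such as "people who look at running shoes also look at fitness trackers", is the kind of pattern that could justify showing a fitness tracker ad to someone browsing running shoes.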

Logistic regression, which estimates the probability of a yes-or-no outcome such as "fraud" or "not fraud", is used for services like those, since it uses an organisation's own internal data to predict outcomes rather than relying on general information. Another important use of data mining is in healthcare, for diagnosis and the early detection of illness. This is mainly known as expert system diagnosis, and it uses many components of data mining. The K-Nearest Neighbour method is used for tasks like this, as it categorises new cases based on the most similar older ones. This method is also used in recommender systems, such as those found in Netflix or Amazon Prime Video, to suggest similar content to users.
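
The K-Nearest Neighbour idea of categorising a new case by looking at similar older ones can be sketched as follows. This is a minimal illustration using Euclidean distance and a majority vote. The `knn_predict` function, the two viewing-hours features, and the labels are all hypothetical, and real recommender systems are considerably more elaborate.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    training -- list of (feature_vector, label) pairs
    query    -- feature vector to classify
    """
    # Sort past cases by Euclidean distance to the new case.
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k closest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical viewers described by (hours of drama, hours of comedy) per week.
history = [
    ((9.0, 1.0), "drama fan"),
    ((8.0, 2.0), "drama fan"),
    ((7.5, 0.5), "drama fan"),
    ((1.0, 8.0), "comedy fan"),
    ((2.0, 9.0), "comedy fan"),
    ((0.5, 7.0), "comedy fan"),
]

print(knn_predict(history, (8.5, 1.5)))
```

An odd value of `k` is used here so that a vote between two classes cannot end in a tie.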


 

Process


The general data mining process comprises six main steps:

  • Business Understanding

  • Data Understanding

  • Data Preparation

  • Data Modelling

  • Evaluation

  • Deployment

Together, these steps produce the data models and algorithms that personalise services.


The first step involves setting goals for the specific project and understanding how data mining will help achieve them. Things like target audience, costs, and current trends have to be factored in at this point.

The next step requires data to be collected from various places in the organisation; it can come in the form of databases or even data cubes. Data visualisation tools and metadata are used to correctly classify and organise the data in preparation for the next step.

The third step takes up the majority of the time spent mining data. The data is filtered to leave only what is relevant, and gaps are filled so that the resulting data set is minable. Since this step is very time-consuming, distributed systems are used so that no single system is overworked and the data is processed efficiently.
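
As a rough sketch of what filtering and gap-filling can look like, the example below keeps only the fields of interest and replaces missing numeric values with the column mean. Mean imputation is just one common choice, assumed here for simplicity. The `prepare` function and the sample records are hypothetical, and the distributed processing mentioned above is not shown.

```python
def prepare(records, keep_fields):
    """Keep only `keep_fields` and fill missing numeric values with the column mean."""
    # Filter: drop every field that is not needed for mining.
    rows = [{f: r.get(f) for f in keep_fields} for r in records]

    for field in keep_fields:
        known = [row[field] for row in rows if row[field] is not None]
        if not known:
            continue  # nothing to estimate a fill value from
        mean = sum(known) / len(known)
        # Fill gaps: replace missing entries with the mean of the known ones.
        for row in rows:
            if row[field] is None:
                row[field] = mean
    return rows


# Hypothetical raw records with an irrelevant field and some missing values.
raw = [
    {"age": 34, "monthly_spend": 120.0, "session_id": "a1"},
    {"age": None, "monthly_spend": 80.0, "session_id": "b2"},
    {"age": 26, "monthly_spend": None, "session_id": "c3"},
]

clean = prepare(raw, keep_fields=["age", "monthly_spend"])
for row in clean:
    print(row)
```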

Next, mathematical models are used to detect trends and patterns in the data, which are then tested by the relevant stakeholders.
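
One of the simplest mathematical models for detecting a trend is a straight line fitted by ordinary least squares. The sketch below is only an illustration of the modelling step. The `fit_line` helper and the monthly sales figures are made up, and a real project would choose a model suited to its data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales figures (month number, units sold).
months = [1, 2, 3, 4, 5, 6]
units = [100, 108, 123, 130, 141, 152]

slope, intercept = fit_line(months, units)
forecast = slope * 7 + intercept
print(f"trend: {slope:.2f} units/month, month 7 forecast: {forecast:.1f}")
```

A positive slope indicates an upward trend, and the fitted line can be extended to give a rough forecast for stakeholders to check against what they know about the business.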

The fifth step ensures that the patterns found are in line with the business objectives; sometimes new objectives are raised based on the new information.

In the final step, the findings are shared and implemented in the business, where they should eventually help it meet its goals. This description of the process has been kept simple, and the process will likely change in future to accommodate emerging discoveries in the field.



The future of data mining will depend on Ubiquitous Knowledge Discovery, which combines data mining on mobile systems with data mining on distributed systems. As technology becomes more portable and less bulky, mobile systems will become more popular. This approach is also designed for real-time mining and processing while treating privacy and security as of the utmost importance. Privacy and security are the main concern with data mining, since major companies have taken advantage of it.


 

An example of a privacy and security breach involving data mining is the Cambridge Analytica and Facebook scandal, in which the personal data of millions was harvested for use in political advertising. The leak is argued to have swung elections in the United States. Users were asked to complete a quiz, their answers were mined to build profiles of them, and those profiles were in turn used to sway political opinions through Facebook advertising. The incident caused a major controversy over data mining and whether it is ethical.


Despite incidents like this, data mining has a myriad of benefits. Since people have grown used to immediate gratification in the internet era, data mining helps consumers get it in one click, as things have become extremely personalised. It also helps users filter out many unnecessary products or services, as ads are targeted to their interests. As for businesses, data mining allows them to gain a competitive advantage over their rivals, since decision making about new products and services becomes easier and more effective. As mentioned before, data mining is essential in modern patient diagnosis, and an expert system is time-efficient compared with a doctor working alone. The most efficient diagnosis will be made by a doctor with the aid of an expert system that uses data mining.



 

Data mining is extremely helpful as digitisation rises, and when used ethically, with the best interests of both consumers and operators in mind, it can and will change the way we do things entirely. So, the next time you see an oddly specific ad, remember that it has a lot more to do with data mining than with intrusive methods.





 




