Big data and privacy


On the internet, everyone discloses fragments of information, either actively or passively. These fragments can be mined with big data techniques, which creates a risk of privacy breaches and other information security problems. As the 5G era surges ahead, people are increasingly unsure how to protect their privacy, and some feel a little lost. So how does big data learn your private information? And how can we protect ourselves?


1. Big data knows everything



In the era of big data, anyone can end up like the emperor in Hans Christian Andersen's fairy tale "The Emperor's New Clothes": exposed in front of everyone without realizing it. Big data knows what you've said, what you've done, what your hobbies are, what illnesses you've had, where you live, and who your friends and family are. In short, it knows almost everything you know, or at least it can know, and eventually it will.


Big data can even know things that you don't know about yourself. For example, it can uncover many of your subconscious habits: where you tend to stand in a group photo, which foot you lead with when crossing a threshold, what kind of people you like to deal with, what your personality traits are, and which of your friends hold views different from yours.


Furthermore, big data may be able to predict future events. For example, from information such as "eating more and exercising less", it can infer that you may develop the "three highs" (high blood pressure, high blood sugar, and high blood lipids). When many people independently buy cold medicine at the same time, big data can tell that a flu outbreak may be imminent. Big data has also been used to predict World Cup results, stock market fluctuations, price trends, user behavior, traffic conditions, and more.
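
The cold medicine example can be illustrated with a very small sketch. It assumes a hypothetical list of daily sales counts and flags any day whose sales rise far above the average of the preceding week. The function name `flag_spikes`, the seven-day window, and the threshold of three standard deviations are illustrative choices, not a description of how any real system works.

```python
from statistics import mean, stdev

def flag_spikes(daily_sales, baseline_days=7, threshold=3.0):
    """Return indices of days whose sales far exceed the recent baseline."""
    flagged = []
    for day in range(baseline_days, len(daily_sales)):
        # Baseline: the sales of the preceding `baseline_days` days.
        window = daily_sales[day - baseline_days:day]
        mu, sigma = mean(window), stdev(window)
        # Floor sigma at 1.0 so a perfectly flat baseline does not
        # turn every tiny increase into an alarm.
        if daily_sales[day] > mu + threshold * max(sigma, 1.0):
            flagged.append(day)
    return flagged

# Hypothetical daily cold medicine sales for one pharmacy.
sales = [20, 22, 19, 21, 20, 23, 21, 22, 20, 48]
print(flag_spikes(sales))  # [9]
```

Only the last day stands out from its baseline. A real forecast would need many pharmacies, seasonal adjustment, and far more data, but the principle is the same: many small, independent purchases add up to a visible signal.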


Of course, "you" here does not refer only to you as an individual. It also includes, but is not limited to, your family, your company, your ethnic group, and even your country. As for what this information, whether already known, still unknown, or knowable only in the future, will make of you - a hero or a scoundrel - that is hard to predict.


2. Data mining is like "garbage handling"



What is big data? Figuratively speaking, big data is a vast quantity of data piled up in no particular order. The things you say online, the WeChat messages you send and receive, and the emails you exchange are all part of big data. So is information collected passively without your knowledge, such as video captured by street cameras, the routes recorded by your phone's positioning system, and navigation signals from your car. The many kinds of sensors that automatically record temperature, humidity, speed, and other measurements contribute as well. In short, every person and every communication or control device, whether software or hardware, is a source of big data.


Extracting value from big data relies on a technology called "big data mining," which draws on methods such as neural networks, genetic algorithms, decision trees, rough sets, covering positive examples while excluding negative ones, statistical analysis, and fuzzy sets. The mining process can be divided into eight steps: data collection, data integration, data reduction, data cleaning, data transformation, mining analysis, pattern evaluation, and knowledge representation.
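
The eight steps can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the purchase records are invented, each function is a deliberately crude stand-in for its step (for instance, "data reduction" is reduced to dropping duplicate rows), and the mining step simply counts which items are bought together.

```python
from collections import Counter
from itertools import combinations

def collect():
    # Step 1, data collection: two hypothetical sources of (user, item) records.
    shop_a = [("u1", "cold medicine"), ("u1", "tissues"), ("u1", "tissues"),
              ("u2", "cold medicine"), ("u2", "tissues")]
    shop_b = [("u3", "Cold Medicine "), ("u3", "tissues"),
              ("u4", None), ("u4", "bread")]
    return shop_a, shop_b

def integrate(*sources):
    # Step 2, data integration: merge all sources into one list.
    return [row for source in sources for row in source]

def reduce_data(rows):
    # Step 3, data reduction: drop exact duplicate rows, keeping order.
    return list(dict.fromkeys(rows))

def clean(rows):
    # Step 4, data cleaning: discard missing items and normalize the text.
    return [(user, item.strip().lower()) for user, item in rows if item]

def transform(rows):
    # Step 5, data transformation: group items into one basket per user.
    baskets = {}
    for user, item in rows:
        baskets.setdefault(user, set()).add(item)
    return baskets

def mine(baskets):
    # Step 6, mining analysis: count how often each pair of items co-occurs.
    pairs = Counter()
    for items in baskets.values():
        pairs.update(combinations(sorted(items), 2))
    return pairs

def evaluate(pairs, n_baskets, min_support=0.5):
    # Step 7, pattern evaluation: keep pairs bought by enough of the users.
    return {pair: count / n_baskets for pair, count in pairs.items()
            if count / n_baskets >= min_support}

def represent(patterns):
    # Step 8, knowledge representation: render patterns as readable sentences.
    return [f"{a} and {b} are bought together by {support:.0%} of users"
            for (a, b), support in sorted(patterns.items())]

def run_pipeline():
    rows = clean(reduce_data(integrate(*collect())))
    baskets = transform(rows)
    return represent(evaluate(mine(baskets), len(baskets)))

print(run_pipeline())
```

On this invented data the pipeline reports that cold medicine and tissues are bought together by 75% of users. Real systems replace each stand-in with far heavier machinery, but the order of the steps is the one listed above.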


However fancy it sounds, big data mining is really a kind of "garbage handling". Most of the data is useless, and only a small part of it yields meaningful information, so the work is like looking for a needle in a haystack.




3. Big data mining has no end



Big data mining creates value, but it also has negative effects, such as the risk of privacy breaches. How is privacy breached? Let's start by breaking down how a "human flesh search" (a crowdsourced online hunt for information about a person) infringes on privacy.


A large group of netizens, each with their own motives, use every resource available to collect as much information as possible about the subject. They then refine that information to suit their purposes and feed it back to the internet to share with others. This completes the first "human flesh iteration."


Building on the first iteration, the participants learn from one another and continue to collect, process, and organize information, cross-checking it repeatedly. This is the second "human flesh iteration." The loop continues, and after many relentless iterations a vivid portrait of the subject emerges. If the material behind this "satisfactory portrait" has been verified, and at least its main substance is factual, then the "human flesh search" has succeeded.


One can almost conclude that as long as enough netizens take part in a "human flesh search," enough time passes, and the participants are persistent enough, anyone can be exposed.


In a sense, big data mining is just a special kind of "human flesh search" carried out automatically by machines. Its purpose is no longer limited to defaming or praising someone. It serves broader goals, such as finding the best buyers for a seller's products, finding patterns in certain types of data, and finding associations between things. In short, wherever the purpose is clear, big data mining has a use.


In this comparison, the netizens are replaced by computers; the information they collect is replaced by massive heterogeneous data in databases; the techniques they use to find connections between people are replaced by corresponding intelligent algorithms; and their practice of learning from one another is replaced by various synchronization operations.


The iterations still proceed as before, but the machine runs more of them, runs them faster, and each one is in effect a "learning" step for the machine. The netizens' final "satisfactory portrait" is replaced by temporary mining results. They are called temporary because big data mining has no end: the results keep becoming more accurate and the level of intelligence keeps rising, and users simply pick, at any point, the results that meet their own standards.
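
The iteration described above can be sketched in a few lines. This is only an illustration under stated assumptions: the records are invented, each record is a hypothetical set of "kind:value" fragments from one source, and the function `build_profile` is a made-up name. In each round, the machine merges in every record that shares at least one fragment with what it already knows, and it keeps every intermediate "temporary result".

```python
def build_profile(seed, records, max_rounds=10):
    """Grow a profile by repeatedly merging records that overlap with it."""
    profile = set(seed)
    history = [set(profile)]  # temporary results, one snapshot per round
    for _ in range(max_rounds):
        grown = set(profile)
        for record in records:
            # A record is linked if it shares any fragment with what was
            # known at the start of this round.
            if profile & record:
                grown |= record
        if grown == profile:
            break  # nothing new was learned in this round
        profile = grown
        history.append(set(profile))
    return profile, history

# Hypothetical fragments scattered across four unrelated sources.
records = [
    {"nick:skyfox", "email:fox@example.com"},
    {"email:fox@example.com", "phone:555-0100"},
    {"phone:555-0100", "city:Springfield"},
    {"nick:other", "city:Shelbyville"},
]

profile, history = build_profile({"nick:skyfox"}, records)
for round_number, snapshot in enumerate(history):
    print(round_number, sorted(snapshot))
```

Starting from a single nickname, the profile gains one fragment per round until it holds the email address, phone number, and city, while the unrelated fourth record is never pulled in. No single record links the nickname to the city; the link only appears through iteration.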


Of course, alongside these similarities, the "human flesh search" and big data mining also differ in significant ways. For example, machines do not get tired, they collect more data and collect it faster, and they draw on a wider range of data sources. In short, the netizens' "human flesh search" will ultimately lose to the machine's big data mining.


4. In privacy protection and data mining, "crisis" and "opportunity" coexist



It must be acknowledged that, at present, the "lethality" of privacy mining with big data far exceeds the capability of privacy protection. In other words, people today are somewhat overwhelmed by big data mining. This situation was not anticipated. Since the birth of the Internet, people have freely left fragments of information online, where they persist indefinitely. Because each fragment seems harmless on its own, hardly anyone realized, or deliberately paid attention to, the privacy risk the fragments pose once they are combined.