Over the past few years, big businesses have been gathering enormous amounts of data. Where do they get all this information? How do they store it? What do they do with it? How safe is it? Data repositories hold billions upon billions of bits of information. Is there even a word for such a high number? Apparently there is. The words ‘exabytes’ and ‘zettabytes’ get bandied around when people start discussing the amount of data being held. Can you imagine how much information is held by Google? Much of what you do online is remembered. Next time you surf the internet, little bits of info they have about you keep popping up. Scary!

In laboratory experiments and scientific research, it’s understandable that data quality is a primary concern. Data has to be analysed before a conclusion can be drawn. But with so much information, how do they validate the data? Data analysis also depends on the hardware platform and the software stack, both of which can fundamentally affect the analytics. Some companies are developing state-of-the-art hardware and software to address the problem of big data analytics.

Apart from your activity on a computer, there are innumerable ways that data is gathered. The old method of surveys still exists, and it’s quite laughable when a person says they don’t want to fill in a survey for privacy reasons, then spends a few hours a day shopping online, playing games, making and receiving calls on a smartphone and using a credit card. Even their employer keeps personal records in some form of data storage. Not so long ago, a group of Russian hackers showed just how easy it is to access the financial information of more than half a billion people held by banking institutions.

Some say there is a ‘revolution’ happening with data. They believe that they can do something with it. Improved statistical methods and computer systems make it possible to apply algorithms to the vast amount of data available. You can gain insights into the way people live and behave by linking sets of data from different sources. Data mining and the use of multivariate data help scientists and researchers in many different fields, such as medicine and astronomy. Governments collect data from many sources, including their five-yearly censuses, then disseminate it and use it to plan roads and transport and to provide services to the population.
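
To make the idea of linking data sets a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the field names and the `link_records` helper are assumptions, not anything a real company is known to use. It simply shows how two unrelated sources that share a customer ID can be joined into one richer profile.

```python
# Hypothetical records from two unrelated sources, keyed by a shared customer ID.
purchases = [
    {"customer_id": 1, "item": "running shoes"},
    {"customer_id": 2, "item": "baby lotion"},
    {"customer_id": 1, "item": "protein bars"},
]
survey = [
    {"customer_id": 1, "age_group": "25-34"},
    {"customer_id": 2, "age_group": "35-44"},
]


def link_records(purchases, survey):
    """Attach each customer's survey answers to their purchase records."""
    # Index the survey by customer ID so each lookup is a single dictionary access.
    survey_by_id = {row["customer_id"]: row for row in survey}
    linked = []
    for purchase in purchases:
        profile = survey_by_id.get(purchase["customer_id"])
        if profile is None:
            continue  # no matching survey record, so nothing to link
        linked.append({**purchase, "age_group": profile["age_group"]})
    return linked


linked = link_records(purchases, survey)
for row in linked:
    print(row)
```

Neither list says much on its own, but once they are joined each purchase also carries an age group. Real data linkage is far messier, since sources rarely share a clean ID, but the principle is the same.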

Target Stores used an algorithm to detect when women fell pregnant by tracking their purchases. They then offered special discount vouchers to those customers. Ingenious or downright Orwellian? There is no doubt that Amazon uses the shopping data from its millions of customers to fill each customer’s screen with specials on products it knows will appeal to that buyer. The collection and analysis of data will surely only get bigger. The question is not how much, but how safe is it?