In today’s world, everyone uses smartphones, laptops and desktops, and consumes a lot of data. For example, my data consumption per day exceeds 2 GB. With this data, I access my social media accounts, play a lot of multiplayer games, watch movies, TV series, and a lot more. Now, if my consumption as a single person is 2 GB per day more or less (mostly more), then in a month it is 60 GB give or take, and in a year it is roughly 730 GB for a single person who uses 2 GB of data per day. Here, I am talking about my own data consumption; many people use less data on a daily basis. According to an article, in India alone, the average consumption of data per user per month is 11 GB.
If we talk about the number of internet users in India, it is around 560 million.
If we try to calculate the amount of data that Indians alone use, the figure is enormous (a rough calculation is sketched below). Now, what exactly do we want from the providers of streaming services or social media platforms? We want whatever we do to be remembered (stored) so that we can access it again later, like our follower counts, social media posts, etc.
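To get a feel for that scale, here is a rough back-of-the-envelope calculation. This is only a sketch: the user count and per-user average are the figures quoted above, and the conversion assumes decimal units (1 PB = 1,000,000 GB).

```python
# A rough back-of-the-envelope estimate using the figures quoted above.
internet_users_in_india = 560_000_000     # ~560 million internet users
avg_gb_per_user_per_month = 11            # ~11 GB per user per month

total_gb_per_month = internet_users_in_india * avg_gb_per_user_per_month
total_pb_per_month = total_gb_per_month / 1_000_000   # decimal units: 1 PB = 1,000,000 GB

print(f"{total_gb_per_month:,} GB per month")    # 6,160,000,000 GB
print(f"roughly {total_pb_per_month:,.0f} PB per month")   # roughly 6,160 PB (about 6 exabytes)
```

That is on the order of six exabytes of data every month, from a single country.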
The data generated by a single user, assuming that user is me, is actually pretty huge, and my requirement is to use that data in a manner that gets something useful out of it. Our main requirement is to store the data so that analysis can be performed on it. So big data is actually a problem.
Big data in various fields
Big data is not only a problem for social media platforms, search engines, etc. The data generated in other fields, when properly analyzed, yields a significant amount of useful information which can be processed into something useful.
Some of these other fields are-
Healthcare- We buy fitness bands and use mobile apps to keep track of our workouts, our sleep time, how many calories we burn every day, etc. Here we also need to store the data and analyze it. According to an article, Apple, with the help of Stanford researchers, is trying to use the Apple Watch to detect atrial fibrillation (a medical condition), which is responsible for the deaths of approximately 130,000 Americans every year. This is just one example.
Banking- According to an article, the amount of data generated per year in the banking and finance sector is way beyond my imagination. Every credit card transaction, every message, every online transaction we make adds up to about 2.5 quintillion bytes of data every day worldwide (equivalent to roughly 2,500,000 TB), which is a really big amount of data.
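As a quick sanity check on that figure, the conversion works out as follows (assuming decimal units, where 1 TB = 10^12 bytes):

```python
# Quick sanity check on the figure quoted above (decimal units: 1 TB = 10**12 bytes).
bytes_per_day = 2.5 * 10**18              # 2.5 quintillion bytes every day
terabytes_per_day = bytes_per_day / 10**12

print(f"{terabytes_per_day:,.0f} TB per day")   # prints: 2,500,000 TB per day
```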
Big data is a problem, and the main challenges that come with it are volume and velocity. It is really difficult to store such a large amount of data on any single kind of storage device, and doing so can lead to many other problems like corrupted data. Reading and writing really large data to and from a storage device is also really slow (velocity). So what is the solution to this problem?
Solution- (A brief look at Distributed Storage Systems)
The solution generally used to tackle the problems mentioned above, i.e. volume and velocity, is a distributed storage system (there are several other problems that are not covered in this article). A distributed storage system is a concept where a big chunk of data is split into smaller blocks and stored across multiple systems. This process takes place in a parallel manner, i.e. the blocks are written to the different storage nodes in parallel.
This way we have a lot of storage available as a single logical unit, and since the system uses parallelism, the speed of writing and retrieving data from the cluster also increases.
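To make the idea a little more concrete, here is a minimal, purely illustrative Python sketch of splitting a file into fixed-size blocks and writing them to several "nodes" (simulated here as local directories) in parallel. The block size, node names, round-robin placement and input file name are assumptions made only for this example; real systems such as HDFS or Ceph also handle replication, metadata and fault tolerance, none of which is shown here.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024 * 1024                  # 64 MB blocks (an assumption for this sketch)
NODES = ["node1", "node2", "node3"]            # pretend each directory is a separate storage node


def write_block(node: str, block_id: int, data: bytes) -> str:
    """Write a single block to one 'node' (here, just a local directory)."""
    os.makedirs(node, exist_ok=True)
    path = os.path.join(node, f"block_{block_id:05d}.bin")
    with open(path, "wb") as f:
        f.write(data)
    return path


def distribute(filename: str) -> list:
    """Split `filename` into blocks and write them to the nodes in parallel."""
    futures = []
    with open(filename, "rb") as src, ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        block_id = 0
        while True:
            data = src.read(BLOCK_SIZE)
            if not data:
                break
            node = NODES[block_id % len(NODES)]   # simple round-robin placement of blocks
            futures.append(pool.submit(write_block, node, block_id, data))
            block_id += 1
    return [f.result() for f in futures]          # paths of all the blocks that were written


if __name__ == "__main__":
    # "big_input_file.dat" is a hypothetical input file used only for illustration.
    print(distribute("big_input_file.dat"))
```

Because each block write is an independent task, adding more nodes lets more blocks be written at the same time, which is where the gains in both volume and velocity come from.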
There are various tools available in the market to deal with big data. Some of them are Hadoop, Apache Spark, Apache Storm, Ceph, Hydra, Google BigQuery, etc.