Big Data Definition
Big Data refers to a huge volume of data that cannot be stored or processed with a traditional computing approach within the given time frame.
What are the characteristics of Big Data?
Here are the three most important characteristics of Big Data:
- Volume: the amount of data that is generated.
- Velocity: the speed at which this data is generated.
- Variety: the different types of data that are generated.
How huge does this data need to be?
1) It is a common misconception that Big Data refers only to data measured in GB, TB, or larger units. A small amount of data can also be referred to as Big Data. It depends on the context in which the term is used.
2) For example, a document of size 100MB cannot be attached to an e-mail if the e-mail service does not support attachments of this size. So this 100MB attachment can be referred to as Big Data in the context of e-mail.
3) Take another example. You have 10TB of image files that need some processing, such as resizing or enhancing the images, within a given time frame. If you use a traditional system to perform this task, you will not be able to finish it in the given time frame, because its computing resources are not sufficient. Here, this 10TB of data can be referred to as Big Data.
4) Social media websites like Facebook, Twitter, and YouTube receive huge volumes of data on a regular basis. As the number of users on these sites keeps growing, it becomes difficult to store and process this data. But this data needs to be processed in a short span of time because it contains valuable information. Again, this is not possible within the given time frame using a traditional computing system. This huge volume of data can also be termed Big Data.
5) Another real-life example is aircraft transmitting data to the air traffic control towers located at airports. This data is used to track and monitor the progress of each flight in real time. As multiple aircraft transmit data simultaneously, a huge volume of data accumulates in a short span of time. This data can’t be managed with a traditional computing system within the given time. Hence, this data can be termed Big Data.
This problem of Big Data management can be solved using Hadoop instead of a traditional computing system.
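The e-mail and image examples above can be made concrete with a little arithmetic. The following is a minimal Python sketch, not a definitive test for Big Data. The 25MB attachment limit, the throughput of 100MB per second, and the 8-hour time frame are assumptions chosen for illustration, and the function names are hypothetical.

```python
MB = 10**6
TB = 10**12


def exceeds_limit(size_bytes: int, limit_bytes: int) -> bool:
    """Return True when the data is larger than what the system accepts."""
    return size_bytes > limit_bytes


def hours_to_process(size_bytes: int, bytes_per_second: float) -> float:
    """Estimate how long one machine needs at a steady throughput."""
    return size_bytes / bytes_per_second / 3600


# E-mail example: a 100MB document against an assumed 25MB attachment limit.
attachment_too_big = exceeds_limit(100 * MB, 25 * MB)

# Image example: 10TB on a single machine at an assumed 100MB per second.
image_job_hours = hours_to_process(10 * TB, 100 * MB)
deadline_hours = 8  # assumed time frame for the task
misses_deadline = image_job_hours > deadline_hours

print(attachment_too_big)         # True
print(round(image_job_hours, 1))  # 27.8
print(misses_deadline)            # True
```

Under these assumed numbers, the single machine needs more than a day for a job that has to finish in 8 hours. In both cases it is the gap between the data and what the system can handle, not the absolute size, that makes the data Big Data in that context.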
Big Data Challenges
There are two main challenges associated with Big Data:
1) Storage and Management
The first challenge of Big Data is the efficient storage and management of this huge volume of data.
2) Extraction in Given Time
The second challenge associated with Big Data is processing this huge volume of data and extracting information from it within the given time frame.
These are the two main challenges of Big Data that led to the development of the Hadoop framework.