What is Big Data? The Basics – Meaning and Usage

The term Big Data is increasingly used almost everywhere, online and offline, and it is not limited to computers. It falls under the blanket term Information Technology, which is now part of almost every other technology, field of study, and business. Big Data itself is not a big deal; the hype surrounding it is what makes it confusing. This article looks at what Big Data is. It also includes an example of how Netflix used its data, or rather its Big Data, to better serve its customers’ needs.

What is Big Data

The data lying on your company’s servers was just data until yesterday, sorted and filed. Then the term Big Data became popular, and now the data in your company is Big Data. The term covers every piece of data your organization has stored so far. It includes data stored in the cloud and even the URLs you have bookmarked. Your company might not have digitized all of its data, and it may not have structured all of it either. Even so, all the data your company holds, whether digital or on paper, structured or unstructured, is now Big Data.

In short, all the data present on your servers, categorized or not, is collectively called Big Data. All of this data can be used to get different results through different types of analysis. Not every analysis needs to use all the data. Different analyses use different parts of the Big Data to produce the required results and predictions.

Big Data is essentially the data that you analyze for results that you can use for predictions and other purposes. Once you use the term Big Data, your company or organization is suddenly working with top-level information technology to derive different types of results from the same data that it stored, intentionally or unintentionally, over the years.

How big is Big Data

Essentially, all the data combined is Big Data, but many researchers agree that Big Data, as such, cannot be handled with normal spreadsheets and regular database management tools. It needs special analysis tools like Hadoop (we’ll study this in a separate post) so that all the data can be analyzed in one go (which may include several iterations of analysis).

Contrary to the above, though I am not an expert on the subject, I would say that the data held by any organization, big or small, organized or unorganized, is Big Data for that organization, and that the organization may choose its own tools to analyze it.

Normally, to analyze data, people used to create different data sets based on one or more common fields so that the analysis became easier. With Big Data, there is no need to create subsets before analyzing it. We now have tools that can analyze data irrespective of how huge it is. Probably, these tools themselves categorize the data even as they analyze it.
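To make the idea of categorizing data while analyzing it a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how Hadoop or any other Big Data tool actually works: the records, the field names (region, amount) and the helper totals_by_field are all made up for this example. It groups the records by a common field and adds up a value in a single pass, without first splitting the data into separate subsets.

```python
from collections import defaultdict

# Hypothetical records; the field names are assumptions for illustration only.
records = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 30},
    {"region": "east", "amount": 50},
]


def totals_by_field(rows, key, value):
    """Group rows by `key` and sum `value` in one pass over the data."""
    totals = defaultdict(int)
    for row in rows:
        # The category is discovered while reading the row,
        # so no subsets have to be prepared in advance.
        totals[row[key]] += row[value]
    return dict(totals)


region_totals = totals_by_field(records, "region", "amount")
print(region_totals)
```

Real Big Data tools spread this kind of work across many machines, but the basic step of sorting each record into a category as it is read is similar in spirit.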

I find it important to mention two sentences from the book “Big Data” by Jimmy Guterman:

“Big Data: when the size and performance requirements for data management become significant design and decision factors for implementing a data management and analysis system.”

-and-

“For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”

So you see that both volume and analysis are important parts of Big Data.

Big Data Concepts

This is another point on which most people don’t agree. Some experts say that the Big Data concepts are the three V’s:

  1. Volume
  2. Velocity
  3. Variety

Some others add a few more V’s to the concept:

  1. Visualization
  2. Veracity (Reliability)
  3. Variability
  4. Value

I will cover concepts of Big Data in a separate article as this post is already getting big. In my opinion, the first three V’s are enough to explain the concept of Big Data.

Big Data Example – How Netflix used it to fix its problems

Around 2008, there was an outage at Netflix that left many customers in the dark. While some could still access the streaming service, most could not. Some customers managed to get their rented DVDs, whereas others did not. A blog post on The Wall Street Journal says Netflix had just started on-demand streaming.

The outage made the management think about possible future problems, and hence it turned to Big Data. Using that data, it analyzed high-traffic areas, susceptible points, network throughput, and so on, and worked on lowering the downtime if a problem arose in the future as it went global. Here is the link to the Wall Street Journal Blog, if you wish to check out the examples of Big Data.
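As a rough idea of what finding susceptible points in such data could look like, here is a small Python sketch. It does not show Netflix’s actual method, which the source does not describe. The events, the region names and the helper error_rates are invented purely for illustration. It works out the share of failed requests per region and picks the region with the highest share.

```python
from collections import Counter

# Made-up events for illustration only; this is not Netflix data.
# Each event is a (region, status) pair.
events = [
    ("us-east", "ok"), ("us-east", "error"), ("us-east", "error"),
    ("us-west", "ok"), ("us-west", "ok"),
    ("eu", "ok"), ("eu", "error"), ("eu", "ok"), ("eu", "ok"),
]


def error_rates(evts):
    """Return the fraction of failed requests for each region."""
    totals = Counter(region for region, _ in evts)
    errors = Counter(region for region, status in evts if status == "error")
    # A Counter returns 0 for regions that never failed.
    return {region: errors[region] / totals[region] for region in totals}


rates = error_rates(events)
# The region with the highest failure rate is a candidate weak point.
worst = max(rates, key=rates.get)
print(worst, rates[worst])
```

With real data, the same kind of count would run over millions of records, which is where the special tools mentioned earlier come in.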

The above summarizes what Big Data is in layman’s language. You can call it a very basic introduction. I plan to write a few more articles on associated topics such as the concepts, analysis, tools, and uses of Big Data, the Big Data 3 V’s, etc. Meanwhile, if you would like to add anything to the above, please comment and share with us.

Arun Kumar is a Microsoft MVP alumnus, obsessed with technology, especially the Internet. He deals with the multimedia content needs of training and corporate houses. Follow him on Twitter @PowercutIN