What is Big Data? The Basics – Meaning and Usage

The term Big Data is increasingly used almost everywhere – online and offline – and it is not limited to computers. It falls under the blanket term Information Technology, which is now part of almost all other technologies, fields of study, and businesses. Big Data itself is not a big deal, but the hype surrounding it is certainly big enough to confuse you. This article takes a look at what Big Data is. It also contains an example of how Netflix used its data – or rather, Big Data – to better serve its customers' needs.

What is Big Data

The data lying in your company's servers was just data until yesterday – sorted and filed. Suddenly, the buzzword Big Data became popular, and now the data in your company is Big Data. The term covers every piece of data your organization has stored until now. It includes data stored in the cloud and even the URLs that you bookmarked. Your company might not have digitized all its data, and you may not have structured all of it yet. But all the digital and paper-based, structured and unstructured data with your company is now Big Data.

In short, all the data – whether or not categorized – present in your servers is collectively called Big Data. All this data can be used to get different results using different types of analysis. It is not necessary that every analysis use all the data. Different analyses use different parts of the Big Data to produce the results and predictions they need.

Big Data is essentially the data that you analyze for results that you can use for predictions and other purposes. When you use the term Big Data, your company or organization is suddenly working with top-level information technology to deduce different types of results from the same data that you stored, intentionally or unintentionally, over the years.

How big is Big Data

Essentially, all the data combined is Big Data, but many researchers agree that Big Data – as such – cannot be manipulated using normal spreadsheets and regular database management tools. It needs special analysis tools like Hadoop (we'll study this in a separate post) so that all the data can be analyzed in one go (which may include iterations of analysis).
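The core idea behind tools like Hadoop can be sketched in miniature: split the data into records, map each record to a key/value pair, then reduce the pairs into an aggregate. Here is a minimal, hypothetical sketch in Python – the field layout and sample records are made up for illustration; a real Hadoop job would run these steps in parallel across many machines:

```python
from collections import defaultdict

# Hypothetical sales records; in a real cluster these would be
# billions of lines spread across many machines.
records = [
    "2024-01-03,electronics,250",
    "2024-01-03,groceries,40",
    "2024-01-04,electronics,125",
]

def map_record(line):
    """Map step: turn one raw record into a (key, value) pair."""
    date, category, amount = line.split(",")
    return category, int(amount)

def reduce_pairs(pairs):
    """Reduce step: combine all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

totals = reduce_pairs(map_record(line) for line in records)
print(totals)  # {'electronics': 375, 'groceries': 40}
```

The point is not the code itself but the shape of it: because map and reduce are independent of any one record, the same two functions work whether the input is three lines or three petabytes.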

Contrary to the above, though I am not an expert on the subject, I would say that the data with any organization – big or small, organized or unorganized – is Big Data for that organization, and that the organization may choose its own tools to analyze it.

Traditionally, people analyzing data would create different data sets based on one or more common fields to make analysis easier. In the case of Big Data, there is no need to create subsets before analyzing it. We now have tools that can analyze data regardless of how huge it is – and these tools can categorize the data even as they analyze it.
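To illustrate "no subsets needed": a single pass over the raw data can categorize and aggregate at the same time. A toy sketch in Python, assuming a hypothetical server log where each line is `region,response_ms` (the log contents and field names are invented for this example):

```python
import io
from collections import defaultdict

# Hypothetical server log; imagine a file far too large to open in a
# spreadsheet, so we read it one line at a time.
log = io.StringIO(
    "eu,120\n"
    "us,80\n"
    "eu,200\n"
    "us,95\n"
)

count = defaultdict(int)
total = defaultdict(int)

# One pass: the data is categorized (by region) at the same time as it
# is analyzed (average response time), with no pre-built subsets.
for line in log:
    region, ms = line.strip().split(",")
    count[region] += 1
    total[region] += int(ms)

averages = {region: total[region] / count[region] for region in count}
print(averages)  # {'eu': 160.0, 'us': 87.5}
```

Swapping `io.StringIO` for an open file handle (or a distributed input stream) changes nothing else in the loop, which is exactly why such tools scale without pre-made subsets.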

I find it important to mention two sentences from the book “Big Data” by Jimmy Guterman:

“Big Data: when the size and performance requirements for data management become significant design and decision factors for implementing a data management and analysis system.”


“For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”

So you see that both volume and analysis are important aspects of Big Data.

Read: What is Data Mining?

Big Data Concepts

This is another point where most people don’t agree. Some experts say that Big Data rests on three V’s:

  1. Volume
  2. Velocity
  3. Variety

Some others add a few more V’s to the concept:

  1. Visualization
  2. Veracity (Reliability)
  3. Variability and
  4. Value

I will cover the concepts of Big Data in a separate article, as this post is already getting long. In my opinion, the first three V’s are enough to explain the concept of Big Data.

Big Data Example – How Netflix used it to fix its problems

Around 2008, there was an outage at Netflix that left many customers in the dark. While some could still access the streaming service, most could not. Some customers managed to get their rented DVDs, whereas others did not. A blog post in the Wall Street Journal says Netflix had just started on-demand streaming.

The outage made the management think about possible future problems, and hence it turned to Big Data. Using that data, Netflix analyzed high-traffic areas, susceptible points, network throughput, and so on, and worked on lowering downtime should a problem arise in the future as it went global. Here is the link to the Wall Street Journal blog, if you wish to check out the examples of Big Data.

The above summarizes what Big Data is in layman’s language. You can call it a very basic introduction. I plan to write a few more articles on associated topics such as the concepts, analysis, tools, and uses of Big Data, the 3 V’s of Big Data, etc. Meanwhile, if you would like to add anything to the above, please comment and share with us.

Read next: What is Web Scraping?

Arun Kumar is a Microsoft MVP alumnus, obsessed with technology, especially the Internet. He deals with the multimedia content needs of training and corporate houses. Follow him on Twitter @PowercutIN


  1. roraniel

    The company I used to own wrote a GUI for a “BIG DATA” analysis program for the state DOT to conduct traffic studies. The DOT purchased data subscriptions that were automatically imported into our GUI and then the DOT could organize the data into things like travel distance to an intersection, traffic volumes at all hours, speed, demographics of vehicle passengers, as well as dozens of other criteria. It gave the DOT much more useful information than the standard axle counters stretched across the road gave them.

  2. DataH

    Arun, very nice Big Data article. When considering a big data strategy, I think it’s worth mentioning HPCC Systems from LexisNexis. Designed by data scientists, HPCC Systems is an open source data-intensive supercomputing platform to process and solve Big Data analytical problems and can help companies derive actionable insights from their data.

    HPCC Systems provides proven solutions to handle what are now called Big Data problems, and has been doing so for more than a decade. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL programming model. More info at http://hpccsystems.com

  3. Arun Kumar

    I am reading about the different options available for data analysis. The link you gave is a useful resource. Thank you 🙂

  4. Arun Kumar

    Do you mean that using a customized option is better than going for ready-made products like Hadoop, etc.?

  5. DataH

    You’re welcome! You might also want to check out their free online ECL training at http://learn.lexisnexis.com/hpcc

  6. Detailed archiving of your info – be it family albums or music files – will be an ongoing task, but well worthwhile.
    With a few keystrokes you can automatically catalog existing “MPG” video, “MP3” audio, “JPG” pictures, and “TXT” email records, notes, etc.
    Moments later you will be randomly sampling your data treasures.

    It takes some time for your personal database to become large enough to make searching interesting. If your memory is short, then not so long.

    This app plows through text files at 20,000,000 CPS and beyond on an 8-year-old laptop.

    I have completed over four dozen telephone and cable-system billing conversions (ETL); 95% of that data came
    in text files, and a small percentage was packed integers, real numbers, etc. The export data included toll files, work orders, customer details, and so on. These data were no problem for this app.

    Many of these IT jobs were for some of the largest companies in Western Canada and the US.
    The so-called “Big Data” isn’t that big for today’s computers. I have more personal data than all the ETLs combined!

    To keep you in touch with the massive amounts of data you’ll collect, the app can randomly sample video or audio segments as easily as family pictures.
    Text data can be displayed “in context” or as “matching lines only”, along with match counts, line counts, and elapsed time.

    Without a random option, computer resources go unused and your data mining tools will fall short.
    Video playback options such as fast forward, slow motion, and large-font captioning mixed with video segments are a few of the main features. There is no more useful app than this.

    See the thread “nobody shares knowledge better than this” for all the details.

  7. Balamurugan

    Nice Explanation
