What is Data?

Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.

Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful.

Information in numerical form that can be digitally transmitted or processed.

Simply put, data is factual information: raw facts before they have been processed. Data is part of our everyday life, and we deal with it on a daily basis.

What is Big Data?

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making (Gartner).

Big data refers to large data sets that are challenging to store, search, share, visualize, and analyze.

Big data is data that gives us contextual information for making a decision, but that is not trivial in size.

Big data technology is a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and analysis.

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, and does not fit the structures of existing database architectures.

In short, it is an enormous amount of data.

When we deal with an enormous amount of data, there are two problems that we need to sort out first.

We will take them one problem at a time.

Data Storage- capacity needs to increase

Data Processing- needs to extract more information from the data

So big data is a problem domain. More data usually beats better algorithms.

The good news is that big data is here.

The bad news is that we are struggling to store and analyze it.

Data Sources of Big Data

There are four major sources of data for big data:

1) Social Media- Social media is one of the key platforms that collects information from us by giving users an environment in which to post it.

Popular social media websites that everybody knows, such as Facebook, Twitter, Instagram, and Pinterest, are all there, and we push information to them.

2) Cloud

3) Devices

4) IoT (Internet of Things)

The Model of Generating/Consuming Data has Changed

Old Model: A few companies generate data, and all others consume data

New Model: All of us generate data, and all of us consume data

Attributes of Big Data

Volume (Size)- Size is one of the criteria for calling data big data, but size itself is a subjective criterion: it depends on the infrastructure you choose to store, process, and compute on the data.

Variety (Complexity)- The variety of data types, also called formats, is important:

Relational Data (Tables/Transaction/Legacy Data)

Text Data (Web)

Semi-structured Data (XML)

Graph Data: Social Network, Semantic Web (RDF), …

Streaming Data: you can only scan the data once

A single application can be generating/collecting many types of data 

Big Public Data (online, weather, finance, etc.)
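
The point that streaming data can only be scanned once can be made concrete with a small sketch. The Python below is an illustration only, not a method from any particular big data tool: it keeps a running count, mean, and variance (Welford's online update) so that each value is read exactly once and never stored. The function names and the sample readings are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Iterator[Tuple[int, float, float]]:
    """Yield (count, mean, population variance) after each value.

    The stream is consumed in a single pass and no values are retained.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for value in stream:
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)
        yield count, mean, m2 / count


def sensor_readings() -> Iterator[float]:
    # Hypothetical data source. A generator cannot be rewound,
    # which mimics a stream that can only be scanned once.
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


for count, mean, variance in running_stats(sensor_readings()):
    print(f"n={count} mean={mean:.3f} variance={variance:.3f}")
```

Because only three numbers are kept in memory, the same approach works whether the stream holds eight readings or eight billion.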


Velocity (Speed)- Data is being generated fast and needs to be processed fast

Online Data Analytics

Late decisions → missed opportunities


E-Promotions: Based on your current location, your purchase history, and what you like → send promotions right now for the store next to you

Healthcare monitoring: Sensors monitoring your activities and body → any abnormal measurement requires an immediate reaction
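
As a rough sketch of the healthcare example, the Python below checks each reading the moment it arrives and raises an alert immediately, rather than waiting for a batch report. Everything here is assumed for illustration: the `Reading` type, the `alerts` function, and the heart-rate limits are hypothetical and are not medical guidance.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical limits, chosen only to make the example run.
HEART_RATE_MIN_BPM = 40
HEART_RATE_MAX_BPM = 140


@dataclass
class Reading:
    patient_id: str
    heart_rate_bpm: int


def alerts(readings: Iterable[Reading]) -> Iterator[str]:
    """Yield an alert as soon as a reading falls outside the allowed range."""
    for reading in readings:
        in_range = HEART_RATE_MIN_BPM <= reading.heart_rate_bpm <= HEART_RATE_MAX_BPM
        if not in_range:
            yield f"ALERT {reading.patient_id}: heart rate {reading.heart_rate_bpm} bpm"


# Hypothetical sample readings standing in for a live sensor feed.
sample = [Reading("p1", 72), Reading("p1", 155), Reading("p2", 38)]
for message in alerts(sample):
    print(message)
```

The design choice that matters is that the check runs per reading: the decision is made while the data is still fresh, which is the whole point of velocity.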


Veracity (Uncertainty)- Trustworthiness of the data

Data in Doubt

