I was reading an article on the information-management.com website (http://www.information-management.com/ ) by Yves de Montcheuil, the vice president of marketing at Talend (https://www.talend.com/ ). He wrote about how the domains of data will converge: soon we will ignore the current separations within data management and analytics and move to looking at data as one composite whole. We will get out of the habit of looking at data as BigData versus transactional data, and our enterprise data warehousing systems are going to become good enough to work on data from all sources and of all types.
Does this sound confusing? You know that in analytics setups we generally have teams which work on ERP or source-system transactional data (internal to the organisation), which has been around for a few decades now, and other specific/specialist teams which work on BigData.
What is BigData? BigData can be taken to be data which has the following characteristics:
- Volume: BigData implies enormous volumes of data. It could be data generated by employees, or by machines, networks, and systems such as social media.
- Variety: the data could be structured or unstructured.
- Velocity: the rate at which the data is generated.
There are also some other dimensions related to BigData, for example: Veracity, which refers to bias, noise, and abnormalities in the data; Validity, which is understanding whether the data is correct and accurate for the intended use; and Volatility, which refers to how long the data is valid and how long it should be stored.
BigData falls into five categories:
- Web and social media data, from platforms such as Facebook, Twitter, LinkedIn, and blogs.
- Machine-to-machine data, from sensors, meters, and other devices.
- Big transaction data, like billing records and customer purchase data.
- Biometric data, including fingerprints, genetics, handwriting, and retinal scans.
- Human-generated data, including vast quantities of unstructured and semi-structured data such as voice recordings, email, paper documents, and surveys.
Thus, in the analytics setup we see separate teams which work on the vintage, well-entrenched analytics practice and on data from source systems maintained by the organisation. This traditional data is often subject to clear regulations and governance frameworks; think of industries such as insurance, banking, and pharmaceuticals. These governance frameworks and governing bodies (like the RBI, IRDA, and FDA) guide you on how long you need to retain the data, what basic reports and analytics you need to run on the data, and so on. Thus, the minimum benchmarks and expectations are set.
This type of structure is not often seen in the domain of BigData. There is virtually no regulation and there are very few guidelines, as these are still evolving along with the BigData practice itself. However, as maturity arrives in this domain, we will see a clear convergence of this type of data with our existing, internal system data and, in time, I am sure we are going to see the creation of processes which allow BigData and our existing transactional, "business as usual" data to merge into what has been called total data. In all probability, the largest problem we are going to face, and perhaps the highest-paying jobs we are going to see in the coming years, will pertain to how we can create the total data architecture. This will create a more robust system of information and business decision-making.
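To make the idea of "total data" a little more concrete, here is a minimal sketch in Python of how transactional records might be enriched with a signal derived from a BigData source. All field names and values here are invented for illustration; the point is only the shape of the merge, where internal, well-governed transactional rows are kept intact and BigData-derived attributes are attached where they exist.

```python
# Internal, "business as usual" transactional data, e.g. from an ERP system.
# Customer ids, amounts, and products below are all made up for this sketch.
transactions = [
    {"customer_id": "C001", "amount": 250.0, "product": "policy_renewal"},
    {"customer_id": "C002", "amount": 120.5, "product": "new_account"},
]

# A BigData-derived signal, already reduced to a per-customer score
# (imagine it came from social media text). Note C003 has no transactions
# and C002 has no social footprint; coverage of the two sources differs.
social_sentiment = {"C001": 0.82, "C003": -0.4}

def build_total_view(transactions, sentiment):
    """Left-join transactional rows with the sentiment score where available."""
    total = []
    for row in transactions:
        enriched = dict(row)  # copy so the source records stay untouched
        # Not every customer appears in the BigData source; keep the row anyway
        enriched["sentiment"] = sentiment.get(row["customer_id"])
        total.append(enriched)
    return total

total_data = build_total_view(transactions, social_sentiment)
```

The design choice worth noticing is the left join: the governed transactional data remains the backbone, and the less regulated BigData attributes are optional enrichments that may simply be missing for a given record.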
We live in exciting times … the future seems to hold a lot of promise for people who can explore and integrate across barriers!