The Single Biggest Problem in Big Data Analytics
The single biggest problem in Big Data Analytics is neither data volume, nor data velocity, nor data variety, nor the complexity of data analytics tools. It is the need for strong programming skills combined with deep domain knowledge.
Strong programming skills are needed to perform Big Data processing in a distributed environment with multiple networked servers. Such skills are needed even for basic exploration tasks like browsing, filtering, sorting, aggregating, and computing simple formulas over data.
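To make the contrast concrete: even a filter-and-aggregate task that takes a few clicks in a spreadsheet becomes code once the data lives outside one. The sketch below is a purely illustrative Python example with made-up fields (`region`, `amount`); real Big Data tooling would distribute this work across servers, but the point stands at any scale.

```python
from collections import defaultdict

# Hypothetical sales records; in a real Big Data setting these would be
# billions of rows spread across many networked servers.
records = [
    {"region": "EU", "amount": 120.0},
    {"region": "US", "amount": 75.5},
    {"region": "EU", "amount": 30.0},
    {"region": "US", "amount": 210.0},
]

# Even a "simple" exploration task is already a program:
totals = defaultdict(float)
for r in records:
    if r["amount"] >= 50.0:              # filtering
        totals[r["region"]] += r["amount"]  # aggregating

print(dict(totals))  # {'EU': 120.0, 'US': 285.5}
```

A domain expert can describe this computation in one sentence, but without programming skills cannot run it on data that no longer fits in Excel.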
Deep domain knowledge is needed to understand which data may contain Big Value and which formulas can extract valuable insights. Such knowledge becomes even more important when it is initially unclear what the insights might look like and what the extraction formulas are. This kind of work is called Data Science, and it requires quick ad-hoc experiments on large amounts of data.
Before the Big Data era, people with deep domain knowledge simply used spreadsheet applications like Excel for data analysis, data science, and similar work.
Nowadays, data has become Big Data, and Excel cannot handle millions or billions of records. So the people with deep domain knowledge hire other people with programming skills to work together on data analytics. A cross-functional team quickly turns data analytics into an unpredictable project activity with a lot of planning, communication, and task management. The process is usually full of fun and is called “agile”, but it takes a great deal of time and money. As a result, Big Data usually needs a Big Budget that is not necessarily turned into Big Value.
Think of it: if you are a domain expert, you cannot even take a look at your Big Data without hiring people with programming skills! It looks like a dictatorship of programmers.
At Nezaboodka, we want to give domain experts the freedom to do Big Data Analytics themselves - we are reinventing data analytics tools along with a database management system for the Big Data era.