Venture capitalist Marc Andreessen famously remarked that ‘software is eating the world.’ Today the same can be said about big data. Without tapping into proprietary data and open data, businesses will fail to differentiate in a world of savvy start-ups where there is an Uber for X, Y and everything else.
Rather than just developing good software, businesses will now need applications that create a holistic view of customers and their contexts. Rich streams of data (social media, connected devices, calendars, clickstreams) will need to be fed into algorithms for predictive analytics and personalisation. Being ‘data-driven’ will no longer be a differentiator but a basic necessity.
For some, that sets alarm bells ringing. Old fears about machines taking human jobs resurface. However, the best companies will combine human talent and tech: if you’re a data scientist, big data may eat the world, but it won’t eat you any time soon.
A rare and expensive breed
Data scientists are at the centre of the big data conversation. They are accomplished technical specialists capable of using an array of tools to interrogate data. They answer the questions businesses ask of their data, and the ones they didn’t even know they should be asking. Yet the shortage of data talent is evident in the statistics – CrowdFlower’s 2016 data science report found that 83 per cent of respondents said there weren’t enough data scientists to go around. Demand for data scientists is sky high.
Why? Some of the greatest data storage and processing technologies of recent years have been the product of a small coterie of the best engineering brains. For example, Hadoop’s seeds were sown by a group of engineers at Google but grown and open sourced at Yahoo. Spark was developed at UC Berkeley’s AMPLab. Although these innovations spread like wildfire in the open source tech community, there is a shortage of talent with the analytical experience to understand and deploy these complex technologies effectively.
As the law of supply and demand dictates, this makes data scientists expensive. The median salary is $119,000, nearly double the average developer salary of $65,000, as reported by Glassdoor. With demand for tech talent outstripping supply, the recruiting environment is competitive. Of course, these figures depend on factors such as location and seniority, but it’s clear that good talent comes at a premium. A great place to work, with stretching professional challenges, will be crucial to hiring and keeping people too.
Decentralising and democratising data
If you are spending these amounts on technical talent, and even more on technical infrastructure, it’s important to get the most out of those considerable investments. In practice, the question of how to do so is very similar to another: how do we embed data into the DNA of an organisation? Relying on a small group of brains to re-orient a company is unlikely to end in success. Better to move away from centralising data towards a more collaborative and decentralised approach.
Data analysis combines humans asking the right questions with the tools to answer them. So more people asking means more ‘right questions’ and, hopefully, more ‘right answers’. Data scientists should work closely with different types of employees – including senior managers, analysts, customer services and marketing managers – to answer business-related questions. A data-driven business should cater for the needs of both data scientists who want to operate at a lower level and managers who want a higher-level overview. Data analytics should act as a glue that binds different departments and seniorities together, positively blurring the lines between what is technical and what is business-focused. Cloud computing has catalysed this process by unburdening IT teams from having to deploy and maintain applications.
The missing piece is visionary leadership. McKinsey predicts that by 2018 there will be a shortage of 140,000 to 190,000 people with analytical experience and a staggering 1.5 million shortage of managers with adequate skills to make critical big data decisions. Hiring a couple of PhDs will reap a few rewards, but without direction and support from the top, the highly paid data scientists may end up being glorified (and overpaid) analysts, who make a few SQL queries followed by the odd Tableau visualisation. Management needs to clearly define the key business questions that need to be answered and create roadmaps for the medium to long term – showing what software needs to be built or bought, and who needs to be hired along the way.
Getting the management and the environment right will be crucial to retaining data scientists and getting the best out of them. Along with the right technology, it also means maximising the potential of the existing data science team, rather than having to go through the difficult, expensive process of expanding it.
But will software eventually eat data scientists?
Will there be a time when data scientists are simply no longer needed? A lot of the legwork has already been automated. The big data pipeline, which has historically been fragmented, is consolidating. More and more software companies are offering all-in-one platforms to collect, pre-process and store data, and to write and deploy algorithms. Many applications even allow users to train and run a complex machine learning model without writing a line of code.
However, human intuition is still regarded as a necessary component in designing machine learning algorithms, particularly in ‘feature engineering’, which is extracting key variables from a set of data to use in an algorithm. Let’s say we are predicting customer churn. The important feature may not be the dates when a customer joins and unsubscribes but the time span in between. Tasks like these (which some put down to the imprecise notion of creativity) are difficult for a machine to do.
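To make the churn example concrete, here is a minimal sketch in Python of deriving that time-span feature. The customer records, the field names and the `tenure_days` helper are all hypothetical, invented purely for illustration; a real pipeline would draw on whatever the business actually stores.

```python
from datetime import date

# Hypothetical customer records; the field names are assumptions for illustration.
customers = [
    {"id": "a", "joined": date(2016, 1, 10), "unsubscribed": date(2016, 3, 10)},
    {"id": "b", "joined": date(2015, 6, 1), "unsubscribed": date(2016, 6, 1)},
]


def tenure_days(joined: date, unsubscribed: date) -> int:
    """Derived feature: number of days between joining and unsubscribing."""
    return (unsubscribed - joined).days


# Replace the two raw dates with the single engineered feature.
features = [
    {"id": c["id"], "tenure_days": tenure_days(c["joined"], c["unsubscribed"])}
    for c in customers
]

for row in features:
    print(row["id"], row["tenure_days"])
```

The subtraction itself is trivial. The human contribution is recognising that the span between the two dates, and not either date on its own, is the quantity likely to matter.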
Data scientists need not worry: they won’t be replaced any time soon. Like most activities in the workplace, data analysis will inevitably be taken over by software in the age of artificial intelligence. Until then, though, data scientists will only become more important.