As small and midsize companies mature in their Big Data capabilities, they find it increasingly difficult to extract value from their data for two primary reasons:
- Organizational immaturity in managing change based on the findings of data science.
- Scalability limitations slowing the efficiency of the data science team.
This leads to disappointment, as encouraging early prototypes fail to deliver on their promise. Growing businesses that want to build on their early data science success and capitalize on the value of their data faster need to embrace five key drivers.
Consolidate data into a single data lake to avoid data sprawl
As companies grow into Big Data maturity, deployments of Hadoop and other Big Data technologies spring up along the way. This initially decentralized approach allows for faster adoption but eventually results in silos of data and technology. This becomes a problem because data is often duplicated across the deployments, which can create compliance issues and raises the overall maintenance cost. Furthermore, multiple systems that do not interoperate well can hinder and discourage analyses by data scientists and steepen the learning curve for anyone looking to analyze the data. More importantly, providing visibility through reports and analytics across these silos is nearly impossible, which prevents upper management from having a clear picture of the business. Successful clients have found tremendous value in consolidating the data into a single lake.
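As a first step toward consolidation, it can help to measure how much data is actually duplicated across deployments. The following is a minimal sketch, assuming each silo is reachable as a directory tree on a file system; the function names, silo labels, and paths are hypothetical and used only for illustration. It groups files by content hash and reports content that appears in more than one silo.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(silo_roots: dict[str, Path]) -> dict[str, list[tuple[str, Path]]]:
    """Map each content digest to the (silo, path) pairs that hold it.

    Only digests found in more than one silo are returned. The silo
    names are hypothetical labels chosen by the caller.
    """
    seen: dict[str, list[tuple[str, Path]]] = defaultdict(list)
    for silo, root in silo_roots.items():
        for path in sorted(root.rglob("*")):
            if path.is_file():
                seen[file_digest(path)].append((silo, path))
    return {
        digest: locations
        for digest, locations in seen.items()
        if len({silo for silo, _ in locations}) > 1
    }
```

Calling `find_duplicates({"marketing": Path("/data/marketing"), "finance": Path("/data/finance")})` would return one entry per piece of content stored in both places. Byte-identical matching is a deliberate simplification: it misses copies that differ in format or column order, so the result is a lower bound on the real duplication.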
Provide users with the appropriate level of access to data
For businesses that have consolidated data into a centralized lake, the next challenge is providing the right level of access to the data. To perform advanced analytics, data scientists need a few things: access to large amounts of data, the ability to augment existing data with outside data sources, and the ability to model the data using cutting-edge tools and libraries. This is often the exact opposite of what risk-averse IT administrators want to provide, and the result is lost productivity for the data scientists. Data security is an important consideration, especially for clients in financial services or healthcare. But IT policy requires a balance between security and stability. Successful clients have often sidestepped this problem by offering the data science community analytical sandboxes that are independent of the production system. Sandboxes allow data scientists to experiment and iterate freely as they perform their work. They also postpone the complex questions around permissioning to a later stage, after business value has been more tangibly established, so that managers can make more informed business decisions.
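The sandbox idea can be illustrated with a minimal sketch, assuming datasets live as directories under a production root and each data scientist gets a private area under a separate sandbox root. The function name, directory layout, and user names here are hypothetical; a real deployment would more likely rely on the snapshot or copy facilities of its storage platform.

```python
import shutil
from pathlib import Path


def provision_sandbox(production_root: Path, sandbox_root: Path,
                      user: str, dataset: str) -> Path:
    """Copy one dataset from production into a per-user sandbox.

    The copy is independent of the source, so experiments in the
    sandbox never modify production data.
    """
    source = production_root / dataset
    target = sandbox_root / user / dataset
    if not source.is_dir():
        raise FileNotFoundError(f"No such dataset: {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    # copytree raises FileExistsError if the target already exists,
    # so an existing sandbox copy is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

Because the data scientist works only on the copy, administrators can keep production locked down while still giving analysts room to experiment with their own tools and outside data.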