Strike a balance between governance and freedom
For some companies, restrictions are not the concern – in fact, it’s the opposite. In these cases, IT administrators dial back restrictions on the data lake and allow users a free-for-all. This may seem ideal to some users, but when expensive queries hog all the computational resources or data becomes corrupted, everyone on the system suffers. Without governance and structure, data lakes quickly become uninhabitable data swamps with lagoons of unsupported tables. The key is to strike the right balance between giving users the freedom to choose their tools and experiment, and providing a consistent quality of service to the operational environment.
Align data initiatives with business goals
Early in their Big Data deployments, far too many businesses move quickly to establish data platforms and make technology choices without considering the business strategy along the way. This “if we build it, they will come” mentality may seem innocent at first – after all, how harmful can it be to build out a data lake? It turns out that if technology choices and business processes are put into place without an understanding of how the business will take advantage of the underlying system, there is a good chance the deployed platform won’t meet the needs of the business and will be scrapped in favor of something else. On paper, the solution is simple: IT and the business must collaborate to define the requirements for the system prior to implementation. In practice, this is often the most difficult thing to do, and it requires persistence and strong leadership from both sides to bring the parties together.
Create a data infrastructure with the ability to scale
Most good data lake implementations follow the tried-and-true guidance of deploying on commodity, bare-bones infrastructure. This is fine, until it isn’t. Once a deployment reaches dozens of servers, hundreds of terabytes of data, and dozens of analytical users, provisioning sandboxes becomes a full-time job – and it shouldn’t be. Two things can help streamline this process:
Containerize the compute environments so that new sandboxes can be deployed with the click of a button.
Decouple the data storage from the compute environment and provide read-only access from the containerized sandboxes to the data.
Other tools can provide workarounds here as well. Together, these steps give analysts flexibility and easy access to data with integrity, while allowing the compute and storage tiers to scale independently. The result is a lower total cost of ownership and easier overall maintenance.
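As a minimal sketch of these two steps, a container launch along the following lines caps the sandbox’s compute and mounts the shared storage tier read-only (the image name, mount paths, and resource limits are illustrative assumptions, not a recommendation of any specific product configuration):

```shell
# Hypothetical analyst sandbox: compute is containerized and capped so one
# user's expensive queries can't starve the cluster, while the data lake
# storage is mounted read-only so analysts can query shared data but never
# modify or corrupt it.
docker run -d \
  --name analyst-sandbox-01 \
  --cpus 2 --memory 8g \
  --mount type=bind,source=/mnt/datalake,target=/data,readonly \
  jupyter/scipy-notebook:latest
```

Any scratch output an analyst produces lands in the container’s own writable layer (or a separate per-sandbox volume), so experiments never touch the shared data tier, and tearing down or re-provisioning a sandbox is a single command rather than a server build.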
As small and midsize businesses’ data efforts mature, they run into many of the same barriers that larger companies face when moving Big Data initiatives beyond the prototype stage. This is perfectly natural and a healthy sign of growth. But by focusing on these five drivers, companies can build on the successes of their Big Data pilots and drive long-term success and value with data.