Delivering Big Data Advanced Analytics Clusters - Defining Your Requirements (PART 2)

Defining Your Requirements - Part 2


By Darran Cooke, Services Delivery Manager, EMEA, Cloudera




In this, the second part of my blog, I would like to provide some guidance on the key areas within the ‘Defining your Requirements’ stage. Defining these requirements early is an important prerequisite and gives your project the strongest possible starting position to build upon. As the saying attributed to Benjamin Franklin goes: ‘By failing to prepare, you are preparing to fail.’

Key focus areas

  • Align your use case to a business outcome. What is it that the business needs? How will your Hadoop cluster enable those insights and deliver business value?
     
  • Understand and agree your timeline. Ensure it is updated and communicated regularly, and that it stays aligned with business user and stakeholder expectations.
     
  • Identify and engage ALL key stakeholders early. Receiving approval and sponsorship from this select group early on in the process will save time and effort mid-delivery.
     
  • Understand the source data. Most data analytics platform vendors use the 'Ingest anything you like' strapline, which is absolutely correct, but you will almost certainly need to transform that data in order to exploit it effectively and take advantage of the many toolsets, BI APIs and apps available.
     
  • Develop a data model. This doesn't need to be complicated, certainly in the early stages, but considering a strategy early on will reap benefits later as your solution and use cases expand. I like to use the Data Lasagne explanation by Mark Samson (PSE, Cloudera).
     
  • Set up your delivery teams to deliver end-to-end. Using a DevOps, continuous delivery methodology keeps technical debt to a minimum and makes requirement checks and communications across the project much quicker. You also have a greater chance of meeting business and end user requirements if those users are involved in the delivery from the outset. Having input throughout delivery from the ingestion, transformation and exploitation streams will in most cases deliver faster and better quality results.
     
  • Training. Advanced analytics may still be relatively new within your organisation, and most customers will want to utilise current staff. There is world class training available. Identifying your skills gap and providing training in the latest tools that are better suited to the tasks will increase capability and productivity in the long run. Making sure this training is delivered at the right time has huge benefits. Understanding the theory is great, but early and regular hands-on use, building and configuring your clusters, will cement this knowledge and understanding. It is also worth considering short-term expert consultancy in specialist areas such as securing your clusters or real-time analytics. This will almost certainly pay off in the long term by reducing project re-work, project slippage and potentially additional support costs. It also provides a great learning opportunity for your internal teams, who can capture and document best practices in these specialised areas.

Many organisations begin their journey into Hadoop with a belief or hope that it will provide them with a cheap alternative to replicate their data warehouse capabilities, and offer an opportunity to retire their legacy infrastructure. There is nothing wrong with that, but limiting or hard-wiring your data analytics to SQL alone because that’s where you’re comfortable will inhibit your ability to take advantage of some key tools in the ‘big data Swiss Army knife’, such as Search and machine learning on Spark. The advanced analytics ecosystem opens a door to a world of opportunity, and those taking advantage of it are making significant advances and generating true additional business value.

The boundaries of what can be achieved are still expanding, and rapidly. We have customers creating use cases with advanced machine learning techniques and integrating with aspects of Artificial Intelligence. Such systems take the more detailed data that can be collected and create predictive systems that can supplant human experts in fields such as image recognition, classification and prediction.

This represents an evolution in how data can be applied for those organisations that face up to the challenge and run with it. We are witnessing a wholesale revolution in how some of the largest organisations in the world wield data and explore ways of generating more value from an asset that they are now able to collect in full fidelity.


Read: Defining Your Requirements - Part 1