Although big data is fast becoming one of the buzzwords of the data industry, I still wonder whether we yet know exactly what it is. Is it anything more than a concept, an umbrella topic if you like, meant to signal that something new is beginning to emerge in our industry? But is big data really something new? Is the buzz justified? Do we really need to reinvent our approaches, our practices and the skills we engender in those coming up through data careers?
I’d argue that some, if not many, data professionals already know a thing or two about big data. Well over a decade ago I was the architect leading a project to build a data warehouse to handle call traffic and billing data for one of Australia’s biggest telecommunications companies. Even then the volume of data we were working with was hardly small, and I’d suggest that the people who work with that warehouse, and the BI solutions it enables, are facing even larger data volumes today thanks to the proliferation of mobile phones and other devices. We found ways of handling the challenges that the volume of data posed, and more than a few of these approaches would still work today, even without the additional leg-up we now get from increased computing power.
But what of the other characteristics of big data? It’s not just volume that’s the challenge, but velocity too. I’ve heard some commentators argue that the rate at which data arrives is in fact the biggest issue that comes with big data. But we’ve been dealing with high-velocity data for a while now as well. Complex event processing has been available in major database products for at least a few years, and people working with any form of operational technology will be used to data flowing in from sensors and other protection or control devices at millisecond intervals.
So I believe that we can leverage much of what we already know and practise as data professionals to start to address the volume and velocity aspects of [structured] big data. It’s the complexity and variety aspects of big data that I think will give us the real problems we need to deal with. Making, or taking, the correct meaning from unstructured data, especially data coming from outside our organisations (sentiment analysis of social media, for example), and then somehow finding a way to integrate it effectively with our structured data sets, is where I believe we’ll find our headaches. It’s here that we’ll need innovative new tools and techniques, but even then I think they’ll rely on key areas and practices which at least some of us have worked with for a while. Metadata will play a big role here in establishing the right context, and data mining may well make an appearance too.
So, in my opinion at least, we already know a thing or two about big data. Let’s not reinvent the wheel, but rather try to build on what we’ve learned in the past. It’s a novel idea, I know, and perhaps not one that IT professionals have much of a proven track record with, but hopefully this type of approach might shorten the time to value for those making early investments in and around big data.