PolyBase bridges the relational-Big Data gap
To me Big Data sounds horrible and I suspect it really is. It conjures up an image of a tsunami of files, of all sizes, shapes and forms, washing over the landscape and swallowing up the environment we’d spent years putting in order. Big Data includes everything from massive video files, through social media, right down to minuscule temperature readings sent at great frequency by sensors on a machine.
Making sense of Big Data is a daunting challenge. Many people will see it as an opportunity. That’s what they’ll say in public anyway, but there’s a shortage of people who have taken up the challenge to be data scientists, even though the rewards are spectacular. Privately, everyone is as nervous about Big Data as they are about the consequences of one of the creators of all this new information, the Internet of Things.
In truth, the promised benefits of Big Data haven’t arrived yet but that’s possibly because it’s still early days. Usually when a new technology has hit a plateau, it needs a booster to take it to the next level.
That boost could come from SQL Server 2016. Hang on though, are relational databases, with their strictly regimented organisation, right for the freeform jumble that is typical of a Big Data project? Until now, the nearest compromise was to use NoSQL databases, which could accommodate more diversity.
However, Big Data projects typically run on systems like Hadoop or Microsoft’s HDInsight, and sooner or later their output has to be integrated with existing data from relational databases or data warehouses.
This is where a new feature in SQL Server 2016, called PolyBase, makes all the difference. PolyBase acts as a bridge between SQL Server’s well-disciplined relational and data warehousing systems and the anarchic jumble of data stores like Hadoop. PolyBase isn’t new, but its level of effectiveness is. Once confined to the SQL Server Parallel Data Warehouse, it was only used by the largest of organisations. The inclusion of PolyBase in the main SQL Server 2016 line-up opens up this Big Data bridge for all SQL Server 2016 installations.
PolyBase puts you back in control. It lets you use your existing SQL Server tools like SSMS and T-SQL to interrogate Big Data stores. It opens doors into previously inaccessible Hadoop clusters and even lets you write queries that join this semi-structured Big Data with tables in SQL Server relational databases. It can retrieve data from external sources such as the Cloudera and Hortonworks versions of Hadoop and Azure Blob Storage. Microsoft says it will add more data sources in the future.
Using PolyBase is like using a linked server. It is transparent to the application, and the external data sources are incorporated into the database schema as external tables.
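To make that a little more concrete, here is a rough T-SQL sketch of how Hadoop data might be brought into the schema and then joined with an ordinary table. Treat it as an illustration rather than a recipe: every object name in it (SensorHadoop, CsvFormat, dbo.SensorReadings, dbo.Machines), the cluster address and the file path are made up for the example, and it assumes PolyBase is already installed and configured to talk to your Hadoop distribution.

```sql
-- Hypothetical names throughout: SensorHadoop, CsvFormat,
-- dbo.SensorReadings and dbo.Machines exist only for this example.
-- Assumes PolyBase is installed and Hadoop connectivity is configured.

-- 1. Tell SQL Server where the Hadoop cluster lives (placeholder address).
CREATE EXTERNAL DATA SOURCE SensorHadoop
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020'
);

-- 2. Describe how the files are laid out (here: comma-separated text).
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- 3. Expose a directory of files as a table in the database schema.
CREATE EXTERNAL TABLE dbo.SensorReadings (
    MachineId   INT       NOT NULL,
    ReadingTime DATETIME2 NOT NULL,
    Temperature FLOAT     NOT NULL
)
WITH (
    LOCATION    = '/data/sensor_readings/',
    DATA_SOURCE = SensorHadoop,
    FILE_FORMAT = CsvFormat
);

-- 4. Join the Hadoop data with an ordinary relational table.
SELECT m.MachineName,
       AVG(r.Temperature) AS AvgTemperature
FROM dbo.Machines AS m
JOIN dbo.SensorReadings AS r
    ON r.MachineId = m.MachineId
GROUP BY m.MachineName;
```

Once the external table exists, the final query looks no different from one against two local tables, which is the point: the application does not need to know that the sensor readings live in Hadoop.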
The bottom line is that Big Data becomes conquerable, where once it threatened to get out of hand. The emergency service that comes to your rescue is PolyBase, and it will arrive in a vehicle called Microsoft SQL Server 2016.