The IoT needs a new kind of database

Gantner Instruments uses CrateDB to take synchronized and decentralized measurements from hundreds of thousands of sensors, feed them into a database and extract that data. Image Credit: Gantner Instruments

Databases need to be reinvented for the Internet of Things (IoT) era, or more specifically… they need to be reengineered for the machine data era, because this is a world where the very nature of data is different… but why is this so?

A new breed of database and data services is emerging. It is one built with the specific needs of the IoT in mind.

How data used to be

Back in the day, data used to come in comparatively regular, modular, almost predictable chunks. Databases would expect their daily (or often nightly) feeds of grain to arrive in regularly segmented portions designed to fit into particular [schema] columns, slots and repositories.

Today we call that ‘structured’ data.

How data is now

Today we know that structured data is a luxury. More commonly we find unstructured and semi-structured data flowing in real time through continuously connected software application structures that demand not just standard data analytics… but fabulous ‘query versatility’ of different kinds. Top all that off with the need to have the option to perform just a little query analytics, or masses of it at web-scale.

Today we call that ‘OMG, big data headache’ data.

“The growth of machine data and the opportunities that businesses have to capitalize on it are outstripping the ability of their data management infrastructure to act on it,” said Jason Stamper, analyst, data platforms and analytics at 451 Research.

As software vendors attempt to provide an answer to the new über-demanding data landscape, we see firms like Crate.io come forward. The company has just shifted (crate pun intended, sorry) CrateDB 1.0, an open source SQL database that enables real-time analytics for machine data applications.

CrateDB makes machine data applications that were previously only possible using NoSQL solutions available to mainstream SQL developers.

As 451 Research’s Stamper points out, CrateDB’s power lies in its ability to enable users to collect and analyze vast amounts of data in real time, using SQL commands they already know.

NoSQL vs. SQL for dummies

Essentially then, SQL databases are typically known as relational databases (RDBMS), whereas NoSQL databases are typically known as non-relational or distributed databases.

As nicely explained on TheGeekStuff, “SQL databases represent data in form of tables which consists of n number of rows of data whereas NoSQL databases are the collection of key-value pair, documents, graph databases or wide-column stores which do not have standard schema definitions which it needs to adhered to.”
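To make that distinction concrete, here is a minimal sketch in Python. It is purely illustrative: the table name `readings`, the field names and the sensor values are all invented for the example, and Python's built-in SQLite module and plain dictionaries stand in for a real relational database and a real document store.

```python
import json
import sqlite3

# Relational (SQL) side: every row has to fit the columns declared up front.
# Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, ts INTEGER, temperature REAL)")
conn.execute("INSERT INTO readings VALUES (?, ?, ?)", ("s-1", 1000, 21.5))
sql_rows = conn.execute("SELECT sensor_id, temperature FROM readings").fetchall()

# Document (NoSQL-style) side: each record carries its own set of fields,
# so two readings from different devices need not share a schema.
documents = [
    {"sensor_id": "s-1", "ts": 1000, "temperature": 21.5},
    {"sensor_id": "s-2", "ts": 1001, "torque": 3.2, "tags": ["engine"]},
]
doc_fields = [sorted(doc.keys()) for doc in documents]

print(sql_rows)
print(json.dumps(documents[1]))
```

The second document adds `torque` and `tags` without any change to a schema, which is the flexibility that machine data tends to demand. The relational table, by contrast, would need an altered definition before it could accept those fields.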

Downloaded more than one million times since its introduction in 2014, CrateDB combines the familiarity of SQL with the versatility of search and the ease of scalability of containers.

It provides an alternative to existing analytic data stores, including Splunk.

“The mission-criticality of our industrial sensor and data acquisition devices cannot be overstated. Our customers in the automotive, energy, aerospace and civil engineering segment rely on our ability to take synchronized and decentralized measurements from hundreds of thousands of sensors, feed them into a database and extract that data for instant visibility of power, temperature, pressure, speed and torque. Based on the real-time aggregated metadata they make their decisions. CrateDB is the only database that gives us the speed, scalability and ease of use that our teams, customers and applications require,” stated Juergen Sutterlueti, head of energy segment, Gantner Instruments.

Do we really need new databases?

So do we really need CrateDB’s new kind of approach? The firm asserts that it has produced a special combination of SQL and search technology, with machine learning and predictive analytics capabilities.

Essentially, it’s about being able to run analytics queries on time series, full text, geospatial and other structured and unstructured data without having to use different database engines to do so.
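As a rough illustration of what 'one engine, several kinds of query' can look like, the sketch below combines a time-series rollup, a text filter and a location filter in a single SQL statement. It uses Python's built-in SQLite module as a stand-in and is not CrateDB code: the `readings` table, its columns and the sample rows are invented, `LIKE` is only a crude substitute for real full-text search, and a latitude/longitude bounding box is only a crude substitute for real geospatial functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mixing a timestamp, a location, free text and a measurement.
conn.execute(
    "CREATE TABLE readings (ts INTEGER, lat REAL, lon REAL, note TEXT, temperature REAL)"
)
rows = [
    (0, 47.1, 9.8, "pump overheating alarm", 90.0),
    (30, 47.1, 9.8, "pump nominal", 60.0),
    (70, 47.1, 9.8, "pump overheating again", 95.0),
    (80, 10.0, 10.0, "remote site overheating", 99.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)

# One statement: bucket by minute (time series), match on text,
# and restrict to a bounding box (location).
query = """
SELECT ts / 60 AS minute, AVG(temperature) AS avg_temp, COUNT(*) AS n
FROM readings
WHERE note LIKE '%overheating%'
  AND lat BETWEEN 47.0 AND 48.0
  AND lon BETWEEN 9.0 AND 10.0
GROUP BY minute
ORDER BY minute
"""
result = conn.execute(query).fetchall()
print(result)
```

The point is the shape of the query rather than the engine: the alternative is to send the time-series part to one system, the text part to a search index and the location part to a third, then stitch the answers together in application code.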

This approach is surfacing in line with the need for the more modularized, containerized, service-based, componentized computing that typifies much of the current era… but expect a new disruption in approximately five years’ time too, because that’s just what happens.