We are witnessing tremendous transformations in enterprise data architectures. The traditional workhorse, the relational database management system (RDBMS), is being supplemented by large-scale non-relational stores such as the Apache Hadoop Distributed File System (HDFS), MongoDB, Apache Cassandra, and Apache HBase. While this transformation may seem overwhelming to some, we see a more fundamental shift under way, one that demands many more changes to modern data architectures.
The current transformation was mandated by the business requirements of the connected world: the volume, variety, and velocity of data that resulted from ubiquitous connectivity for consumers, democratization of content, and advertising-supported business models. We believe that the next wave will be dictated by richer customer interaction, right-time business insights, and operational cost optimization. It will be reinforced by transformative changes in the underlying infrastructure technology, such as on-demand availability of computation, storage, and networking in public and private clouds and ubiquitous, cheap, low-power sensors, and by newer use cases such as the Internet of Things (IoT), Deep Learning, and Conversational User Interfaces (CUI).
To meet the challenges of these emerging use cases, one must be able to perform deep analysis and learning at scale, make decisions in near real time, and adjust quickly to new events or learnings. The modern data architecture must therefore bring data analytics out of the back office and merge it with operational business systems.
The modern business imperative is to provide hyper-personalized experiences to consumers, based on the real-time context of each user’s interaction. This context is generated from all the available data about that user and about similar users, as well as from external data sources that may influence the user’s experience. Thus, all applications will become data-driven applications powered by closed-loop analytics, in which the outcome of each interaction is fed back into the models that drive the next one. As a consequence, application developers will have to become data scientists, and data scientists will need application development skills.
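To make the idea of a closed loop concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the class `ClosedLoopRecommender`, the user segment `"commuter"`, the item names, and the smoothed click-rate scoring are all hypothetical choices made for this example. The point is just that the operational path (`recommend`) and the analytical path (`feedback`) share the same model state, so each observed outcome changes the next decision.

```python
from collections import defaultdict


class ClosedLoopRecommender:
    """Toy closed loop: serve a decision, observe the outcome, update the model."""

    def __init__(self, items):
        self.items = list(items)
        # Per (segment, item) counters; a stand-in for a real learned model.
        self.shown = defaultdict(int)
        self.clicked = defaultdict(int)

    def score(self, segment, item):
        # Smoothed click-rate estimate, so items never shown still get a neutral score.
        key = (segment, item)
        return (self.clicked[key] + 1) / (self.shown[key] + 2)

    def recommend(self, segment):
        # Operational path: pick the item with the best current estimate.
        return max(self.items, key=lambda item: self.score(segment, item))

    def feedback(self, segment, item, clicked):
        # Analytical path: fold the observed outcome back into the model right away.
        self.shown[(segment, item)] += 1
        if clicked:
            self.clicked[(segment, item)] += 1


recommender = ClosedLoopRecommender(["news", "sports", "music"])

# First decision is made with no history for this segment.
first = recommender.recommend("commuter")

# Observed outcomes are fed back into the same model state.
recommender.feedback("commuter", first, clicked=False)
recommender.feedback("commuter", "music", clicked=True)

# The next decision reflects what was just learned.
second = recommender.recommend("commuter")
```

In a real system the counters would be replaced by models trained on far richer context, and the feedback path would typically run through a streaming or batch pipeline rather than an in-process call. The shape of the loop is the same.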
Systems for closed-loop analytics must support deeper model building and analytics on data that was previously trapped in the back office. That requires fast and flexible querying over large data sets, functional transformations and algorithms to support learning, fast handoff of data from one processing sub-system to another, and integration with large-scale analysis services in the cloud or on-premises. The friction of data replication and movement must be minimized.
These fast data analysis systems must interoperate at the object transformation service layer. This layer must support a wide range of common transformation functions, common query engines, and storage engines. Because different processing requirements call for a choice among best-of-breed data processing engines, the architecture needs a robust way to support multiple storage layouts and fast data exchange within a distributed data store. Therefore, the modern data architecture, at its lowest level, requires a smart storage system that serves the needs of the multiple sub-systems making up a smart backend for rich interaction and timely decision making.
In the next part of this blog series, we will take a brief tour of the data platforms prevalent in enterprises today.