As mentioned in previous posts, Ampool is a memory-centric, data-aware, distributed store that extends the proven open source Apache Geode project to support multiple types of application workloads. Apache Geode is a distributed in-memory key-value store that primarily supports low-latency transactional workloads using a data storage abstraction called “Region”. A Region is essentially a distributed hash map in which application-specific domain objects are serialized and stored as keys and values. There are multiple types of regions, distinguished mainly by system properties: whether the data stored in the region is partitioned or fully replicated, whether it is persistent, whether the region supports overflow of values, and so on. Check out Geode Region Types for more details.
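To make the "distributed hash map" idea concrete, here is a minimal single-process sketch of how a partitioned map can route each key to one of several buckets by hash. The class `PartitionedMapSketch` and its methods are hypothetical names chosen for illustration; this is not the Geode Region API, and a real partitioned region also handles serialization, redundancy, and rebalancing across servers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the Geode Region API. */
final class PartitionedMapSketch<K, V> {
    // Each bucket stands in for the slice of the map hosted by one server.
    private final List<Map<K, V>> buckets = new ArrayList<>();

    PartitionedMapSketch(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new HashMap<>());
        }
    }

    /** Route a key to a bucket; floorMod keeps the index non-negative. */
    int bucketFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.size());
    }

    /** Store a value in the bucket that owns the key. */
    V put(K key, V value) {
        return buckets.get(bucketFor(key)).put(key, value);
    }

    /** Look up a value by asking only the bucket that owns the key. */
    V get(K key) {
        return buckets.get(bucketFor(key)).get(key);
    }

    /** Number of entries currently held by one bucket. */
    int bucketSize(int bucket) {
        return buckets.get(bucket).size();
    }
}
```

A fully replicated map would instead write every entry to every bucket, trading memory for the ability to serve any read locally.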
Ampool Active Data Store (ADS) extends the Geode platform by introducing additional data storage abstractions (or data structures) so that multiple types of applications can store and retrieve data efficiently and also collaborate effectively. Traditionally, applications cache data from the underlying data store in their local memory/heap, using whatever data structures suit them. Ampool removes the burden of such explicit memory management from applications and provides virtually unlimited memory space through its memory-centric Active Data Store. Applications can interact with and manipulate the data directly in the data store, using suitable data structures, at memory speed.
As of today, Ampool ADS adds two more data abstractions: MTable and FTable. These abstractions primarily extend Geode’s functionality to support machine learning and analytics workloads alongside OLTP.
- MTable stands for mutable table. We will cover MTable in more detail in subsequent posts, but at its core it is a distributed tabular data structure that is partitioned based on a primary key (the row key). MTable allows quick lookup and update of individual table rows by row key, and at the same time supports very efficient scan operations and range queries over row keys.
- FTable stands for flow table. We sometimes also refer to it as a facts table, because it is useful for storing immutable event logs, metrics, or audit trail data, which is referred to as “facts data” in the data warehousing context. FTable primarily supports “Append” and “Scan” operations and internally keeps records sorted by insertion time and partition key (which is typically one of the table columns).
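The difference between the two access patterns can be sketched in plain Java. This is only an illustration under our own assumptions: `MTableSketch`, `FTableSketch`, and `Fact` are hypothetical names, not the Ampool Java interface; the sketch runs in a single process; and a simple counter stands in for insertion time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a mutable table: rows kept sorted by row key. */
final class MTableSketch {
    private final TreeMap<String, Map<String, Object>> rows = new TreeMap<>();

    /** Insert or update one column of a row in place. */
    void put(String rowKey, String column, Object value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(column, value);
    }

    /** Point lookup by row key; returns null when the row does not exist. */
    Map<String, Object> get(String rowKey) {
        return rows.get(rowKey);
    }

    /** Range scan over [startKey, endKey), returned in row-key order. */
    List<String> scanKeys(String startKey, String endKey) {
        return new ArrayList<>(rows.subMap(startKey, true, endKey, false).keySet());
    }
}

/** Hypothetical stand-in for an append-only table; it has no update method. */
final class FTableSketch {
    static final class Fact {
        final long insertionSeq;   // assumption: a counter stands in for insertion time
        final String partitionKey;
        final String payload;

        Fact(long insertionSeq, String partitionKey, String payload) {
            this.insertionSeq = insertionSeq;
            this.partitionKey = partitionKey;
            this.payload = payload;
        }
    }

    private final Map<String, List<Fact>> partitions = new TreeMap<>();
    private long nextSeq = 0;

    /** Append a record; each partition's list stays in insertion order. */
    Fact append(String partitionKey, String payload) {
        Fact fact = new Fact(nextSeq++, partitionKey, payload);
        partitions.computeIfAbsent(partitionKey, k -> new ArrayList<>()).add(fact);
        return fact;
    }

    /** Scan one partition for records with fromSeq <= insertionSeq < toSeq. */
    List<Fact> scan(String partitionKey, long fromSeq, long toSeq) {
        List<Fact> out = new ArrayList<>();
        List<Fact> facts = partitions.getOrDefault(partitionKey, Collections.<Fact>emptyList());
        for (Fact fact : facts) {
            if (fact.insertionSeq >= fromSeq && fact.insertionSeq < toSeq) {
                out.add(fact);
            }
        }
        return out;
    }
}
```

The sorted map gives the mutable table both point updates and ordered range scans, while the append-only table never rewrites a stored record, which is the property that makes tiering to slower storage cheap.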
At Ampool, we believe that for most common operations a table abstraction is much more efficient than raw key-value maps: it supports both simple and complex typed column values, and it allows projections, filters, and computations to be pushed down to the store, so that less data has to be moved and deserialized to answer a query.
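As a rough illustration of projection and filter pushdown, the sketch below evaluates the filter and trims the columns inside the store's own scan loop, so the caller receives only the matching rows and only the requested columns. `PushdownScanSketch` and its method names are hypothetical and are not part of the Ampool interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical illustration of filter and projection pushdown. */
final class PushdownScanSketch {
    private final List<Map<String, Object>> rows = new ArrayList<>();

    /** Store a defensive copy of the row. */
    void addRow(Map<String, Object> row) {
        rows.add(new LinkedHashMap<>(row));
    }

    /**
     * Scan with the filter and the projection applied on the store side,
     * so non-matching rows and unrequested columns never reach the caller.
     */
    List<Map<String, Object>> scan(Predicate<Map<String, Object>> filter,
                                   List<String> columns) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (!filter.test(row)) {
                continue; // filter pushdown: drop the row before it is returned
            }
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String column : columns) {
                if (row.containsKey(column)) {
                    projected.put(column, row.get(column)); // projection pushdown
                }
            }
            result.add(projected);
        }
        return result;
    }
}
```

With a raw key-value map the store sees only opaque serialized values, so the same query would have to ship every value to the client and filter there.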
Both MTable and FTable provide persistence for data stored in memory, as well as the ability to extend a table onto persistent storage when the data is too big to fit in available memory. Data can be evicted to disk-based storage according to configured eviction policies, while data access operations transparently fetch the data irrespective of where it is stored. This removes the need for applications to maintain two different stores for short-term and long-term data.
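The eviction and transparent-fetch behaviour can be sketched as a two-tier map. This is a simplified illustration under our own assumptions: `TieredStoreSketch` is a hypothetical name, an in-process `HashMap` stands in for the disk tier, and least-recently-used is just one possible eviction policy, not necessarily the one Ampool is configured with.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier store: a bounded memory tier that overflows to "disk". */
final class TieredStoreSketch<K, V> {
    private final int memoryCapacity;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<K, V> memory = new LinkedHashMap<>(16, 0.75f, true);
    // Assumption: a plain map stands in for disk-based storage.
    private final Map<K, V> disk = new HashMap<>();

    TieredStoreSketch(int memoryCapacity) {
        if (memoryCapacity <= 0) {
            throw new IllegalArgumentException("memoryCapacity must be positive");
        }
        this.memoryCapacity = memoryCapacity;
    }

    /** Write to the memory tier, evicting older entries if it overflows. */
    void put(K key, V value) {
        disk.remove(key); // keep a single copy of each key
        memory.put(key, value);
        evictIfNeeded();
    }

    /** Read from whichever tier holds the key; the caller cannot tell which. */
    V get(K key) {
        V value = memory.get(key);
        if (value != null) {
            return value;
        }
        value = disk.remove(key);
        if (value != null) {
            memory.put(key, value); // promote back to the memory tier
            evictIfNeeded();
        }
        return value; // null when the key is in neither tier
    }

    /** True when the key currently lives in the memory tier. */
    boolean inMemory(K key) {
        return memory.containsKey(key);
    }

    /** Move least recently used entries to disk until memory fits. */
    private void evictIfNeeded() {
        Iterator<Map.Entry<K, V>> it = memory.entrySet().iterator();
        while (memory.size() > memoryCapacity && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            disk.put(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }
}
```

The application only ever calls `put` and `get`; which tier served a read is an internal detail, which is the point of having a single store for both short-term and long-term data.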
By embedding Apache Geode, Ampool ADS provides HTAP (Hybrid Transactional and Analytical Processing) capabilities. We purposely keep MTable and FTable as two separate table types so that each kind of data can be stored and accessed efficiently. The large data used in the context of warehousing resides in the facts table and is primarily immutable. Dimension data, which is relatively small and often mutable, can be efficiently stored and accessed through MTable. Because it changes more frequently, dimension data normally requires periodic imports into the warehouse, which adds to the overall latency of analytical processing. MTable’s support for continuous in-place updates can significantly lower that latency for real-time analytics. Large facts table data stored in FTable, in turn, spreads across multiple tiers of storage, such as hot data in memory and warm or cold data on SSD, local disk, or long-term archival storage like S3/HDFS. Keeping data immutable avoids the costly process of compaction and rewrite on disk storage and thus enables effective use of SSDs as the next tier, after memory, for faster data access.
Through its connector framework, both MTable and FTable can serve multiple compute and streaming analytics engines, such as Apache Spark, Apache Hive, and Apache Apex, and can ingest data from distributed message queue platforms such as Apache Kafka. Ampool ADS also provides a native Java interface for these table abstractions, so applications can interact with Ampool directly, or developers can easily write connectors that let external systems use Ampool ADS for efficient storage, retrieval, and manipulation of data.
In subsequent posts, we will provide details of MTable and FTable including various features, configuration options, and policies.