Continuing from the previous post, we now introduce MTable, one of the mechanisms for storing, querying, and scanning data in Ampool’s Active Data Store (ADS). MTable stands for mutable table. We will learn more about MTable in subsequent posts, but at its core it is a distributed tabular data structure that is partitioned on a primary key. MTable allows quick lookup and update of individual rows by row key, and it also supports efficient scan operations and range queries over row keys.
An MTable (Mutable In-Memory Table) is a collection of rows, where each row is a tuple of a Row-Key and a set of Columns.
Each row is uniquely identified by its Row-Key. Each Column is defined by its name and type. MTable supports all primitive types as well as Date, Timestamp, Map, Struct, and Union types.
MTable comes in two types: ORDERED_VERSIONED and UNORDERED.
An MTable of type ORDERED_VERSIONED stores rows sorted by Row-Key. The Row-Key is treated as a sequence of bytes, and rows are kept in lexicographic order of those bytes. ORDERED_VERSIONED is a good fit when you want to run range queries.
An MTable of type UNORDERED stores its data in a hash map. This type is suited to fast GETs, which take close to O(1) time.
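The trade-off between the two table types can be sketched with plain JDK collections. This is only an illustration under assumptions, not the Ampool API: a `TreeMap` with an unsigned byte-wise comparator stands in for an ORDERED_VERSIONED table, a `HashMap` stands in for an UNORDERED table, and the class name, helper names, and sample keys are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: JDK collections standing in for the two table types.
class OrderedVsUnordered {
    static final String[] SAMPLE_KEYS = {"row-30", "row-10", "row-20", "row-40"};

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Sorted store: keys are compared byte by byte as unsigned values,
    // which gives lexicographic order over the raw bytes.
    static TreeMap<byte[], String> orderedTable() {
        Comparator<byte[]> lexicographic = Arrays::compareUnsigned;
        TreeMap<byte[], String> table = new TreeMap<>(lexicographic);
        for (String k : SAMPLE_KEYS) {
            table.put(key(k), "value-of-" + k);
        }
        return table;
    }

    // Hash store: ByteBuffer wraps the bytes so that equals/hashCode
    // are based on content rather than array identity.
    static Map<ByteBuffer, String> unorderedTable() {
        Map<ByteBuffer, String> table = new HashMap<>();
        for (String k : SAMPLE_KEYS) {
            table.put(ByteBuffer.wrap(key(k)), "value-of-" + k);
        }
        return table;
    }

    // Range query over [start, stop): cheap on the sorted store because
    // the matching keys are contiguous.
    static List<String> rangeScan(TreeMap<byte[], String> table, String start, String stop) {
        return new ArrayList<>(table.subMap(key(start), true, key(stop), false).values());
    }

    public static void main(String[] args) {
        System.out.println(rangeScan(orderedTable(), "row-10", "row-30"));
        System.out.println(unorderedTable().get(ByteBuffer.wrap(key("row-20"))));
    }
}
```

The sorted map answers the range query by walking a contiguous slice of keys, while the hash map answers a point lookup directly but would have to visit every entry to answer the same range query.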
MTable supports multiple versions of each row. If a version is explicitly specified, the row is stored with that version; otherwise, the current timestamp in milliseconds is used. One can easily read a row as it was at a particular version, which helps in implementing features such as transactional consistency, snapshot isolation, and auditing.
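A minimal sketch of per-row versioning, using only the JDK, may make this concrete. The `VersionedRow` class and its methods are hypothetical and are not part of the Ampool API. The "as of" lookup shown here, which returns the newest version at or before the requested one, is an assumption about how a snapshot-style read might behave.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: one row holding several versions of its value.
class VersionedRow {
    // Versions are kept sorted so the newest and "as of" lookups are cheap.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // No explicit version: fall back to the current time in milliseconds.
    void put(String value) {
        put(System.currentTimeMillis(), value);
    }

    // Explicit version supplied by the caller.
    void put(long version, String value) {
        versions.put(version, value);
    }

    // Newest value whose version is <= the requested version, or null.
    String getAsOf(long version) {
        Map.Entry<Long, String> entry = versions.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }

    // Value with the highest version, or null if the row is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedRow row = new VersionedRow();
        row.put(100L, "first");
        row.put(200L, "second");
        System.out.println(row.getAsOf(150L)); // value written at version 100
        System.out.println(row.latest());      // value written at version 200
    }
}
```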
MTable is partitioned and distributed across all the members of an Ampool cluster. MTable data is partitioned into a number of buckets based on the row key. An ORDERED_VERSIONED table uses a range partitioning scheme, while an UNORDERED table uses hash partitioning. MTable also supports colocation of partitions from different MTables that hold related data entries, which helps in running join queries efficiently. MTable also supports configurable redundancy, so that one or more copies of each bucket are stored on different nodes to achieve high availability in various node failure scenarios.
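The difference between the two partitioning schemes can be sketched as follows. Both functions are simplified assumptions for illustration and do not reflect how Ampool actually assigns buckets: the hash scheme uses `Arrays.hashCode`, and the range scheme splits the key space evenly on the first byte of the key.

```java
import java.util.Arrays;

// Illustration only: two ways of mapping a row key to a bucket.
class PartitioningSketch {
    // Hash partitioning: neighbouring keys scatter across buckets.
    static int hashBucket(byte[] rowKey, int numBuckets) {
        return Math.floorMod(Arrays.hashCode(rowKey), numBuckets);
    }

    // Range partitioning: each bucket owns a contiguous slice of the key
    // space, here decided by the unsigned value of the first key byte.
    static int rangeBucket(byte[] rowKey, int numBuckets) {
        int first = rowKey.length == 0 ? 0 : (rowKey[0] & 0xFF);
        return first * numBuckets / 256;
    }

    public static void main(String[] args) {
        byte[] low = {0x00, 0x01};
        byte[] high = {(byte) 0xFF, 0x01};
        System.out.println("range: " + rangeBucket(low, 4) + " and " + rangeBucket(high, 4));
        System.out.println("hash:  " + hashBucket(low, 4) + " and " + hashBucket(high, 4));
    }
}
```

With range partitioning, a scan over a key range only has to touch the buckets that own that slice of the key space. Hash partitioning spreads load more evenly but gives up that locality.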
MTable can optionally persist data to the local disk for recovery purposes. After a power outage or node restart, the in-memory data is restored from local disk storage. Ampool Data Store supports two types of persistence, synchronous and asynchronous: when data is ingested into an MTable, it is written to local disk either synchronously or asynchronously. Asynchronous persistence is often used because it offers the best ingestion performance.
MTable does not limit the size of data to the available memory; it allows data to overflow to local disk when memory is full. An eviction threshold is specified for each member server in the cluster. When the percentage of memory in use exceeds that threshold, the server evicts column values using an LRU algorithm. The row keys and pointers to the column values spilled to disk are retained in memory for faster access.
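The overflow behaviour can be sketched with an access-ordered `LinkedHashMap`. This is a simplified illustration, not Ampool's implementation: the threshold is a fixed entry count instead of a percentage of memory, a second in-memory map stands in for the local disk, and the `OverflowStore` class is hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: LRU eviction that spills values instead of dropping them.
class OverflowStore {
    private final int maxInMemory;
    // Stand-in for local disk storage in this sketch.
    private final Map<String, String> disk = new HashMap<>();
    private final LinkedHashMap<String, String> memory;

    OverflowStore(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        // accessOrder = true makes iteration order least-recently-used first.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > OverflowStore.this.maxInMemory) {
                    disk.put(eldest.getKey(), eldest.getValue()); // spill, do not lose
                    return true;
                }
                return false;
            }
        };
    }

    void put(String rowKey, String value) {
        disk.remove(rowKey); // a fresh write supersedes any spilled copy
        memory.put(rowKey, value);
    }

    // Reads memory first, then falls back to the spilled copy.
    String get(String rowKey) {
        String value = memory.get(rowKey);
        return value != null ? value : disk.get(rowKey);
    }

    boolean inMemory(String rowKey) {
        return memory.containsKey(rowKey);
    }

    public static void main(String[] args) {
        OverflowStore store = new OverflowStore(2);
        store.put("a", "1");
        store.put("b", "2");
        store.get("a");      // "a" becomes the most recently used entry
        store.put("c", "3"); // exceeds the limit, so "b" is spilled
        System.out.println(store.inMemory("b") + " " + store.get("b"));
    }
}
```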
Ampool Data Store also provides a mechanism for CDC (Change Data Capture). Using it, applications can listen to MTable events such as inserts, updates, and deletes. These events are delivered in the order in which they occur. An application can implement CDCEventListener to listen for these events and act on them, typically by applying them to a target store for data replication.
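The replication pattern described above can be sketched as follows. The `ChangeListener` interface, the `ChangeEvent` class, and the `Op` enum are hypothetical stand-ins written for this example; they are not the signature of Ampool's CDCEventListener, which is not shown in this post.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: applying an ordered stream of change events to a replica.
class CdcSketch {
    enum Op { INSERT, UPDATE, DELETE }

    // Hypothetical event shape: operation, row key, and new value (null on delete).
    static final class ChangeEvent {
        final Op op;
        final String rowKey;
        final String value;

        ChangeEvent(Op op, String rowKey, String value) {
            this.op = op;
            this.rowKey = rowKey;
            this.value = value;
        }
    }

    // Hypothetical listener contract: called once per event, in order.
    interface ChangeListener {
        void onEvent(ChangeEvent event);
    }

    // Keeps a target store in sync by replaying each event against it.
    static final class ReplicatingListener implements ChangeListener {
        final Map<String, String> target = new TreeMap<>();

        @Override
        public void onEvent(ChangeEvent event) {
            if (event.op == Op.DELETE) {
                target.remove(event.rowKey);
            } else {
                target.put(event.rowKey, event.value);
            }
        }
    }

    public static void main(String[] args) {
        ReplicatingListener listener = new ReplicatingListener();
        listener.onEvent(new ChangeEvent(Op.INSERT, "k1", "v1"));
        listener.onEvent(new ChangeEvent(Op.UPDATE, "k1", "v2"));
        listener.onEvent(new ChangeEvent(Op.INSERT, "k2", "x"));
        listener.onEvent(new ChangeEvent(Op.DELETE, "k2", null));
        System.out.println(listener.target);
    }
}
```

Ordered delivery matters here: replaying the same events in a different order could leave the replica with a stale value or a row that should have been deleted.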
MTable supports data-aware computation through its Co-Processor mechanism. It provides two types of Co-Processors: Endpoint Co-Processors, which resemble stored procedures, and Observer Co-Processors, which resemble database triggers.
An Endpoint Co-Processor can be invoked from the client at any time. Its code is executed remotely on the server, close to the target MTable, and the results are returned to the client. Observer Co-Processors execute user-provided custom code before or after insert and update events on the MTable.
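The two styles can be contrasted in a small, single-process sketch. Everything here is hypothetical and written for illustration: the `Observer` interface, the `Table` class, and the `execute` method are not Ampool's Co-Processor API, and there is no real client/server boundary in this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustration only: trigger-style hooks and procedure-style calls on a table.
class CoprocessorSketch {
    // Observer style: callbacks fired around each write, like a trigger.
    interface Observer {
        default void prePut(String rowKey, String value) {}
        default void postPut(String rowKey, String value) {}
    }

    static final class Table {
        private final Map<String, String> rows = new HashMap<>();
        private final List<Observer> observers = new ArrayList<>();

        void register(Observer observer) {
            observers.add(observer);
        }

        void put(String rowKey, String value) {
            for (Observer o : observers) o.prePut(rowKey, value);
            rows.put(rowKey, value);
            for (Observer o : observers) o.postPut(rowKey, value);
        }

        // Endpoint style: the caller ships a function to where the data lives
        // and gets back only the (usually small) result.
        <R> R execute(Function<Map<String, String>, R> endpoint) {
            return endpoint.apply(Collections.unmodifiableMap(rows));
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        List<String> audit = new ArrayList<>();
        table.register(new Observer() {
            @Override
            public void postPut(String rowKey, String value) {
                audit.add("put " + rowKey);
            }
        });
        table.put("k1", "v1");
        table.put("k2", "v2");
        Integer rowCount = table.execute(rows -> rows.size());
        System.out.println(audit + " rows=" + rowCount);
    }
}
```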
What operations can be performed on an MTable?
- Single and Batch Put of Rows
- Single and Batch Get of Rows
- Delete a Single Row
- Check and Put or Delete
- Scan a Table with Filter
- Coprocessor Execution
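Of the operations listed, check-and-put and check-and-delete are perhaps the least self-explanatory: the write happens only if the row currently holds an expected value. A minimal sketch of that semantics, using the JDK's `ConcurrentHashMap` and hypothetical helper names (this is not the MTable client API), might look like this:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustration only: conditional writes that succeed only on an expected value.
class CheckAndMutate {
    // Replaces the value only if the current value equals `expected`.
    static boolean checkAndPut(ConcurrentMap<String, String> table,
                               String rowKey, String expected, String newValue) {
        return table.replace(rowKey, expected, newValue);
    }

    // Removes the row only if the current value equals `expected`.
    static boolean checkAndDelete(ConcurrentMap<String, String> table,
                                  String rowKey, String expected) {
        return table.remove(rowKey, expected);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> table = new ConcurrentHashMap<>();
        table.put("k1", "v1");
        System.out.println(checkAndPut(table, "k1", "stale", "v2")); // check fails
        System.out.println(checkAndPut(table, "k1", "v1", "v2"));    // check passes
        System.out.println(checkAndDelete(table, "k1", "v2"));       // check passes
    }
}
```

Because the check and the write happen as one atomic step, two clients racing to update the same row cannot both succeed against the same expected value.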
In subsequent posts, we will take a deeper dive into using MTable, with example code.