We are excited to announce the Ampool 2.0 release, an important milestone for Ampool. This release includes many new features and significant performance improvements.

The following major features have been added:

FTable Storage Format

FTable uses a block strategy for its in-memory layer: it groups multiple records together, which minimizes per-record storage overhead and the overall memory footprint. For append-only fact tables, logs, or metrics, this helps you make optimal use of available memory. Release 2.0 supports the following three block formats.

  • AMP_BYTES: Rows are serialized using Ampool-specific encoding and stored individually. This may consume more memory, but there is no additional overhead during a scan to retrieve a row.
  • AMP_SNAPPY: Rows are serialized using Ampool-specific encoding and compressed with Snappy once the block reaches the specified block size. This helps reduce memory usage, but all rows in the block have to be decompressed during a scan.
  • ORC_BYTES: All rows in a block are converted to the ORC columnar format once the block reaches the specified block size. Each block then contains binary ORC data representing the rows, which is interpreted during a scan.
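
The trade-off between storing rows individually and compressing a full block can be sketched as follows. This is only an illustration under stated assumptions, not Ampool's implementation: the `Block` class and its methods are hypothetical, and `zlib` from the Python standard library stands in for Snappy.

```python
import zlib


class Block:
    """Hypothetical in-memory block; not the Ampool API."""

    def __init__(self, block_size, compress):
        self.block_size = block_size  # rows a block holds before it is full
        self.compress = compress      # True mimics AMP_SNAPPY, False mimics AMP_BYTES
        self.rows = []                # serialized rows, stored individually
        self.sealed = None            # compressed payload once the block is full

    def append(self, row: bytes):
        if self.sealed is not None or len(self.rows) >= self.block_size:
            raise ValueError("block is full")
        self.rows.append(row)
        if self.compress and len(self.rows) == self.block_size:
            # Length-prefix each row so the block can be split again on scan.
            payload = b"".join(len(r).to_bytes(4, "big") + r for r in self.rows)
            self.sealed = zlib.compress(payload)  # zlib is a stand-in for Snappy
            self.rows = []

    def scan(self):
        if self.sealed is None:
            return list(self.rows)    # rows are returned without any decoding
        data = zlib.decompress(self.sealed)  # the whole block is decompressed
        out, i = [], 0
        while i < len(data):
            n = int.from_bytes(data[i:i + 4], "big")
            out.append(data[i + 4:i + 4 + n])
            i += 4 + n
        return out

    def stored_bytes(self):
        if self.sealed is not None:
            return len(self.sealed)
        return sum(len(r) for r in self.rows)


# Fill one block of each kind with the same repetitive metric rows.
rows = [b"metric=cpu,value=%03d" % (i % 10) for i in range(1000)]
plain = Block(1000, compress=False)
packed = Block(1000, compress=True)
for r in rows:
    plain.append(r)
    packed.append(r)
```

Both blocks return the same rows from `scan()`, but the compressed one holds fewer bytes and pays a decompression cost on every scan.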

FTable Delta Update:

An FTable stores multiple records together in a block and needs to propagate each update to the replicated copies and to the persistence layer. The following two types of delta propagation have been added in this release.

  • Delta propagation to replicated copies: Previously, every update to a block caused the entire block to be replicated, which incurred a major network overhead. With this release, for both single append and batch append operations that update a block, only the appended records are propagated to the replicated copies. This reduces the network overhead and improves the performance of append operations on the FTable.
  • Delta propagation to persistence layer: Because an FTable stores the rows of each bucket inside blocks of rows, all ingestion operations (append/batch append) update the same block until the block size is reached. Previously, each such update wrote the complete block value to disk rather than just the change to the block, which caused a lot of disk writes and compaction. With this release, only the updated records of the block are propagated to the persistence layer, which reduces disk writes and compaction.
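
To see why shipping only the delta matters, the following sketch counts how many records are sent to one replica for a series of single-row appends, with and without delta propagation. It is a simplified model: the function name `records_sent` is hypothetical, and it counts records rather than bytes or elapsed time, so it should not be expected to match measured timings.

```python
def records_sent(num_appends, block_size, delta):
    """Count records shipped to ONE replica for single-row appends.

    Simplified model (an assumption, not Ampool code): without delta
    propagation every append re-sends the whole current block; with delta
    propagation only the newly appended record is sent.
    """
    sent = 0
    in_block = 0
    for _ in range(num_appends):
        in_block += 1
        sent += 1 if delta else in_block
        if in_block == block_size:
            in_block = 0  # block is full, the next append starts a new one
    return sent


with_delta = records_sent(20_000, 1_000, delta=True)
without_delta = records_sent(20_000, 1_000, delta=False)
print(with_delta, without_delta)
```

In this model the full-block approach grows quadratically within each block, while the delta approach stays linear in the number of appends.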

Security Enhancements:

Ampool supports authorization to control access to data by authenticated users. The admin can control access to a table's data depending on the identity of the user attempting the operation. The following two types of authorization are supported:

  • Sentry Authorization: Apache Sentry is a centralized store of authorization data that enforces fine-grained, role-based authorization to data stored in a data storage system such as Ampool ADS.
  • LDAP Authorization: Ampool ADS leverages an LDAP server for user authentication and authorization. In this use case, the LDAP server creates and manages users, and no information about users is stored in Ampool. Group/role information is managed both on the LDAP server and in Ampool.

Column Statistics:

For an FTable, column statistics are generated per block and stored with the block. The stored statistics are the minimum and maximum value per column, and they are updated with each append/batch append. Statistics are stored for these data types: INT, LONG, BYTE, SHORT, FLOAT, DOUBLE, DATE, TIMESTAMP, STRING. During a filtered scan, these statistics make it possible to skip a block without scanning or decompressing it when no row in the block could match the filter.
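
Block skipping with min/max statistics can be sketched roughly as below. The `StatBlock` class and `scan_greater_than` function are hypothetical names used for illustration only; they are not part of the Ampool API, and the sketch handles just a single "greater than" filter.

```python
from dataclasses import dataclass, field


@dataclass
class StatBlock:
    """Hypothetical block that keeps per-column min/max alongside its rows."""
    rows: list = field(default_factory=list)
    col_min: dict = field(default_factory=dict)
    col_max: dict = field(default_factory=dict)

    def append(self, row: dict):
        self.rows.append(row)
        # Statistics are refreshed on every append.
        for col, val in row.items():
            if col not in self.col_min or val < self.col_min[col]:
                self.col_min[col] = val
            if col not in self.col_max or val > self.col_max[col]:
                self.col_max[col] = val


def scan_greater_than(blocks, col, threshold):
    """Return rows with row[col] > threshold and the number of skipped blocks."""
    matches, skipped = [], 0
    for block in blocks:
        # If even the block's maximum fails the filter, no row can match.
        if col not in block.col_max or block.col_max[col] <= threshold:
            skipped += 1
            continue
        matches.extend(r for r in block.rows if r[col] > threshold)
    return matches, skipped


# Three blocks of three rows each, with increasing timestamps 0..8.
blocks = [StatBlock() for _ in range(3)]
for ts in range(9):
    blocks[ts // 3].append({"ts": ts, "value": ts * 10})

matches, skipped = scan_greater_than(blocks, "ts", 5)
```

Here the first two blocks are skipped from their statistics alone, and only the last block is actually scanned.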

In addition, the following known issues from previous releases are fixed in this release:

  • Added the ability to delete all versions of all keys that match a given filter list, without having to provide the key list.
  • MASH: Added a command to show the distribution of table data and buckets across both primary and secondary copies.
  • Support for lowercase type names in table schema.
  • Server scan performance improvement.

Performance Improvements with delta replication

Configuration:

Single Append operation on FTable
column-length=100
num-columns=10
redundancy=3 
Number of buckets : 113
FTable Block Size : 1000

Number of Rows | Append Time in Seconds (with Delta Replication) | Append Time in Seconds (without Delta Replication) | Speedup
20000 | 16 | 320 | ~20 times
200000 | 164 | 3202 | ~20 times
2000000 | 1688 | 32030 | ~19 times
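
The speedup column can be reproduced from the two timing columns. The short check below uses only the numbers reported above; the variable names are arbitrary.

```python
# (rows, seconds with delta replication, seconds without delta replication)
measurements = [
    (20_000, 16, 320),
    (200_000, 164, 3202),
    (2_000_000, 1688, 32030),
]

# Speedup is the time without delta replication divided by the time with it.
speedups = [without / with_delta for _, with_delta, without in measurements]

for (rows, _, _), s in zip(measurements, speedups):
    print(f"{rows:>9} rows: {s:.1f}x")
```

The ratios stay close to 20x across two orders of magnitude in row count.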

Performance Improvements with delta persistence

Configuration:

Server Nodes: 8
Heap size per server: 50GB
Number of buckets : 113
FTable Block Size : 1000
Client Batch Size: 1000
Redundancy=3

With Delta Persistence

Number of Rows (Size) | Ingestion Time (sec), five parallel clients | Total Size on Disk (GB) | Total Heap Size (GB) | Total Writes on Disk, iostat (GB) | Number of oplog Files Created (total files created)
50Million (40GB) | 949 | 72 | 69.6 | 59.67 | 72
50Million (40GB) | 945.8 | 72 | 70.4 | 59.66 | 72
50Million (40GB) | 941.2 | 72 | 69.6 | 59.66 | 72

Without Delta Persistence

Number of Rows (Size) | Ingestion Time (sec), five parallel clients | Total Size on Disk (GB) | Total Heap Size (GB) | Total Writes on Disk, iostat (GB) | Number of oplog Files Created (total files created)
50Million (40GB) | 1168 | 104 | 70.4 | 90.76 | 1211
50Million (40GB) | 1127 | 104 | 70.4 | 92.93 | 1225
50Million (40GB) | 1146.2 | 104 | 70.4 | 91.66 | 1202

Difference

Number of Rows (Size) | % Reduction in Time | % Reduction in Size on Disk | % Reduction in Disk Writes | % Reduction in Number of oplog Files
50Million (40GB) | 18.75 | 30.76 | 34.25 | 94.05
50Million (40GB) | 16.07 | 30.76 | 35.79 | 94.12
50Million (40GB) | 17.88 | 30.76 | 34.91 | 94.00
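
The reduction percentages follow from the measurements as (without - with) / without. The sketch below recomputes them for the first run from the figures reported above; the function and variable names are illustrative only.

```python
def pct_reduction(without, with_delta):
    """Percentage saved by delta persistence relative to the baseline."""
    return (without - with_delta) / without * 100


# First run: (without delta persistence, with delta persistence)
run1 = {
    "ingestion time (sec)": (1168, 949),
    "size on disk (GB)": (104, 72),
    "disk writes (GB)": (90.76, 59.67),
    "oplog files": (1211, 72),
}

reductions = {name: pct_reduction(*pair) for name, pair in run1.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.2f}% reduction")
```

The recomputed values agree with the published first-run figures to within about 0.01 percentage points.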

 

Release notes are updated at http://docs.ampool-inc.com/core/RN2.0.0/

Click here to download the Ampool release 2.0.

Our open source project Monarch will be updated soon with the Ampool 2.0 release changes, and this release will also be available on AWS Marketplace (in both single-node and cluster modes) in early 2018.