Guest post by Shay Hassidim, Deputy CTO Distinguished Engineer.
We all sense the growing demand for faster processing of constantly growing datasets; everyone wants to keep a competitive edge by analyzing data as quickly as possible. That’s why organizations are turning to in-memory computing to process their data in real time, gaining application scalability and high availability along the way. But (and there’s always a “but”), since in-memory computing is RAM-based and RAM is costly, it can be far too expensive for several use cases.
Bridging the gap between RAM and Flash
Yesterday GigaSpaces announced a partnership with SanDisk® to deliver real-time analytics and in-memory computing with our new XAP MemoryXtend solution. XAP MemoryXtend leverages SanDisk’s new ZetaScale software, which enables applications to extend memory from DRAM to flash and to leverage SSDs for near-RAM performance.
XAP MemoryXtend combines ZetaScale technology and XAP IMC with an SSD of your choice. The result is an SSD-based data grid that provides both performance and far more storage capacity.
So How Does it Work in a Nutshell?
Use-Case Scenario for Real-Time Analytics of Big Data
GigaSpaces XAP can be used as a “shock absorber” that handles incoming event streams and processes them in real time. The processed or raw data can be delegated from XAP to the database, and data can be loaded from the database into XAP at startup or on demand. Large amounts of processed data can be stored in XAP, in RAM or on SSD, delivering real-time response times.
Performance and Price: The SSD-Based Data Grid Advantage
A data grid running purely in RAM on commodity servers can perform around 1M write/read operations per second per node (without replication) with a 1KB payload. A client-side cache can deliver even faster read performance, as it manipulates object references.
A data grid node usually runs on a commodity server with a multi-core CPU, as it must support a highly concurrent environment with many clients/threads accessing the data. It also scales in a linear manner – so if customers need a capacity of ten million writes per second, they can simply run a clustered data grid across ten nodes. It’s that simple!
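The linear-scaling claim above is easy to turn into a back-of-the-envelope sizing rule. The sketch below is only an illustration: the function name `nodes_needed` is made up for this post, and it assumes perfectly linear scaling at the roughly 1M operations per second per node quoted above, with no replication overhead.

```python
def nodes_needed(target_ops_per_sec: int, ops_per_node_per_sec: int = 1_000_000) -> int:
    """Smallest node count whose combined throughput meets the target.

    Assumes perfectly linear scaling, which is an idealization.
    """
    if target_ops_per_sec <= 0 or ops_per_node_per_sec <= 0:
        raise ValueError("throughput figures must be positive")
    # Integer ceiling division: round up to a whole number of nodes.
    return (target_ops_per_sec + ops_per_node_per_sec - 1) // ops_per_node_per_sec


if __name__ == "__main__":
    # Ten million writes per second at ~1M per node.
    print(nodes_needed(10_000_000))
```

In practice you would size with some headroom, since replication and uneven key distribution both eat into the per-node figure.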
Data grid read/write operations can be executed by remote clients or by collocated clients (similar to database stored procedures or triggers). With the collocated business logic model, operations enjoy ultra-low latency of around ten microseconds, as no serialization or network activity is involved. Remote operations have around one millisecond of latency when accessing the data grid (depending on the payload size, of course), which is still reasonably fast.
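To get a feel for what those latencies mean, here is a rough calculation of how many operations a single caller could issue per second if it waited for each one to finish before starting the next. This is a simplification for illustration only (the helper name is hypothetical, and real clients typically use many threads or asynchronous calls), using the approximate latencies quoted above.

```python
MICROSECONDS_PER_SECOND = 1_000_000

# Approximate latencies from the text, in microseconds.
COLLOCATED_LATENCY_US = 10     # ~ten microseconds
REMOTE_LATENCY_US = 1_000      # ~one millisecond


def sequential_ops_per_sec(latency_us: int) -> float:
    """Upper bound on operations per second for one strictly sequential caller."""
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return MICROSECONDS_PER_SECOND / latency_us


if __name__ == "__main__":
    collocated = sequential_ops_per_sec(COLLOCATED_LATENCY_US)
    remote = sequential_ops_per_sec(REMOTE_LATENCY_US)
    print(f"collocated: {collocated:,.0f} ops/s per caller")
    print(f"remote:     {remote:,.0f} ops/s per caller")
    print(f"collocated advantage: {collocated / remote:.0f}x")
```

Under these assumptions a collocated caller gets roughly a hundred times more sequential operations per second than a remote one, which is why latency-sensitive logic is a good candidate for collocation.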
But How Well Does it Perform?
In pure RAM mode, the GigaSpaces XAP In-Memory Compute platform delivered 1.1 million read TPS and 339K write TPS; in SSD mode, it delivered 242K read TPS and 124K write TPS.
RAM is faster than SSD, and that’s no surprise. But it’s never just about performance; it’s just as much about price, especially as datasets scale. And when you look at the price-performance benchmark, the results are pretty incredible.
When analyzing the results on a price-performance scale, we learn that XAP MemoryXtend (using SSD as the data storage) delivers 3.6 times better price-performance for write operations than XAP In-Memory Computing (running in pure RAM mode). For read operations, MemoryXtend delivers 2.14 times better price-performance than RAM mode. The SSD data grid is actually very cost-effective.
* This graph assumes 1TB SSD price at $2K, 1TB RAM price at $20K.
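For readers who want to check the arithmetic, the sketch below recomputes the price-performance ratios from the TPS figures and the per-terabyte prices quoted in this post. It assumes that "price-performance" means TPS per dollar of 1TB of storage, which is a guess at how the graph was derived rather than a documented formula; all names are illustrative.

```python
# Storage prices per terabyte, from the footnote above.
PRICE_PER_TB_USD = {"ram": 20_000, "ssd": 2_000}

# Benchmark throughput in transactions per second, as quoted (rounded) in this post.
TPS = {
    "read": {"ram": 1_100_000, "ssd": 242_000},
    "write": {"ram": 339_000, "ssd": 124_000},
}


def tps_per_dollar(operation: str, medium: str) -> float:
    """Throughput obtained per dollar spent on 1TB of the given medium."""
    return TPS[operation][medium] / PRICE_PER_TB_USD[medium]


def ssd_advantage(operation: str) -> float:
    """How many times better SSD price-performance is than RAM for an operation."""
    return tps_per_dollar(operation, "ssd") / tps_per_dollar(operation, "ram")


if __name__ == "__main__":
    for operation in ("write", "read"):
        print(f"{operation}: SSD is {ssd_advantage(operation):.2f}x better per dollar")
```

With the rounded figures this gives roughly 3.66 for writes and 2.2 for reads, in line with the 3.6 and 2.14 reported; the small gap on reads presumably comes from rounding in the published TPS numbers.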
In this benchmark, write operations against the SSD data grid were performed using a new storage interface that XAP provides, in which a write operation is fully acknowledged only when both the data grid and the SSD have committed the transaction.
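The acknowledgement rule described above can be pictured with a toy model. To be clear, this is not the XAP storage interface: `DictTier`, `FailingTier` and `acknowledged_write` are hypothetical names invented for this sketch, the tiers are plain dictionaries, and writing to the SSD tier first is just one simple way to make sure a failed SSD commit leaves the grid untouched.

```python
class DictTier:
    """Hypothetical stand-in for a storage tier (RAM grid or SSD); not an XAP API."""

    def __init__(self, name: str):
        self.name = name
        self._data = {}

    def commit(self, key, value) -> None:
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class FailingTier(DictTier):
    """A tier whose commits always fail, used to show the no-acknowledgement path."""

    def commit(self, key, value) -> None:
        raise OSError(f"{self.name}: simulated device failure")


def acknowledged_write(grid: DictTier, ssd: DictTier, key, value) -> bool:
    """Return True (the acknowledgement) only after both tiers have committed.

    If the SSD commit raises, the exception propagates, the caller never
    receives an acknowledgement, and the grid tier is left unchanged.
    """
    ssd.commit(key, value)
    grid.commit(key, value)
    return True


if __name__ == "__main__":
    grid, ssd = DictTier("grid"), DictTier("ssd")
    print(acknowledged_write(grid, ssd, "order-1", {"qty": 3}))
    try:
        acknowledged_write(grid, FailingTier("ssd"), "order-2", {"qty": 5})
    except OSError as error:
        print("not acknowledged:", error)
```

The point of the sketch is the contract, not the mechanics: the caller sees success only once the data is safely in both places, which is why the SSD write figures in the benchmark include the cost of the flash commit.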
The benchmark ran on an HP DL 380 server with two 2.8GHz CPU sockets and a total of twenty-four cores, 148GB of DRAM, CentOS 5.8, and two FusionIO SLC PCIe cards in software RAID 0. The payload was a 1KB object with a single string-based key and a uniform read distribution.
One Important Difference between the RAM Data Grid and SSD Data Grid Benchmark…
The SSD data grid benchmark was performed with far more items. The RAM data grid benchmark ran with 20GB of total capacity, whereas the SSD data grid benchmark ran with a total of 1TB of data capacity! An incredible capacity difference.
Thanks to a fast, optimized application-level SSD interface (such as the Universal SSD API) and the low cost of crossing the boundary from the heap to the SSD, XAP MemoryXtend can deliver reasonable performance with one important difference compared to a pure RAM data grid: it has a huge data storage capacity per node, more than 1TB! This is fifty times larger than an average pure RAM data grid node.
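To see what a fifty-fold difference in per-node capacity means for cluster size, here is a small sizing sketch. The 10TB dataset is a made-up example, 1TB is treated as 1,000GB, and replication and indexing overhead are ignored, so take the output as an order-of-magnitude illustration only.

```python
# Per-node capacities from the benchmark description, in gigabytes.
RAM_NODE_CAPACITY_GB = 20
SSD_NODE_CAPACITY_GB = 1_000  # 1TB, using decimal units


def nodes_for_dataset(dataset_gb: int, node_capacity_gb: int) -> int:
    """Number of nodes needed to hold the dataset, rounded up to whole nodes."""
    if dataset_gb < 0 or node_capacity_gb <= 0:
        raise ValueError("sizes must be non-negative and capacity positive")
    return -(-dataset_gb // node_capacity_gb)  # ceiling division on integers


if __name__ == "__main__":
    dataset_gb = 10_000  # hypothetical 10TB dataset
    print("capacity ratio:", SSD_NODE_CAPACITY_GB // RAM_NODE_CAPACITY_GB)
    print("RAM nodes:", nodes_for_dataset(dataset_gb, RAM_NODE_CAPACITY_GB))
    print("SSD nodes:", nodes_for_dataset(dataset_gb, SSD_NODE_CAPACITY_GB))
```

Under these assumptions the hypothetical dataset needs 500 pure RAM nodes but only 10 SSD-backed nodes, which is where most of the cost advantage comes from.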
How Can I Try This Hands-On?
If you want to evaluate GigaSpaces XAP MemoryXtend, you can simply download it from http://www.gigaspaces.com/xap-memoryxtend-flash-performance-big-data to get a free trial, or book a spot at our free lab.
Learn more about the solution with the XAP MemoryXtend White Paper.
If you have your own applications you would like to test with SanDisk’s ZetaScale, contact firstname.lastname@example.org to get the ZetaScale SDK at no charge.