Enterprises are increasingly taking advantage of the Internet of Things (IoT), in which sensors embedded across many devices collect massive amounts of data. As this data is collected, enterprises use business intelligence tools and Big Data analytics to derive business value from it. However, IT budgets and existing storage infrastructure can’t scale to meet the growth of data or the need for fast data: data that is ready to be analyzed and mined as close to real time as possible.
In this blog series we will explore the different options for building world-class data center storage infrastructure by combining software-defined storage solutions with InfiniFlash for unprecedented scaling capabilities and breakthrough economics.
The InfiniFlash system from SanDisk® is a new, high-performance storage platform that offers massive capacity and high density in a low-cost enclosure to address the demands of capacity workloads at scale. It delivers breakthrough economics for customers with big data storage requirements, particularly in software-defined storage infrastructures. InfiniFlash provides significant cost savings over traditional (monolithic) storage solutions, which can often result in vendor lock-in and premium pricing.
IBM General Parallel File System (GPFS) with 256TB of InfiniFlash
In our first blog of this series we take a look at a system built using IBM General Parallel File System (GPFS) and the InfiniFlash IF100 with 256TB of flash for a high-speed clustered file system. IBM GPFS is one of the most mature and widely used software-defined storage solutions in the enterprise today and offers excellent performance when paired with all-flash enclosures. For our infrastructure we have installed GPFS Server 220.127.116.11 on a Dell R720 with 64GB of RAM and have directly connected it to the InfiniFlash system over four SAS cables as shown in the diagram below:
Figure 1: GPFS NSD Server on Dell R720 connected to InfiniFlash system
Performance testing was conducted using fio (Flexible I/O Tester). Our initial tests simulated a particular customer’s workload, which required high-bandwidth streaming writes combined with 100% random reads across the entire dataset. This workload involves capturing streaming sensor data while simultaneously allowing analysts to query and analyze the entire dataset. We ran two separate tests: the first combined 64K 100% random reads with 1MB sequential writes, and the second combined 256K 100% random reads with 1MB sequential writes, as shown in the two graphs below.
Figure 2: Single Server GPFS Tests on InfiniFlash 64K 100% random reads with 1MB sequential writes
Figure 3: Single Server GPFS Tests on InfiniFlash 256K 100% random reads with 1MB sequential writes
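For readers who want to try a similar mixed workload, here is a minimal Python sketch that generates a fio job file pairing random reads with 1MB sequential writes. It is not the job file used in our tests. The mount point, I/O engine, queue depth, job counts, file size, and runtime are all assumptions chosen for illustration, and the helper name build_fio_job is hypothetical. Only the block sizes and access patterns come from the test description above.

```python
def build_fio_job(directory, read_bs, write_bs="1m", runtime_s=60, size="1g"):
    """Return fio job-file text that mixes random reads with sequential writes.

    Every value except the block sizes and access patterns is an assumed
    placeholder, not a setting taken from the tests described in this post.
    """
    lines = [
        "[global]",
        f"directory={directory}",   # assumed mount point of the file system
        "ioengine=libaio",          # assumed Linux asynchronous I/O engine
        "direct=1",                 # bypass the page cache
        f"size={size}",             # assumed per-job file size
        f"runtime={runtime_s}",     # assumed run length in seconds
        "time_based",               # keep running until runtime expires
        "group_reporting",          # aggregate statistics across numjobs
        "",
        "[random-readers]",
        "rw=randread",              # 100% random reads
        f"bs={read_bs}",
        "iodepth=16",               # assumed queue depth
        "numjobs=4",                # assumed number of reader jobs
        "",
        "[sequential-writers]",
        "rw=write",                 # sequential (streaming) writes
        f"bs={write_bs}",
        "iodepth=16",
        "numjobs=4",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # /mnt/gpfs is a hypothetical mount point; adjust for your system.
    print(build_fio_job("/mnt/gpfs", read_bs="64k"))
```

Because the two job sections carry no stonewall option, fio starts them together, so the readers and writers compete for the same storage as in the scenario above. Passing read_bs="256k" produces the second variant.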
For these initial tests we ran the workload simulation directly on the server node. Next, we configured two GPFS Clients and connected them to the GPFS NSD Server over InfiniBand (IB), as shown in the diagram below.
Figure 4: Single GPFS NSD Server with two GPFS Clients
Having connected the clients to the GPFS NSD Server through a Mellanox IB switch, we then ran identical workloads, first using just a single client, and then using two clients to evaluate how well the solution scaled. As you can see in the charts below, the bandwidth results roughly doubled when adding the second client.
Figure 5: Single GPFS Client w/ Single NSD Server – 64K 100% random reads with 1MB sequential writes
Figure 6: Single GPFS Client w/ Single GPFS NSD Server – 256K 100% random reads with 1MB sequential writes
Figure 7: Two GPFS Clients w/ Single NSD Server – 64K 100% random reads with 1MB sequential writes
Figure 8: Two GPFS Clients w/ Single NSD Server – 256K 100% random reads with 1MB sequential writes
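One way to quantify how well bandwidth scales when clients are added is to compare the measured aggregate bandwidth with ideal linear scaling. The short Python sketch below does that arithmetic. The function name is hypothetical, and the numbers in the usage example are placeholders in arbitrary units, not results from our tests.

```python
def scaling_efficiency(single_client_bw, multi_client_bw, n_clients):
    """Return measured aggregate bandwidth as a fraction of ideal linear scaling.

    A value of 1.0 means bandwidth grew exactly in proportion to the number
    of clients; lower values mean less-than-linear scaling.
    """
    if single_client_bw <= 0:
        raise ValueError("single_client_bw must be positive")
    if n_clients < 1:
        raise ValueError("n_clients must be at least 1")
    ideal_bw = single_client_bw * n_clients
    return multi_client_bw / ideal_bw


if __name__ == "__main__":
    # Placeholder values in arbitrary units, for illustration only.
    efficiency = scaling_efficiency(100.0, 190.0, 2)
    print(f"Scaling efficiency: {efficiency:.0%}")
```

A result close to 1.0 for two clients corresponds to the "roughly doubled" bandwidth described above.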
Our testing shows that IBM GPFS running on the 256TB InfiniFlash IF100 scales well, with bandwidth roughly doubling from one client to two, and delivers high performance for workloads that combine large amounts of sequential streaming data with fast random data access. This new storage solution gives enterprises unprecedented scaling at lower cost, so they can extract more insight from the data they collect from attached and embedded devices.