Data Center Tech Blog
Jimmy May

SQL Server Technologist, SanDisk Data Propulsion Lab

Columnstore indexes are a powerful memory-optimized column-oriented index intended for data warehouse workloads—but far too few architects know about them or their benefits. Columnstore indexes routinely improve performance by 10x, 100x (such as in the demo below), and even 1,000x—rendering results in seconds that classic row store data structures required minutes to satisfy. Microsoft recommends columnstore indexes as the default choice for data warehouses—and I concur. In this blog I provide:

  • A compelling video demo of the power of columnstore
  • A brief introduction to columnstore indexes for data warehousing
  • Better together: Columnstore on SanDisk® flash (TPC-H world records and Microsoft Data Warehouse Reference Architectures)
  • Next steps

The Power of Columnstore

I work in a division of SanDisk called the Data Propulsion Lab (DPL). There we do what I call “flipping the /faster bit”, and columnstore does exactly that for SQL Server data warehouses.

So how much faster is it? See for yourself by viewing the following video. And if you find it compelling that columnstore improves warehouse query performance times from over two minutes down to two seconds, then I invite you to read the rest of the post.

Demo: SQL Server Columnstore Performance

Introduction to Columnstore Indexes

Columnstore indexes were introduced in SQL Server 2012. Even though as stated Microsoft recommends columnstore indexes as the default choice for data warehouses, they remain largely unknown.

Transparent to the app, columnstore has been shown to render queries at sub-second response times which previously took many minutes or longer. In fact, columnstore indexes can increase SQL Server data warehouse query performance by one, two, or even three orders of magnitude.

In SQL Server 2012 some adoption blockers exist, yet columnstore is nonetheless a game changer for many apps.

In SQL Server 2014, columnstore was wholly re-architected and improved, potential blockers have been largely removed, and it is profoundly changing the way we interact with our data.

In SQL Server 2016, Microsoft has continued to enhance this remarkable feature.

How It Works

If we’re looking for a subset of columns from one or a few rows, given the right indexes, SQL Server can do a superlative job of providing an answer. Yet if a query by design needs to hit lots of rows—reporting, aggregations, grouping, scans, etc.—SQL Server has never had a good mechanism—until columnstore.

Subscribe to Blog via Email

Enter your email address to subscribe to this blog and receive notifications of new posts by email.

Contrasted with traditional row stores in which data is physically stored row-by-row, columnstore stores values for all rows for a given column. A given data page will store values only for the given column to which that page is allocated. Storage is pivoted relative to the row stores we’ve all grown up with. Columnstore literally turns row storage on its head—and this isn’t the only thing that’s different.

Better Together: Columnstore on Flash

SanDisk flash optimizes this fantastic technology—“better together”. Our technology is integral to TPC‑H world records such as these recent exercises with our partner, Lenovo: Fusion ioMemory Sets Two TPC-H World Records with Lenovo and Microsoft

SanDisk flash is the storage of choice for many configurations, including these seven Data Warehouse Fast Track Reference Architectures (DWFTRA) certified by Microsoft in collaboration with HP and Lenovo:

Table 1. List of publicly available Microsoft Data Warehouse Fast Track Reference Architecture Certifications

 

OEM Server Rack
Units
DB Size
(TB)
Measured
Throughput
(Queries/Hr/TB)
Microsoft Reference Architecture
Certification
Date
HP DL380 G8 2U 28 202 #2014-003* 9/3/2014
HP DL380 G8 2U 45 198 #2014-003* 9/3/2014
HP DL580 G8 4U 90 366 #2014-002 9/3/2014
Lenovo x3650 M5 2U 20 265 #2014-009 10/15/2014
Lenovo x3850 X6 4U 55 360 #2014-023 3/10/2015
Lenovo x3850 X6 4U 60 371 #2014-001 8/11/2014
Lenovo x3850 X6 4U 95 433 #2014-028 5/5/2015

*#2014-003 reflects two certifications issued on the same day and available in the same document.

One of the most remarkable characteristics of these ready-to-order affordable solutions is that they pack tens of terabytes of usable data into a single 2U or 4U server. Contrast this to conventional storage solutions requiring literally hundreds of spindles—spindles typically hosted by enterprise SANs costing millions for hardware and licensing. (See this 99sec clip from Lenovo demonstrating 100TB of raw SanDisk flash in their 2U x3650 M5.)

Next Steps

Stay tuned for additional information on SQL Server columnstore on SanDisk flash, including real-life performance examples and additional insight using our solutions. We’ll review the architecture, as well as the challenges, workarounds, travails, and big wins at customer sites. See whether columnstore can change your users’ world and why columnstore may be a compelling reason to upgrade.

Columnstore indexes flip the DW /faster bit, manifesting my motto, “Change the world or go home!”

The Future of Flash on Display: SanDisk® at FMS 2015

VMworld US 2015: Something new and exciting is happening in VMware vSphere 6

Subscribe Today!

Get our latest posts via email and enjoy data center insights in a flash