Data Center Tech Blog

Hadoop technology just turned 10 and has gained tremendous momentum. But some components of the traditional architecture are showing their age, and a new approach to Hadoop architecture is emerging.

Janet George

Fellow and Chief Data Scientist for Big Data and Cognitive Computing

Hadoop technology just turned 10. During my career, I was fortunate to work for Yahoo, where Hadoop was born, and to watch it grow from a technical paper into the first operational enterprise data platform. I built the very first research engineering team from the ground up to stand up the first operational Hadoop grid clusters, handling loads on the order of petabytes of data. I was also part of the team working on the open-sourcing of Apache Hadoop.

Over the years, Hadoop has gained tremendous momentum, giving birth to many distributions with wide adoption across enterprises. It has become an integral part of the de facto Big Data platform stack. It’s robust, reliable, scalable, and enterprise grade. However, as progress marches on, some components of the traditional architecture are showing their age, and a new approach to Hadoop architecture is emerging.

Shared vs. Distributed HDFS

HDFS has served as the primary storage system used by Hadoop. It is a distributed file system that provides high-performance access to data across Hadoop clusters. It has become the distributed file system of choice for many enterprises that manage large pools of Big Data and run Big Data analytics applications on them.

But the nature of progress is that technology is in continuous evolution. More compelling systems emerge with better architectures and better storage.

So what’s the right Hadoop architecture for your Big Data analytics – shared or distributed?

What’s Right for Me?

In a recent webinar, I compared and contrasted the two current approaches. The original HDFS approach utilizes storage co-located with the compute servers. An emerging alternative relies on dedicated storage resources shared by the compute cluster.
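To make the difference between the two approaches concrete, here is a minimal sizing sketch in Python. It is an illustration only and is not taken from the webinar: the function names, the per-node figures, and the workload numbers are all assumptions, and it deliberately ignores replication, growth headroom, and network design.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    compute_nodes: int      # number of compute servers to buy
    raw_storage_tb: float   # raw storage capacity provisioned, in TB


def colocated_plan(required_cores, required_tb, cores_per_node, tb_per_node):
    """Co-located storage: disks live in the compute servers, so a single
    node count has to satisfy both the compute and the storage requirement."""
    nodes_for_compute = math.ceil(required_cores / cores_per_node)
    nodes_for_storage = math.ceil(required_tb / tb_per_node)
    nodes = max(nodes_for_compute, nodes_for_storage)
    return ClusterPlan(compute_nodes=nodes, raw_storage_tb=nodes * tb_per_node)


def shared_plan(required_cores, required_tb, cores_per_node):
    """Shared storage: compute servers are sized for compute only, and the
    dedicated storage tier is sized independently for the data."""
    nodes = math.ceil(required_cores / cores_per_node)
    return ClusterPlan(compute_nodes=nodes, raw_storage_tb=required_tb)


# Hypothetical workload: modest compute, large data set.
print(colocated_plan(required_cores=320, required_tb=2000,
                     cores_per_node=32, tb_per_node=48))
print(shared_plan(required_cores=320, required_tb=2000, cores_per_node=32))
```

With these made-up numbers, the co-located plan needs 42 servers to hold the data even though 10 would cover the compute, while the shared plan buys 10 compute servers and sizes storage separately. Real planning involves many more factors, but the sketch shows why the two resources scale together in one design and independently in the other.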

I wanted to provide definitive guidelines to planners and architects in order to help them identify the best solutions for their needs when implementing Hadoop.

You can stream the webinar on demand for free. Feel free to reach out in the comments below with your questions.

WEBINAR: Shared or Distributed HDFS – What’s Right for Me?
Stream it here
