Cleversafe Brings Storage To Hadoop-Driven Big Data Analytics

Apache Hadoop is becoming the de facto environment for spreading data across a distributed infrastructure and then analyzing it with MapReduce, whether the goal is to optimize web pages, personalize content or increase the effectiveness of online advertising.

There’s just one problem.

Hadoop was not designed as a storage environment. Metadata is kept on one server, and data is replicated three times to make sure nothing gets lost. That gets expensive when the data store is petabytes in size, and maintaining three copies adds significant overhead and management costs. And if the metadata server fails, the replicated data can become entirely inaccessible.

Cleversafe believes it has the answer: combining its object-storage dispersal technology with the capabilities of Hadoop MapReduce.

Cleversafe uses a technique called erasure coding. It takes data and slices it into little pieces. The slices are distributed to separate disks, storage nodes and geographic locations. Once the data is dispersed, Cleversafe’s Information Dispersal Algorithms (IDAs) can reconstitute it from a subset of the slices originally stored.
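
To make the "rebuild from a subset of the slices" idea concrete, here is a minimal sketch of the simplest possible erasure code: k data slices plus one XOR parity slice, so any single lost slice can be rebuilt. This is not Cleversafe's IDA, which the article does not describe in detail and which tolerates the loss of several slices. The function names (`disperse`, `reconstitute`) are hypothetical and chosen only for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int) -> List[bytes]:
    """Cut data into k equal slices and append one XOR parity slice.

    Illustrative only: a single-parity code survives one lost slice.
    """
    slice_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(slice_len * k, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def reconstitute(slices: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one slice is missing (None)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single-parity sketch can only repair one lost slice")
    repaired = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of all surviving slices equals the one that was lost.
        repaired[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity slice, join the data slices, strip the padding.
    return b"".join(repaired[:-1])[:original_len]


if __name__ == "__main__":
    payload = b"clickstream records for the ad-targeting job"
    stored = disperse(payload, k=4)   # 5 slices on 5 hypothetical nodes
    stored[1] = None                  # simulate one node failing
    assert reconstitute(stored, len(payload)) == payload
```

Production dispersal schemes use stronger codes so that several slices can be lost at once, but the principle is the same: no full copy of the data sits on any one node.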

Here’s the twist. Cleversafe proposes that after dispersal, the data goes back into Hadoop for analysis. Hadoop does best when computation and data sit on the same nodes, and that is what the Cleversafe technique is meant to accomplish.

Cleversafe CEO Chris Gladwin tells us the benefit comes in three ways:

  • Increased productivity: The data comes to the compute. Cleversafe’s “Slicestor” provides both computation and storage.
  • Lower storage costs: Information dispersal calls for one instance of the data versus a multiple of three.
  • Better reliability: Information dispersal eliminates single points of failure.
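
The storage-cost claim can be checked with simple arithmetic. The sketch below compares raw capacity under three-way replication with a k-of-n dispersal scheme, where any k of n slices rebuild the data and raw storage is n/k times the logical size. The 10-of-16 configuration and the function names are assumptions for illustration, not figures from Cleversafe.

```python
def replication_raw_bytes(logical_bytes: int, copies: int = 3) -> int:
    """Raw capacity needed when every byte is stored `copies` times."""
    return logical_bytes * copies


def dispersal_raw_bytes(logical_bytes: int, threshold: int, width: int) -> float:
    """Raw capacity for a threshold-of-width erasure code.

    Data is cut into `threshold` slices and expanded to `width` slices,
    so the expansion factor is width / threshold.
    """
    if not 0 < threshold <= width:
        raise ValueError("need 0 < threshold <= width")
    return logical_bytes * width / threshold


if __name__ == "__main__":
    one_petabyte = 10**15
    replicated = replication_raw_bytes(one_petabyte)             # 3x raw storage
    dispersed = dispersal_raw_bytes(one_petabyte, 10, 16)        # hypothetical 10-of-16
    print(f"replication: {replicated / 1e15:.1f} PB raw")
    print(f"dispersal:   {dispersed / 1e15:.1f} PB raw")
```

Under these assumed parameters the dispersed store needs noticeably less raw disk than three full copies while still surviving the loss of up to six slices; the expansion is above 1x, so "one instance" is best read as one logical instance.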

Does this represent the next generation of big data analytics? Hadoop has inherent weaknesses in its storage capabilities. That is why it is a natural fit for storage vendors such as EMC and IBM.

The difference for Cleversafe boils down to its erasure-coding capabilities for building large-scale clouds, such as the one it is now helping develop for Lockheed Martin to serve federal agencies.

But erasure coding does have its flaws, as Wikibon’s Dave Vellante points out. He notes that erasure coding is math heavy and requires considerable system resources to manage:

As such you need to architect different methods of managing data with plenty of compute resource. The idea is to spread resources over multiple nodes, share virtually nothing across those nodes and bet on Intel to increase performance over time. But generally, such systems are most appropriate for lower performance applications, making archiving a perfect fit.

If it can truly do all it says it can do, then Cleversafe may prove a formidable player in the emerging big data land grab.

But I am not so sure. There’s a reason why Amazon Web Services and Google use commodity infrastructure. It is affordable and can be continuously optimized. As efficiencies improve, prices go down. On Amazon Elastic MapReduce, you can lease clusters to run Hadoop MapReduce jobs.

Cleversafe has a compelling case, but the methods used by AWS show the benefits that come with distributed infrastructures built on very cheap hardware.
