Who’s Buying? Cloudera Is The Must-Follow IPO Nightmare That We Can’t Take Our Eyes Off [Analysis]

The tech world has recently been stirred up by Cloudera’s huge S-1 filing, which signals its intention to IPO in the coming months.

A lot has been written about this in different circles, but I’d like to take a step back and think about what it means. Selling Hadoop plus security and automation hasn’t yet proven to be a break-even model for the firms in the business, but the two major players will both be public anyway. Whatever happened to businesses turning a profit, exactly? You say, what about SNAP and other social media players? Those are B2C: they haven’t got people paying for services yet, they rely heavily on advertising, the value is really in the data and not the services, and other arguments come to mind. But enterprise software was supposed to be the place where companies actually made money, right? Not in Hadoop.

Why is it so costly? Here is why: these vendors specialize in a space that requires hiring expensive talent (look at the market for big data engineers and you’ll see why), lots of field sales work (on-premise software solutions costing many millions of dollars don’t sell over the phone), plus proof-of-concept work to make a sale. In addition, organizations need buy-in from everyone from the C-suite to IT and Ops to make Hadoop happen inside a company, slowing sales cycles down to a snail’s pace. Other vendors in the space like Alteryx, DataRobot or Tableau don’t have concerns of the same magnitude, since the analytics slice of the market is closer to the value-drivers and shows clear ROI with lower upfront costs.

Also, Cloudera is likely hugely overvalued based on its last funding round. Check out the link and you’ll see some analysis covering why, but it doesn’t look pretty.

Where is the golden path to growth?

We know the famous Forrester Research quote where “100 percent of large companies will adopt Hadoop in the next few years”. Does that mean that this is a rapidly maturing market? Where is the mega growth potential for Hadoop if the largest potential customers will already be on board? Going down-market with cloud/managed Hadoop solutions like we see with Amazon EMR or Microsoft Azure HDInsight could be a key move forward, making Hadoop licenses accessible to more mid-market players trying to catch up to the Fortune 500. This is the main play that I see opening up a path to future growth for Hortonworks and Cloudera, especially if they can price in a way that lets them make a healthy margin on these deals. Doubling down on inside sales for high-availability private VPC deployments would reduce customer acquisition cost (CAC) and upfront hardware costs (shifting the focus to SaaS engagement and recurring revenue) and shorten deal cycles to the point where these vendors could rapidly become very profitable. Also note that once an org has done the HUGE legwork to direct all its data streams into its Hadoop VPC, the solution itself is rather sticky. So why not disassociate the software from the cost of the box?
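To make the CAC argument a bit more concrete, here is a back-of-the-envelope sketch in Python. It computes a simple CAC payback period: how many months of gross profit it takes to earn back what was spent to win a customer. Every number below is hypothetical and made up purely for illustration (none of it comes from Cloudera’s or Hortonworks’ filings), and the function name `cac_payback_months` is my own.

```python
def cac_payback_months(cac: float, monthly_recurring_revenue: float,
                       gross_margin: float) -> float:
    """Months of gross profit needed to recover the cost of acquiring one customer.

    cac: total sales and marketing spend to win the customer (dollars)
    monthly_recurring_revenue: what the customer pays per month (dollars)
    gross_margin: fraction of revenue left after cost of delivery (0 < margin <= 1)
    """
    if monthly_recurring_revenue <= 0:
        raise ValueError("monthly_recurring_revenue must be positive")
    if not 0 < gross_margin <= 1:
        raise ValueError("gross_margin must be in (0, 1]")
    return cac / (monthly_recurring_revenue * gross_margin)


# HYPOTHETICAL figures, for illustration only.
# Field sales: big on-premise deal, expensive proof-of-concept, lower margin.
field_sales = cac_payback_months(
    cac=600_000, monthly_recurring_revenue=50_000, gross_margin=0.60
)

# Inside sales: smaller managed/cloud deal, much cheaper to close, higher margin.
inside_sales = cac_payback_months(
    cac=90_000, monthly_recurring_revenue=15_000, gross_margin=0.75
)

print(f"Field sales payback:  {field_sales:.1f} months")
print(f"Inside sales payback: {inside_sales:.1f} months")
```

With these invented inputs the smaller inside-sales deal pays back in well under half the time of the big field-sales deal, even though each customer is worth less per month. That is the shape of the argument above; the real numbers could of course look very different.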

Let me wrap up by saying that I very much hope that Cloudera’s IPO is a smash hit, being in this industry myself (Cloudera is also a close partner of DataRobot). A profitable and wildly successful Cloudera bodes well for the rest of the new enterprise data stack (Hardware, Storage, Platform, ETL, Data Science, Viz) that is currently taking over the market.

As the market leader, Cloudera is aiming much higher than rival Hortonworks:


Hortonworks S-1 (2016: $185M revenue)

Strategic: Cloudera Launches New Data Science Platform


This week Cloudera made waves by announcing a very strategic new feature plugged into their Hadoop platform. The press release touts the launch of a tool for “Self-Service Data Science for the Enterprise” providing a native interface for Machine Learning on Hadoop. I think it’s important to give this some voice on the blog because this falls right in line with a lot of trends right now in the enterprise big data landscape.

All data-service/data-tech companies are working to find a niche in the new AI/ML/Data Science world as some of the attention and hype grows around the application of these tools in the enterprise. Most industries haven’t really integrated fully with Machine Learning because of the lack of data science talent across the company. Very few organizations can claim that they have data scientists in every department, and those few are probably all consulting firms.

What’s interesting about this is that they are delivering this capability inside the browser, like an IPython/Jupyter notebook. This kind of tooling is very popular in the open source community and with data-oriented developers, but it is definitely not the kind of thing we’re used to seeing in the enterprise. I personally love to use notebooks to plan talks and demonstrate all kinds of snippets. Kaggle also hosts lots of notebooks which allow data scientists to show their work easily (probably the inspiration here).


Why so important? Because Hadoop vendors NEED to promote data science

Tons of large enterprises use Hadoop, but most of them haven’t really unlocked the promise of those installations (and the millions of dollars allocated to them) yet. They are all investments in the future. Now those investments need to pay dividends and generate business value, or else these installations will be considered underwhelming at best and failures at worst. Check out this figure from an O’Reilly report on the big data market:

U.S. Companies using Hadoop

Most enterprises aren’t that mature in their Hadoop practice or usage. It’s not as sticky as the vendors would like to see, with most companies being classified as Lab Project users or Tire Kickers. That’s not exactly producing results.

Cloudera launching products like this workbench makes total sense: they lower the barrier to entry considerably and give Cloudera a chance to bring clients to the “we use Hadoop every day for critical business processes” stage.

Step in the right direction

Figure above from the original Cloudera blog post

This tool definitely looks like a step in the right direction, making it easy to load files stored in Hadoop from a slick IDE. Now the barrier to entry won’t be access to the data or the significant technical (or security) hurdles of copying data from a corporate Hadoop cluster to play with it locally. Each barrier to entry that falls will let companies spend much less time in the Lab Project and application development stages and move quickly to the Mature stage, where we’ll see more and more automation being delivered as part of products and services.