Scaling out AND up, a compromise

You might have noticed that there’s been quite a (mostly civil, I think) debate about RAID and scaling going on recently.

I’d like to address some of the—in my opinion—misconceptions about “scaling out” that I’ve seen many times recently, and provide some of my experience and opinions.

It’s all about compromise.

Human time is expensive. Having operations, engineering, etc. deal with tasks (such as re-imaging a machine) to fix a problem that could have been a 30-second disk swap is an inefficient use of human resources. Don’t cut corners where it doesn’t make sense. This calls back to Brian’s comments about the real cost of your failed $200 part.

Scaling out doesn’t mean using crappy hardware. I think people take the “scale out” model (which they’ve often only read about in outdated conference presentations) to quite an extreme. They think scaling out means buying a ton of bad, desktop-class machines. That model doesn’t work, and it’s hell to maintain in the long term.

Compromise. One of the key points in the scale-out model: size the physical hardware reasonably to achieve the best compromise between scaling out and scaling UP. This is the main reason that I assert RAID is not going anywhere… it is often simply the best and cheapest way to achieve the performance and reliability that you need in each physical machine in order to make the scale-out model work.

Use commodity hardware. You often hear the term “commodity hardware” in reference to scale-out. While crappy hardware is also commodity, what this means is that instead of getting stuck on the low-end $40k machine, with thoughts of upgrading to the $250k machine, and maybe later the $1M machine, you use data partitioning and any number of, let’s say, $5k machines. As I said above, that doesn’t mean a crappy $1k single-disk machine. What does it mean for the machine to be “commodity”? It means that the components are standardized and common, and that the price is set by the market, not by a single corporation. Use commodity machines configured with a good balance of price vs. performance.
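To put rough numbers on that comparison, here is a small sketch using only the example prices mentioned above. The tier labels other than “low-end” and the function name are my own, purely for illustration, and the arithmetic ignores power, rack space, and operations cost, so treat it as a back-of-the-envelope figure rather than a sizing guide.

```python
# Example prices from the paragraph above; the "mid-range" and "high-end"
# labels are assumptions added for readability.
SCALE_UP_TIERS = {
    "low-end": 40_000,
    "mid-range": 250_000,
    "high-end": 1_000_000,
}
COMMODITY_PRICE = 5_000  # the "let's say $5k" machine


def machines_for_budget(budget: int, unit_price: int = COMMODITY_PRICE) -> int:
    """Return how many whole commodity machines fit in the given budget."""
    return budget // unit_price


if __name__ == "__main__":
    for tier, price in SCALE_UP_TIERS.items():
        count = machines_for_budget(price)
        print(f"One {tier} machine (${price:,}) costs as much as {count} commodity machines")
```

The point is not the exact counts, but that each scale-up step buys one bigger box, while the same money buys many independently replaceable machines once the data is partitioned.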

Use data partitioning (sharding). I haven’t talked much about this in my previous posts, because it’s sort of a given. My participation in the HiveDB project and my recent talks on “Scaling and High Availability Architectures” at the MySQL Conference and Expo should say enough about my feelings on this subject. Nonetheless I’ll repeat a few points from my talk: data partitioning is the only game in town, cache everything, and use MySQL replication for high availability and redundancy.
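For readers who haven’t seen data partitioning in practice, here is a minimal sketch of the simplest form: hashing a key to choose a shard. This is not how HiveDB does it and not code from my talk; the host names and function names are hypothetical, and a real system would also need a plan for resharding when the number of machines changes.

```python
import hashlib

# Hypothetical shard hosts; in practice each would be a (replicated) MySQL server.
SHARDS = ["db01", "db02", "db03", "db04"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash.

    A cryptographic digest is used only because it is stable across
    processes and machines, unlike Python's built-in hash() for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for(user_id: int) -> str:
    """Return the host that stores all rows belonging to the given user."""
    return SHARDS[shard_for(str(user_id), len(SHARDS))]


if __name__ == "__main__":
    for uid in (1, 2, 3, 42):
        print(uid, "->", host_for(uid))
```

Every query for a given user goes to one machine, so each machine only has to be big enough for its slice of the data, which is exactly where a sensibly sized RAID box fits in.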

Nonetheless, RAID is cheap. I’ve said it several times already, but just to be sure you heard me correctly: RAID is a cheap and efficient way to gain both performance and reliability out of your commodity hardware. For most systems, the engineering time, operations time, etc., needed to get the same sort of reliability out of a non-RAID partitioned system is going to cost a lot more than it would with a RAID partitioned system. Yes, other components will fail, but in a sufficiently large data-centric system with server-class hardware, disk failures will outnumber failures of anything else by 10:1 or more.

That is all, carry on.

Update: Sebastian Wallberg has translated this entry to German. Thanks Sebastian!




8 Responses to “Scaling out AND up, a compromise”

  1. Kevin Burton Says:

    Agreed on the compromise part….

    This is one reason we’re going with Opterons with more memory instead of our cheaper Athlon boxes. The bang for the buck is just better.

    Another point on the whole “RAID is dying” meme: I think the growth of software RAID is a sign that RAID is dying. Eventually the software will support multiple disks internally. MySQL 5.1 and partitioning is a good example (though it’s not all the way there yet).

    Onward!

    Kevin

  2. Xaprb Says:

    I like hearing you define what commodity hardware is. I so often hear people talk about it as though one should strive for running your whole company on this enormous cluster of eMachine Pentium II 300MHz machines from 1998. This is usually from someone who becomes a loudmouth when in a bar. And then they always say “That’s how Google does it!”

    So I asked someone at Google who knows, and as you know, that’s not how Google does it :-) So I think it bears repeating as often as you mention commodity hardware “and by commodity, I mean…”

  3. "links for 2007-06-12" by Bob Plankers, The Lone Sysadmin Says:

    [...] jcole’s weblog: Jeremy Cole’s take on life. » Blog Archive » Scaling out AND up, a compromise RAID isn’t going anywhere anytime soon (I agree wholeheartedly with Jeremy) [...]

  4. MySQL Performance Blog » RAID and Scale Out Discussions Says:

    [...] Just found this wonderful summary of articles by Jeremy and wanted to give some of my thoughts on the topic. [...]

  5. Steven Roussey Says:

    After the disk, it will be the fans! Anyway, we still scale out with dual socket servers and 8 disk RAID 10 for true database stuff (as opposed to files). Easy and fast.

  6. Final words on MySQL and RAID « François Schiettecatte’s Blog Says:

    [...] Final words on MySQL and RAID Jeremy Cole has some very good final words on MySQL and RAID, and he also points to all the articles that were written about the subject. [...]

  7. Scaling with RAID? :: Fat Penguin Says:

    [...] There’s a good blog entry on the advantages and disadvantages of using RAID when scaling. Take it for what it’s worth. :) [...]

  8. Everything is a Funky DNS Problem ! Says:

    The end of RAID as we know it... RAID is dead, long live RAIS!

    I somehow totally missed this thread, started and summarized by Jeremy Cole, on the death of RAID...

    In “Yes Jeremy, RAID Really Is Dying” Kevin Burton makes a good point...

    Most large scale out shops should probably be using a redundant array
