Archive for the 'MySQL' Category


Scaling out MySQL: Hardware today and tomorrow

by Jeremy Cole on Thursday, April 17th, 2008 at 12:45:06 in MySQL, MySQL User Conf

My slides have just been uploaded for the talk I just gave at the MySQL Conference and Expo 2008 titled “Scaling out MySQL: Hardware today and tomorrow“. You can download them now as PPT and PDF.

Thanks for coming to my talk!

Bravo Oracle: InnoDB Plugin 1.0 released

by Jeremy Cole on Wednesday, April 16th, 2008 at 07:48:39 in InnoDB, MySQL, MySQL User Conf

Yesterday, Oracle‘s Ken Jacobs and Heikki Tuuri, creator of InnoDB, have announced the immediate release of InnoDB Plugin 1.0 for MySQL 5.1. I’ve already downloaded it and played around with it a bit. I haven’t had time to do any performance benchmarks or similar just yet. Those will come in due time.

This release is the beginning of two exciting things: InnoDB is now officially decoupled from MySQL release-wise, and lots of new features have been added to this new release. I will come to what the decoupling means in a moment, but first, the major new features in this release of InnoDB (from my perspective, and with my commentary):

  • Fast in-place index management — The ability to add and drop indexes without rebuilding the entire table in place. This isn’t a complete implementation of the long-awaited “online ALTER TABLE“, as that is mostly a MySQL problem rather than an InnoDB problem.
  • Compression of data and indexes — This should allow data size on disk to be reduced substantially.
  • Storage of entire BLOB, TEXT, and VARCHAR data off-page — This can allow more efficient PRIMARY KEY indexes (where the data is stored in InnoDB because of the index clustering) on tables with BLOB, TEXT, or large VARCHAR columns
  • New information_schema tables — InnoDB is now providing a lot more visibility into what is going on internally. I’m hopeful to extend this even further.
  • InnoDB-specific “strict mode” — Don’t allow InnoDB to fudge things internally, forcing it to error in circumstances it can’t handle, rather than just give a warning or silently continue.

All of the above features look excellent, but one of the more interesting aspects of this announcement is the fact that MySQL and InnoDB are now decoupled for release. That is, Oracle can make a release of InnoDB without having to wait for MySQL to make their own release. While this will make it slightly more difficult to describe what version of MySQL/InnoDB you’re using (especially without a way to find that out from MySQL), it has the potential to make the release process much quicker and more efficient.

I’m quite hopeful, and from what I have seen in the recent past with the collaboration between Heikki, Ken, and Yasufumi (an outside contributor) on fixing InnoDB performance bugs, quite positive, that Oracle will be a lot more accepting of outside patches and code contributions to the InnoDB codebase than MySQL has been recently. I’ve got a lot of ideas for new features targeted at manageability that I’d like to get implemented. I’d love to hear Oracle’s comments on how they will accept patches to InnoDB now, and what they see the community interaction looking like in the future. Ken, any comment?

The only negative aspects I can see with this announcement are:

  • Oracle spent a couple of years working on this in silence, away from the community. Most everyone was surprised by this release, as we haven’t seen anything from Oracle about InnoDB in a long time, and to some extent we were sitting in the corner hopeful that Oracle really doesn’t plan on killing the project. While I understand that perfectly well from a business standpoint, I am hopeful that working for long periods away from the community can be minimized in the future, so that we can all stay more involved.
  • It is now a bit harder to get a new MySQL/InnoDB installation up and running, as the newest InnoDB is not part of the MySQL packages anymore. I’m sure this can be cleaned up with some smart RPM (etc.) packaging.

Overall, I am excited about this announcement, and quite happy that Oracle is making some serious contributions and commitments to maintaining InnoDB. Thanks for all your hard work, Ken and Heikki and the rest of the InnoDB team! Let me know if there’s anything I can do for you!

Just announced: MySQL to launch new features only in MySQL Enterprise

by Jeremy Cole on Monday, April 14th, 2008 at 18:17:21 in MySQL, MySQL User Conf

Just announced at the MySQL Partner meeting as part of the MySQL Conference and Expo in Santa Clara, CA:

MySQL will start offering some features (specifically ones related to online backups) only in MySQL Enterprise. This represents a substantive change to their development model — previously they have been developing features in both MySQL Community and MySQL Enterprise. However, with a shift to offering some features only in MySQL Enterprise, this means a shift to development of those features occurring (and thus code being tested) only in MySQL Enterprise.

As I’ve discussed before, the size of the user base for MySQL Enterprise is much smaller than for MySQL Community. That means these critical features will be tested by only a few of their customers. So, in effect, they will be giving their paying customers real, true, untested code. How is this supposed to work? In addition, this means that they are changing their internal development model, splitting the relationship between the two trees, and overall going even further down the path of getting the RHEL/Fedora model backwards.

What do you think about this? Leave a comment, I’m really curious as to everyone’s feeling on this.

UPDATE: Marten Mickos has just acknowledge that I understood the slide quite correctly, and they will indeed develop new features in MySQL Enterprise (in 6.0), without making them available in MySQL Community. Hmm!

MySQL Meetup at 7pm tonight at MySQL Conference

by Jeremy Cole on at 15:32:32 in MySQL, MySQL User Conf

We’re hosting the Silicon Valley MySQL Meetup tonight at the MySQL Conference and Expo venue in Ballroom D in Santa Clara, CA. We start at 7pm and run about 2 hours. Come on down!

On Sun’s acquisition of MySQL AB

by Jeremy Cole on Wednesday, January 16th, 2008 at 11:14:40 in MySQL, Technology

If you follow the MySQL world at all, or you just have your eyes open, you have probably noticed that an agreement has been reached for Sun to acquire MySQL AB for about one billion dollars. Quite a few people have asked for my thoughts on the matter, so I will provide them publicly here for all. Overall, I see this as a mostly good thing.

I think that Sun has a very good chance of leading MySQL better than MySQL. At the same time, it’s always disconcerting to see a project managed within a very large company. Having been through the large company picture once already, I know how wrongly things can go when too many people (especially management types) are involved in a project.

At the same time, though, I’ve always liked Sun, and have high hopes for Sun’s management of the MySQL project and the people. There are a number of things I would like to see Sun do with MySQL following the acquisition:

Fix the MySQL Enterprise development model

The MySQL Enterprise (and by proxy MySQL Community) development model has been broken for a long time now. Too long. Take a cue from Sun itself and from RedHat and fix it right. I have a lot of ideas as to how the development model should work, and although our efforts have been time-constrained, we’ve made some effort to actually implement those ideas in DorsalSource.

Fix the product

There are a lot of areas where MySQL has been lacking for a long time, and the power users have been either crying in their beer (most users), or doing the work themselves (us, Google). I have shared some of these ideas with various people over the years, but here are some of the areas/ideas we have:

  • Replication works fairly well (usually), but its model is completely broken and deficient to go forward with it with all the new features of MySQL. It lacks any real solution for multiple masters, synchronous or semi-synchronous replication, safety (checksums, binlog index and master info sync problems), and conflict resolution or automatic detection (transaction ids).
  • Fix the internal memory allocation model so that it’s possible to constrain the memory usage of MySQL. The current situations sucks.
  • Remove some of the outdated cruft littered all over MySQL: MERGE tables
  • Clean up the logging (general, slow, [future] custom logs) code to be completely configurable and sane.
  • Fix auto_increment. We have suffered with it too long. The storage engine isn’t the place to generate sequences, and InnoDB and the replication model suffer greatly for it.
  • Fix the optimizer so that it makes more sane choices and can be more easily extended.
  • This is partly a product problem and partly a people problem, but stop creating 100 different experimental storage engines, and pushing them as truth. Yes, archive, federated, blackhole, I’m looking at you.

Get some muzzles on the sales and marketing team

As Proven Scaling well knows, MySQL basically sells itself. It would be great if we didn’t have to hear from any more customers that MySQL’s sales team has or is trying to screw them over. No consulting without support? Stupid rule. Insane point of view on licensing? Get rid of it. Fluff? Don’t do it.

Bonus?

Sun should buy Innobase Oy back from the clutches of Oracle, pull InnoDB into MySQL proper, and relicense the full set of code under LGPL, BSD, or another similar license. Personally, I think the GPL is fine for MySQL, but the MySQL sales team has done so much damage to people’s ideas about the GPL—when they do and do not need licenses for MySQL—that it’s hard to continue under the GPL now.

Conclusion

I’m hopeful. I’m hesitant to shout out in glee. If anyone has any questions or comments regarding the above ideas, comments, or thoughts: let me know! I’d be happy to discuss any of them.

MySQL Meetup Silicon Valley: Tonight at Google

by Jeremy Cole on Monday, January 14th, 2008 at 14:55:52 in MySQL, MySQL Meetup

I haven’t written anything in a while about the MySQL Meetup I host in Silicon Valley, but I though the new year would be a good time to invite everyone to attend.

The Silicon Valley MySQL Meetup is hosted at Google’s campus in Mountain View. We regularly have open Q&A sessions, where we collect everyone’s questions, and then go through them one by one discussing, answering, and adding more questions. We stop when we run out of questions, or time. (Usually time runs out first.) Sometimes we’ll have myself or someone else with an official presentation.

Come on out for tonight’s Meetup (open Q&A), I’ll see you there!

Proven Scaling goes global

by Jeremy Cole on Thursday, November 29th, 2007 at 17:02:10 in MySQL, Proven Scaling

A bit of exciting news… Proven Scaling has officially gone global with the addition of a new MySQL Geek, Mike Griffiths of London, England to our team. We are now capable of easily and efficiently (timezone-wise and travel-wise) handling your on-site MySQL consulting needs in the London area, the UK, and Europe at large. (As well as making remote work in the middle of the night for our US customers a fair bit easier.)

Mike comes to us from Yahoo! Europe where he worked for a number of years specializing in MySQL operations, performance and optimization, replication and high availability, and scalability. He has been a personal friend for several years and I have been looking forward to him joining the Proven Scaling team since we started the company.

If you’re looking for MySQL consulting in Europe (or anywhere, for that matter), let us know!

On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS

by Jeremy Cole on Saturday, November 24th, 2007 at 05:07:24 in GIS and Cartography, MySQL, MySQL Tips, Technology

Geo-referencing IPs is, in a nutshell, converting an IP address, perhaps from an incoming web visitor, a log file, a data file, or some other place, into the name of some entity owning that IP address. There are a lot of reasons you may want to geo-reference IP addresses to country, city, etc., such as in simple ad targeting systems, geographic load balancing, web analytics, and many more applications.

This is a very common task, but I have never actually seen it done efficiently in MySQL in the wild. There is a lot of questionable advice on forums, blogs, and other sites out there on this topic. After working with a Proven Scaling customer, I recently did some thinking and some performance testing on this problem, so I thought I would publish some hard data and advice for everyone.

Unfortunately, R-tree (spatial) indexes have not been added to InnoDB yet, so the tricks in this entry only work efficiently with MyISAM tables (although they should work with InnoDB, they will perform poorly). This is actually OK for the most part, as the geo-referencing functionality most people need doesn’t really need transactional support, and since the data tables are basically read-only (monthly replacements are published), the likelyhood of corruption in MyISAM due to any server failures isn’t very high.

The data provided by MaxMind

MaxMind is a great company that produces several geo-referencing databases. They release both a commercial (for-pay, but affordable) product called GeoIP, and a free version of the same databases, called GeoLite. The most popular of their databases that I’ve seen used is GeoLite Country. This allows you look up nearly any IP and find out which country (hopefully) its user resides in. The free GeoLite versions are normally good enough, at about 98% accurate, but the for-pay GeoIP versions in theory are more accurate. In this article I will refer to both GeoIP and GeoLite as “GeoIP” for simplicity.

GeoIP Country is available as a CSV file containing the following fields:

  • ip from, ip to (text) — The start and end IP addresses as text in dotted-quad human readable format, e.g. “3.0.0.0″. This is a handy way for a human to read an IP address, but a very inefficient way for a computer to store and handle IP addresses.
  • ip from, ip to (integer) — The same start and end IP addresses as 32-bit integers1, e.g. 50331648.
  • country code — The 2-letter ISO country code for the country to which this IP address has been assigned, or in some cases other strings, such as “A2″ meaning “Satellite Provider”.
  • country name — The full country name of the same. This is redundant with the country code if you have a lookup table of country codes (including MaxMind’s non-ISO codes), or if you make one from the GeoIP data.

A simple way to search for an IP

Once the data has been loaded into MySQL (which will be explained in depth later), there will be a have a table with a range (a lower and upper bound), and some metadata about that range. For example, one row from the GeoIP data (without the redundant columns) looks like:

ip_from ip_to country_code
50331648 68257567 US

The natural thing that would come to mind (and in fact the solution offered by MaxMind themselves2) is BETWEEN. A simple query to search for the IP 4.2.2.1 would be:

SELECT country_code
FROM ip_country
WHERE INET_ATON("4.2.2.1") BETWEEN ip_from AND ip_to

Unfortunately, while simple and natural, this construct is extremely inefficient, and can’t effectively use indexes (although it can use them, it isn’t efficient). The reason for this is that it’s an open-ended range, and it is impossible to close the range by adding anything to the query. In fact I haven’t been able to meaningfully improve on the performance at all.

A much better solution

While it probably isn’t the first thing that would come to mind, MySQL’s GIS support is actually perfect for this task. Geo-referencing an IP address to a country boils down to “find which range or ranges this item belongs to”, and this can be done quite efficiently using spatial R-tree indexes in MySQL’s GIS implementation.

The way this works is that each IP range of (ip_from, ip_to) is represented as a rectangular polygon from (ip_from, -1) to (ip_to, +1) as illustrated here:



In SQL/GIS terms, each IP range is represented by a 5-point rectangular POLYGON like this one, representing the IP range of 3.0.0.0 – 4.17.135.31:

POLYGON((
  50331648 -1,
  68257567 -1,
  68257567  1,
  50331648  1,
  50331648 -1
))

The search IP address can be represented as a point of (ip, 0), and that point with have a relationship with at least one of the polygons (provided it’s a valid IP and part of the GeoIP database) as illustrated here:



It is then possible to search these polygons for a specific point representing an IP address using the GIS spatial relationship function MBRCONTAINS and POINT3 to search for “which polygon contains this point” like this:

SELECT country_code
FROM ip_country
WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON('4.2.2.1'), 0)))

Pretty cool huh? I will show how to load the data and get started, then take look at how it performs in the real world, and compare the raw numbers between the two methods.

Loading the data and preparing for work

First, a table must be created to hold the data. A POLYGON field will be used to store the IP range. Technically, at this point the ip_from and ip_to fields are unnecessary, but given the complexity of extracting the IPs from the POLYGON field using MySQL functions, they will be kept anyway. This schema can be used to hold the data4:

CREATE TABLE ip_country
(
  id           INT UNSIGNED  NOT NULL auto_increment,
  ip_poly      POLYGON       NOT NULL,
  ip_from      INT UNSIGNED  NOT NULL,
  ip_to        INT UNSIGNED  NOT NULL,
  country_code CHAR(2)       NOT NULL,
  PRIMARY KEY (id),
  SPATIAL INDEX (ip_poly)
);

After the table has been created, the GeoIP data must be loaded into it from the CSV file, GeoIPCountryWhois.csv, downloaded from MaxMind. The LOAD DATA command can be used to do this like so:

LOAD DATA LOCAL INFILE "GeoIPCountryWhois.csv"
INTO TABLE ip_country
FIELDS
  TERMINATED BY ","
  ENCLOSED BY "\""
LINES
  TERMINATED BY "\n"
(
  @ip_from_string, @ip_to_string,
  @ip_from, @ip_to,
  @country_code, @country_string
)
SET
  id      := NULL,
  ip_from := @ip_from,
  ip_to   := @ip_to,
  ip_poly := GEOMFROMWKB(POLYGON(LINESTRING(
    /* clockwise, 4 points and back to 0 */
    POINT(@ip_from, -1), /* 0, top left */
    POINT(@ip_to,   -1), /* 1, top right */
    POINT(@ip_to,    1), /* 2, bottom right */
    POINT(@ip_from,  1), /* 3, bottom left */
    POINT(@ip_from, -1)  /* 0, back to start */
  ))),
  country_code := @country_code
;

During the load process, the ip_from_string, ip_to_string, and country_string fields are thrown away, as they are redundant. A few GIS functions are used to build the POLYGON for ip_poly from the ip_from and ip_to fields on-the-fly. On my test machine it takes about 5 seconds to load the 96,641 rows in this month’s CSV file.

At this point the data is loaded, and everything is ready to go to use the above SQL query to search for IPs. Try a few out to see if they seem to make sense!

Performance: The test setup

In order to really test things, a bigger load testing framework will be needed, as well as a few machines to generate load. In my tests, the machine being tested, kamet, is a Dell PowerEdge 2950 with Dual Dual Core Xeon 5050 @ 3.00Ghz, and 4GB RAM. We have four test clients, makalu{0-3}, which are Apple Mac Mini with 1.66Ghz Intel CPUs and 512MB RAM. The machines are all connected with a Netgear JGS524NA 24-port GigE switch. For the purposes of this test, the disk configuration is not important. On the software side, the server is running CentOS 4.5 with kernel 2.6.9-55.0.2.ELsmp. The Grinder 3.0b32 is used as a load generation tool with a custom Jython script and Connector/J 5.1.5 to connect to MySQL 5.0.45.

There are a few interesting metrics that I tested for:

  • The latency and queries per second with a single client repeatedly querying.
  • Does the number of queries handled increase as the number of clients increases?
  • Is latency and overall performance adversely affected by many clients?

The test consisted of an IP search using the two different methods, and varying the number of clients between 1 and 16 in the following configurations:

Clients Machines Threads
1 1 1
2 1 2
4 1 4
8 2 4
16 4 4

Each test finds the country code for a random dotted-quad format IP address passed in as a string.

How does it perform? How does it compare?

There are a few metrics for determining the performance of these searches. If you tried the BETWEEN version of this query, you may have noticed that, in terms of human time, it doesn’t take very long anyway: I pretty consistently got 1 row in set (0.00 sec). But don’t let that fool you.

It’s clear that GIS wins hands down.

First, a look at raw performance in terms of queries per second.

Using BETWEEN, we max out at 264q/s with 16 clients:

Using MBRCONTAINS, we max out at 17600q/s with 16 clients, and it appears that it’s the test clients that are maxed out, not the server:

Next, a look at latency of the individual responses.

Using BETWEEN, we start out with a single client at 15.5ms per request, which is not very good, but still imperceptible to a human. But with 16 clients, the latency has jumped to 60ms, which is longer than many web shops allocate to completely construct a response. As the number of test clients increases, the latency gets much worse, because the query is so dependent on CPU:

Using MBRCONTAINS, we start out with a single client at 0.333ms per request, and even with 16 clients, we are well under 1ms at 0.743ms:

Conclusion

Definitely consider using MySQL GIS whenever you need to search for a point within a set of ranges. Performance is fantastic, and it’s relatively easy to use. Even if you are an all-InnoDB shop, as most of our customers are (and we would recommend), it may very well be worth it to use MyISAM specifically for this purpose.

Update 1: Another way to do it, and a look at performance

Andy Skelton and Nikolay Bachiyski left a comment below suggesting another way this could be done:

SELECT country_code
FROM ip_country
WHERE ip_to >= INET_ATON('%s')
ORDER BY ip_to ASC
LIMIT 1

This version of the query doesn’t act exactly the same as the other two — if your search IP is not part of any range, it will return the next highest range. You will have to check whether ip_from is <= your IP within your own code. It may be possible to do this in MySQL directly, but I haven’t found a way that doesn’t kill the performance.

Andy’s version actually performs quite well — slightly faster and more scalable than MBRCONTAINS. I added two new performance testing configurations to better show the differences between the two:

Clients Machines Threads
32 4 8
64 4 16

Here’s a performance comparison of MBRCONTAINS vs. Andy’s Method:

Latency (ms) — Lower is better:

Queries per second — Higher is better:

Once I get some more time to dig into this, I will look at why exactly BETWEEN is so slow. I’ve also run into an interesting possible bug in MySQL: If you add a LIMIT 1 to the BETWEEN version of the query, performance goes completely to hell. Huh?

Thanks for the feedback, Andy and Nikolay.

Footnotes

1 MySQL provides the INET_ATON() and INET_NTOA() functions for converting back and forth between dotted-quad strings (CHAR(15)) and 32-bit integers (INT UNSIGNED). You can also use the equivalent functions, if they exist, if your favorite programming language so that you can just feed an integer to MySQL. I haven’t tested the (positive) performance implications of doing that.

2 Although, strangely they offer a different solution specifically for MySQL using <= and >= operators instead of BETWEEN. I don’t find that that difference has any effect on MySQL. Maybe it was for a really old version of MySQL that didn’t have BETWEEN?

3 Pet peeve: Why does MySQL require you to pass the output of its own POLYGON, LINESTRING, POINT, etc., functions through GEOMFROMWKB in order to use them? It makes life suck that little bit more than necessary.

4 Note that if you’re looking to play around with the BETWEEN version of things, you will want to add some indexes on ip_from and ip_to. I would recommend INDEX (ip_from, ip_to) and INDEX (ip_to, ip_from) as those two seemed to perform the best that I could find (given its poor efficiency to start with).

On Hiring a MySQL DBA/Architect

by Jeremy Cole on Monday, October 29th, 2007 at 16:55:35 in MySQL, MySQL Tips, Proven Scaling

These days everyone is looking for a MySQL DBA or MySQL Architect. I am regularly contacted by recruiters, Proven Scaling customers, and other contacts, and they all have the same question: “Where do we find MySQL people to hire?” Most of them have had requisitions open for 6+ months (I know of a few in the 12+ month range), they haven’t found anyone, and they’re feeling desperate now. Since I get this question so often, I thought I’d consolidate my advice on the subject and post it.

They don’t exist on the market today.

Currently there are many more job openings for MySQL people than there are qualified people to fill them. Many of you reading this and trying to hire someone are working for startups and are probably relatively “unknown”, perhaps you don’t have a lot to offer. This makes it even harder for you, as you must compete with the likes of Google, Facebook, and even MySQL itself. As soon as a qualified person starts looking, they are snatched up by someone. It is very unlikely that you will just happen upon a MySQL Architect with 5+ years experience etc., etc., that is on the market. Stop dreaming.

What can you do about it?

The lack of available qualified people to hire doesn’t mean you don’t have MySQL problems that need solving. As far as I’m concerned there are a few possible solutions:

  • Use consultants — Many times you can get by in the short term by using consultants to do some DBA-like tasks, and especially architect tasks. A consultant may also be able to help answer questions that a DBA would normally answer for your developers. Obviously this is somewhat self-serving, since this is the business I’m in.
  • Internal transfer — Transfer someone internally to fill the position, and train them into it. This is often the best option, if you have a large enough team. If you’re a small startup, though, you probably don’t have enough staff to make this work.
  • Hire a non-MySQL DBA — Hire someone who has a solid background in databases, but may not be a MySQL expert, and train them up on MySQL.
  • Hire a MySQL non-expert — Hire someone who is technically strong, knows some MySQL, but isn’t the expert you’re looking for, and train them into the position.

If you’re hiring someone new or transferring someone internally, you may want to consider enlisting some outside help in interviewing them to make sure they are a good fit for the position and have fairly high confidence that they will be capable of growing into the position. Proven Scaling offers interview assistance for exactly this purpose.

Okay, we’ve got someone, what now?

After you’ve hired someone from one of the above suggestions, you’ve got a warm body in a seat, but they are not a MySQL expert, so you’ll need to immediately get started on training them into the position. Here’s the basic general training plan I would suggest:

  • Books — Buy them all the books they could possibly want. I would suggest, at a minimum (depending on what you’re asking of them): MySQL, Pro MySQL, High Performance MySQL1, Understanding MySQL Internals, and Understanding the Linux Kernel. None of them are really meant to be read cover-to-cover, but they are good for understanding specific problems.
  • Training — Probably the best way to get them up to speed on a broad range of topics, would be to send them to MySQL’s formal training classes. I would recommend at least: MySQL for DBAs and MySQL 5.0 Performance Tuning. In addition, Proven Scaling can offer customized and specific training classes on certain topics, such as replication, partitioning, and scalability.
  • Consulting and/or Support — Hands-on work with a consultant is a great way to get specific questions answered and address any doubts or fears on an ongoing basis. Using a consultant for hand-holding during any potentially dangerous operations, migrations, installations, etc., is also a good way to ensure that nothing goes terribly wrong. My company, Proven Scaling, does this as well as Percona, and MySQL itself. You may also want to consider an ongoing support relationship with one of those companies as well.
  • Conferences — You shouldn’t hire for a MySQL position without planning on sending them to the MySQL Conference and Expo every year.
  • Networking — Send them to MySQL Meetups, user groups, networking events, etc. to learn from others and perhaps most importantly, learn what they are missing.
  • Give them time — It will take some time for them to get up to speed and feel comfortable in their new position. Give them plenty of time and space to learn what they need to learn. This is especially difficult with internal transfers, as they may be trying to train their replacement in their old job.

Doesn’t sound good to you? Dead-set on finding an expert?

If you’re dead-set on finding and hiring a MySQL expert, and you’re not willing to follow one of the alternate approaches I’ve suggested above, here are some tips:

  • Don’t be anonymous — People interested in and qualified for a MySQL DBA or MySQL Architect job are in the position to choose which employer they want to work for. If they don’t know who you are because you’ve posted your ad as “a hot new startup”, they will skip over you.
  • Don’t waste their time — Show them they are loved, don’t waste their time with too much unnecessary back and forth. Google their name, find them on LinkedIn, do your own research on their background, and contact them only once you’re sure that not only does their experience meet your needs, but that the job you’re asking them about has a chance of being interesting to them. No email interviews; they make you look silly. Keep the stupid questions to a minimum. If they’re from out of town, and you want an on-site interview, pay for travel upfront, and put them in a nice hotel. Engineers are inherently lazy, and reimbursement sucks.
  • Have perks — Free drinks and snacks, commute assistance, relocation, bonuses, top of the line hardware, decent office space, bike parking, showers, decent car parking are all standard perks. Make your company appealing to prospective employees.
  • Pay well — If you’re not willing to pay well, forget it. Make sure your pay scale matches what you’re asking of them. Want 24/7 pager duty? It will cost you.
  • Pay referrals well — The market for referrals is steep as well, and an external referral is going for anywhere from $5k-$10k today. Advertise prominently whatever you’re willing to pay for a referral. Pay on hire, no strings attached. Consider a referral gift on interview (iPod or similar value) regardless of hire. This ensures that “the network” remembers you’re hiring when they run into someone that’s looking.

All of the above advice works when hiring anyone, but it’s especially important when trying to hire for a position where you, as the employer, are at a disadvantage.

Good luck!

I hope this advice has been helpful. Have any more tips, advice, comments? Think I’m wrong? Please leave a comment!

1 High Performance MySQL is somewhat outdated at this point, but a lot of the advice in it is still valid. Take its advice on 5.0 with a grain of salt. I am eagerly awaiting the 2nd Edition. :)

On Falcon and the need to feel wanted

by Jeremy Cole on Thursday, September 6th, 2007 at 10:28:02 in MySQL, Rants

MySQL has a new section on their site about MySQL 6.0, which they are now calling “ready for pre-production testing”. I’m not sure when this section appeared, but I don’t spend much time on the MySQL site outside of the manual and downloads sections. Browsing around this new section I found a real gem: “Top Reasons Falcon is Cool” (or, as alternately titled on the page itself, “Top Reasons to use Falcon for Online Applications”1 … did someone forget to rename one or the other?). This page gives a top ten list2 of reasons why one should consider using Falcon, the new “not an InnoDB replacement, not at all!” but “really, you should try migrating your InnoDB application to it” storage engine.

I do think that Falcon will eventually be quite interesting, and it will hopefully have a bunch of nice tricks and whatnot, but MySQL’s marketing folks are really pushing it way too early and losing a lot of credibility in the process. Do they not realize we (as MySQL users) and a lot of others (as analysts, Oracle, DB2, and yes, even MS-SQL users) are laughing at them? I know it’s hard work to come up with 10 things sometimes, but if you get stuck at 7, make it a “Top 7″ instead of a “Top 10″. Don’t add a bunch of crap to fill it out. There are some minor chuckles in the first 7, but here’s the last 3, with my commentary:

8. Simplified Configuration

There is no complexity whatsoever in terms of configuration as only a handful of variables exist to control the behavior of the Falcon engine.

Hmm, s/bug/feature/ and you’re done! It’s so easy to configure!

9. High Availability

Extreme degrees of high availability are easily accomplished for a Falcon-driven system by using either MySQL replication or supported third-party high availability solutions such as DRBD.

This has nothing to do with Falcon, whatsoever.

10. Parallel Execution

Falcon’s design takes advantage of multi-core systems to provide parallel execution of user and service threads. Falcon uses fine-grained multi-threading to increase parallelism with locking on internal structures being done at a low level. In some cases, two threads can change different attributes of the structure at once, because the attributes are separately lockable.

Yay! Awesome! Parallel execution! It’s finally here!!!

Oh, wait, they don’t mean parallel execution of queries they mean parallel execution of internal threads. That’s nothing new, and InnoDB is already doing that. Maybe Falcon has finer-grained locking and can do this better, but that doesn’t make for a big bold title of “Parallel Execution”.

Come on, folks. Try harder.

1 Who else gets a headache from American rules for capitalization of titles?

2 How trite, yet another top ten list.