On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS
Geo-referencing IPs is, in a nutshell, converting an IP address, perhaps from an incoming web visitor, a log file, a data file, or some other place, into the name of some entity owning that IP address. There are a lot of reasons you may want to geo-reference IP addresses to country, city, etc., such as in simple ad targeting systems, geographic load balancing, web analytics, and many more applications.
This is a very common task, but I have never actually seen it done efficiently in MySQL in the wild. There is a lot of questionable advice on forums, blogs, and other sites out there on this topic. After working with a Proven Scaling customer, I recently did some thinking and some performance testing on this problem, so I thought I would publish some hard data and advice for everyone.
Unfortunately, R-tree (spatial) indexes have not been added to InnoDB yet, so the tricks in this entry only work efficiently with MyISAM tables (although they should work with InnoDB, they will perform poorly). This is actually OK for the most part, as the geo-referencing functionality most people need doesn’t really need transactional support, and since the data tables are basically read-only (monthly replacements are published), the likelyhood of corruption in MyISAM due to any server failures isn’t very high.
The data provided by MaxMind
MaxMind is a great company that produces several geo-referencing databases. They release both a commercial (for-pay, but affordable) product called GeoIP, and a free version of the same databases, called GeoLite. The most popular of their databases that I’ve seen used is GeoLite Country. This allows you look up nearly any IP and find out which country (hopefully) its user resides in. The free GeoLite versions are normally good enough, at about 98% accurate, but the for-pay GeoIP versions in theory are more accurate. In this article I will refer to both GeoIP and GeoLite as “GeoIP” for simplicity.
GeoIP Country is available as a CSV file containing the following fields:
- ip from, ip to (text) — The start and end IP addresses as text in dotted-quad human readable format, e.g. “3.0.0.0″. This is a handy way for a human to read an IP address, but a very inefficient way for a computer to store and handle IP addresses.
- ip from, ip to (integer) — The same start and end IP addresses as 32-bit integers1, e.g. 50331648.
- country code — The 2-letter ISO country code for the country to which this IP address has been assigned, or in some cases other strings, such as “A2″ meaning “Satellite Provider”.
- country name — The full country name of the same. This is redundant with the country code if you have a lookup table of country codes (including MaxMind’s non-ISO codes), or if you make one from the GeoIP data.
A simple way to search for an IP
Once the data has been loaded into MySQL (which will be explained in depth later), there will be a have a table with a range (a lower and upper bound), and some metadata about that range. For example, one row from the GeoIP data (without the redundant columns) looks like:
| ip_from | ip_to | country_code |
| 50331648 | 68257567 | US |
The natural thing that would come to mind (and in fact the solution offered by MaxMind themselves2) is BETWEEN. A simple query to search for the IP 4.2.2.1 would be:
SELECT country_code
FROM ip_country
WHERE INET_ATON("4.2.2.1") BETWEEN ip_from AND ip_to
Unfortunately, while simple and natural, this construct is extremely inefficient, and can’t effectively use indexes (although it can use them, it isn’t efficient). The reason for this is that it’s an open-ended range, and it is impossible to close the range by adding anything to the query. In fact I haven’t been able to meaningfully improve on the performance at all.
A much better solution
While it probably isn’t the first thing that would come to mind, MySQL’s GIS support is actually perfect for this task. Geo-referencing an IP address to a country boils down to “find which range or ranges this item belongs to”, and this can be done quite efficiently using spatial R-tree indexes in MySQL’s GIS implementation.
The way this works is that each IP range of (ip_from, ip_to) is represented as a rectangular polygon from (ip_from, -1) to (ip_to, +1) as illustrated here:

In SQL/GIS terms, each IP range is represented by a 5-point rectangular POLYGON like this one, representing the IP range of 3.0.0.0 – 4.17.135.31:
POLYGON(( 50331648 -1, 68257567 -1, 68257567 1, 50331648 1, 50331648 -1 ))
The search IP address can be represented as a point of (ip, 0), and that point with have a relationship with at least one of the polygons (provided it’s a valid IP and part of the GeoIP database) as illustrated here:

It is then possible to search these polygons for a specific point representing an IP address using the GIS spatial relationship function MBRCONTAINS and POINT3 to search for “which polygon contains this point” like this:
SELECT country_code
FROM ip_country
WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON('4.2.2.1'), 0)))
Pretty cool huh? I will show how to load the data and get started, then take look at how it performs in the real world, and compare the raw numbers between the two methods.
Loading the data and preparing for work
First, a table must be created to hold the data. A POLYGON field will be used to store the IP range. Technically, at this point the ip_from and ip_to fields are unnecessary, but given the complexity of extracting the IPs from the POLYGON field using MySQL functions, they will be kept anyway. This schema can be used to hold the data4:
CREATE TABLE ip_country ( id INT UNSIGNED NOT NULL auto_increment, ip_poly POLYGON NOT NULL, ip_from INT UNSIGNED NOT NULL, ip_to INT UNSIGNED NOT NULL, country_code CHAR(2) NOT NULL, PRIMARY KEY (id), SPATIAL INDEX (ip_poly) );
After the table has been created, the GeoIP data must be loaded into it from the CSV file, GeoIPCountryWhois.csv, downloaded from MaxMind. The LOAD DATA command can be used to do this like so:
LOAD DATA LOCAL INFILE "GeoIPCountryWhois.csv"
INTO TABLE ip_country
FIELDS
TERMINATED BY ","
ENCLOSED BY "\""
LINES
TERMINATED BY "\\n"
(
@ip_from_string, @ip_to_string,
@ip_from, @ip_to,
@country_code, @country_string
)
SET
id := NULL,
ip_from := @ip_from,
ip_to := @ip_to,
ip_poly := GEOMFROMWKB(POLYGON(LINESTRING(
/* clockwise, 4 points and back to 0 */
POINT(@ip_from, -1), /* 0, top left */
POINT(@ip_to, -1), /* 1, top right */
POINT(@ip_to, 1), /* 2, bottom right */
POINT(@ip_from, 1), /* 3, bottom left */
POINT(@ip_from, -1) /* 0, back to start */
))),
country_code := @country_code
;
During the load process, the ip_from_string, ip_to_string, and country_string fields are thrown away, as they are redundant. A few GIS functions are used to build the POLYGON for ip_poly from the ip_from and ip_to fields on-the-fly. On my test machine it takes about 5 seconds to load the 96,641 rows in this month’s CSV file.
At this point the data is loaded, and everything is ready to go to use the above SQL query to search for IPs. Try a few out to see if they seem to make sense!
Performance: The test setup
In order to really test things, a bigger load testing framework will be needed, as well as a few machines to generate load. In my tests, the machine being tested, kamet, is a Dell PowerEdge 2950 with Dual Dual Core Xeon 5050 @ 3.00Ghz, and 4GB RAM. We have four test clients, makalu{0-3}, which are Apple Mac Mini with 1.66Ghz Intel CPUs and 512MB RAM. The machines are all connected with a Netgear JGS524NA 24-port GigE switch. For the purposes of this test, the disk configuration is not important. On the software side, the server is running CentOS 4.5 with kernel 2.6.9-55.0.2.ELsmp. The Grinder 3.0b32 is used as a load generation tool with a custom Jython script and Connector/J 5.1.5 to connect to MySQL 5.0.45.
There are a few interesting metrics that I tested for:
- The latency and queries per second with a single client repeatedly querying.
- Does the number of queries handled increase as the number of clients increases?
- Is latency and overall performance adversely affected by many clients?
The test consisted of an IP search using the two different methods, and varying the number of clients between 1 and 16 in the following configurations:
| Clients | Machines | Threads |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 4 | 1 | 4 |
| 8 | 2 | 4 |
| 16 | 4 | 4 |
Each test finds the country code for a random dotted-quad format IP address passed in as a string.
How does it perform? How does it compare?
There are a few metrics for determining the performance of these searches. If you tried the BETWEEN version of this query, you may have noticed that, in terms of human time, it doesn’t take very long anyway: I pretty consistently got 1 row in set (0.00 sec). But don’t let that fool you.
It’s clear that GIS wins hands down.
First, a look at raw performance in terms of queries per second.
Using BETWEEN, we max out at 264q/s with 16 clients:

Using MBRCONTAINS, we max out at 17600q/s with 16 clients, and it appears that it’s the test clients that are maxed out, not the server:

Next, a look at latency of the individual responses.
Using BETWEEN, we start out with a single client at 15.5ms per request, which is not very good, but still imperceptible to a human. But with 16 clients, the latency has jumped to 60ms, which is longer than many web shops allocate to completely construct a response. As the number of test clients increases, the latency gets much worse, because the query is so dependent on CPU:

Using MBRCONTAINS, we start out with a single client at 0.333ms per request, and even with 16 clients, we are well under 1ms at 0.743ms:

Conclusion
Definitely consider using MySQL GIS whenever you need to search for a point within a set of ranges. Performance is fantastic, and it’s relatively easy to use. Even if you are an all-InnoDB shop, as most of our customers are (and we would recommend), it may very well be worth it to use MyISAM specifically for this purpose.
Update 1: Another way to do it, and a look at performance
Andy Skelton and Nikolay Bachiyski left a comment below suggesting another way this could be done:
SELECT country_code
FROM ip_country
WHERE ip_to >= INET_ATON('%s')
ORDER BY ip_to ASC
LIMIT 1
This version of the query doesn’t act exactly the same as the other two — if your search IP is not part of any range, it will return the next highest range. You will have to check whether ip_from is <= your IP within your own code. It may be possible to do this in MySQL directly, but I haven’t found a way that doesn’t kill the performance.
Andy’s version actually performs quite well — slightly faster and more scalable than MBRCONTAINS. I added two new performance testing configurations to better show the differences between the two:
| Clients | Machines | Threads |
| 32 | 4 | 8 |
| 64 | 4 | 16 |
Here’s a performance comparison of MBRCONTAINS vs. Andy’s Method:
Latency (ms) — Lower is better:

Queries per second — Higher is better:

Once I get some more time to dig into this, I will look at why exactly BETWEEN is so slow. I’ve also run into an interesting possible bug in MySQL: If you add a LIMIT 1 to the BETWEEN version of the query, performance goes completely to hell. Huh?
Thanks for the feedback, Andy and Nikolay.
Footnotes
1 MySQL provides the INET_ATON() and INET_NTOA() functions for converting back and forth between dotted-quad strings (CHAR(15)) and 32-bit integers (INT UNSIGNED). You can also use the equivalent functions, if they exist, if your favorite programming language so that you can just feed an integer to MySQL. I haven’t tested the (positive) performance implications of doing that.
2 Although, strangely they offer a different solution specifically for MySQL using <= and >= operators instead of BETWEEN. I don’t find that that difference has any effect on MySQL. Maybe it was for a really old version of MySQL that didn’t have BETWEEN?
3 Pet peeve: Why does MySQL require you to pass the output of its own POLYGON, LINESTRING, POINT, etc., functions through GEOMFROMWKB in order to use them? It makes life suck that little bit more than necessary.
4 Note that if you’re looking to play around with the BETWEEN version of things, you will want to add some indexes on ip_from and ip_to. I would recommend INDEX (ip_from, ip_to) and INDEX (ip_to, ip_from) as those two seemed to perform the best that I could find (given its poor efficiency to start with).
November 24th, 2007 at 09:32:48
[...] On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS [...]
November 24th, 2007 at 10:54:30
In fact another way of doing it is to use a conventional index on ip_from and then just do:
SELECT * FROM ipcountry WHERE ipfrom
November 24th, 2007 at 11:30:10
Hi Jeremy,
Thats a well research and written post that has tought me a lot about speed and efficiency regarding GeoIP lookups in MySQL and will inspire me to further create documented and researched material for educational purposes.
I have written an article about how to install the Maxmind GeoIP Country database you can read at Maxmind GeoIP setup tutorial using phpMyAdmin.
Kind Regards
Bart
November 24th, 2007 at 12:53:14
Hi Mark,
I’m not sure I follow. What do you mean? Do keep in mind we’re looking for an IP (x) within a range (m - n). Neither m nor n is necessarily x itself.
Regards,
Jeremy
November 24th, 2007 at 15:26:48
While trying to improve the
BETWEENquery performance, a colleague of mine — Andy Skelton — devised a very simple and fast query:Of course this tricks works only if we have contiguous intervals, which cover all the IPs. Luckily most of the geoip databases conform to this rule.
My simple tests showed that Andy’s query is slightly faster than yours, but you may feed it into your benchmarks, so that we can see if there’s any substantial difference.
November 24th, 2007 at 15:39:32
I don’t know the first thing about GIS but I have found an efficient solution and put it to use “in the wild”—on WordPress.com.
Possibly similar to the solution from Mark, whose comment I assume was cut off due to an unescaped HTML entity, we typically get results from a 4,900,000-row MyISAM table in under 0.5ms using this query:
SELECT * FROM ip2location WHERE 123456789 <= ip_to LIMIT 1All we had to do was add an index on ip_to and verify that there are no overlapping ranges in our table. We can quickly compare against ip_from in software so there is no need for a double-range query.
Until I discovered this simple query, our ip2location table was useless in production. Now we use it all the time. It doesn’t even need its own server; it coexists on a server with dozens of other busy tables.
November 24th, 2007 at 18:19:38
Hi Andy,
This is pretty interesting. I’ll be updating the post shortly with your additions and the performance of that version of the query.
Regards,
Jeremy
November 25th, 2007 at 04:31:27
Andy Skelton said:
November 25th, 2007 at 08:38:43
My post originally said
SELECT * FROM ipcountry WHERE ipfrom < 123456789 LIMIT 1
But yes, it seems to have been cut off.
My experiments suggested that this was very good. It wouldn’t work with overlapping ranges, but nor would any other algorithm. It would be straightforward to check that there were no overlapping ranges.
You would have to check the row returned to make sure the IP was really within it (this check is simple and efficient). If the IP wasn’t in the range of the returned row however, you could be sure it was nowhere else either (i.e. not found)
Mark
November 25th, 2007 at 08:50:55
Further correction:
Should have said WHERE ipfrom <= 123456789 ORDER BY ipfrom DESC LIMIT 1
The “ORDER BY ipfrom DESC” is essential. You’re finding the highest ipfrom which is less than or equal to the one you’re checking. This means you will find the range which the IP is in, if any, in a simple query using a conventional index.
Mark
November 25th, 2007 at 11:28:48
The index on ip_to lets us get away without any ORDER BY because the index is naturally scanned in ascending order. That’s also why I didn’t use ip_from.
November 25th, 2007 at 11:34:42
Hi Andy,
You could actually do the same trick with either ip_from or ip_to, as MySQL is capable of scanning in either direction. I wouldn’t leave the ORDER BY out; as long as MySQL does what you expect (and it should) the ORDER BY is a no-op. If an optimizer change is made at some point, the entire intent of your query can change without the ORDER BY, and that would be Bad(tm) — you would start getting random rows back (but still 1) instead of the one you want.
At least for me, getting back the wrong row “sometimes” is bad enough, the possibility of the behaviour changing on upgrade and completely breaking doesn’t make me comfortable at all.
Regards,
Jeremy
November 25th, 2007 at 11:43:51
Nikolay’s version with ORDER BY is what we use in production because we haven’t observed any difference in performance when adding ORDER BY, nor accuracy, nor EXPLAIN SELECT. It is as you said. Better safe.
November 25th, 2007 at 14:34:10
Jeremy,
I forgot to mention that the GeoLite Country database actually contains lots of gaps. In order to make sure the query gives the correct result I wrote a simple perl script, which takes the csv file and fills in the gaps with dummy rows. Now if the entry doesn’t exists you will get “-” instead of the next IP’s country.
November 26th, 2007 at 23:17:57
[...] jcole’s weblog » On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS [...]
November 28th, 2007 at 12:49:50
[...] On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS (0 visite) [...]
November 30th, 2007 at 10:05:45
Forgive me if this is a stupid question, but how does = perform in comparison to BETWEEN?
SELECT country_codeFROM ip_country
WHERE ip_to >= INET_ATON(’4.2.2.1′) AND ip_from
I'm not a MySQL expert by any means, but could the performance gain be the
LIMIT 1? Perhaps after finding the first result, MySQL can stop scanning the rest of the table. If that is the case, I’m guessing the >= ANDNovember 30th, 2007 at 10:19:39
¿have you tried the binary file version that maxmind provides? It has a optimized format for this especific search, and libraries to open it in varius languages (java, .net, ruby and more.)
November 30th, 2007 at 12:57:24
Joaquim,
No, I haven’t done any performance testing on MaxMind’s solution yet. It would be interesting to do so.
Regards,
Jeremy
November 30th, 2007 at 21:17:27
[...] On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS (tags: blog geography network gis geoip performance database) [...]
November 30th, 2007 at 22:27:15
I had to insert a slash into your SQL to make it work:
LINES
TERMINATED BY “\n”
Or it errored out with “Cannot get geometry object” etc. Worked fine after that change.
Thanks for the great article.
December 1st, 2007 at 00:55:45
[...] these instructions to get MaxMind’s GeoLite IPCountry table into your [...]
December 1st, 2007 at 01:54:36
Hi hk,
Ack, you’re quite right. I guess WordPress escaped/changed it somewhere, as it was “\n” in the edit box. Changing it to “\\n” corrected it in display. Thanks for the note! Nobody else caught that.
Regards,
Jeremy
December 1st, 2007 at 03:20:32
jcole?s weblog: Jeremy Cole?s take on life. » Blog Archive » On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS…
[...][...]…
December 1st, 2007 at 04:03:31
Seconded on Joaquim’s hint about using the binary file version of MaxMind’s product - it’s quite compact and really the right way to do things if you’ve got a very large number of queries to run — and this is important — at page view time, when latency as opposed to aggregate throughput is a factor. When post-processing for stats you should of course use whatever is closest to your input dataset. We use the binary file api in php at PBwiki for a number of things, and page content can vary based on nation of origin. There are many use cases where an RDBMS shines and this isn’t really one of them.
December 1st, 2007 at 20:33:04
This should do the in-range check in the query cheaply (but I haven’t tested at all):
select * from (SELECT * FROM ip2location WHERE 123456789 = ip_from;
December 1st, 2007 at 20:51:13
Joshua,
Did your post get truncated? I don’t think I get it.
Regards,
Jeremy
December 3rd, 2007 at 15:21:00
[...] jcole’s weblog: Jeremy Cole’s take on life. » Blog Archive » On efficiently geo-referencing IP… (tags: mysql geoip database) [...]
December 4th, 2007 at 18:37:39
[...] On efficiently geo-referencing IPs with MaxMind GeoIP and MySQL GIS - [...]
December 16th, 2007 at 03:48:08
[...] MySQL Range Queries on MaxMind GeoIP Tables A few weeks ago I read Jeremy Cole’s post on querying MaxMind GeoIP tables but I didn’t know what all that geometric magic was about so I dropped a comment about how we [...]
January 9th, 2008 at 06:01:53
We are using IP2Location instead of Maxmind.
You can find similar SQL syntax from their FAQs.
http://www.ip2location.com/faqs-ip-country.aspx
January 10th, 2008 at 12:54:44
So I tried the original example out of curiosity but I found many issues with accuracy and such. So I looked at it and changed it around a bit and this is what I came up with.
Import like this:
TRUNCATE ip_country;
LOAD DATA LOCAL INFILE “GeoIPCity.csv”
INTO TABLE ip_country
FIELDS
TERMINATED BY “,”
ENCLOSED BY “\”"
LINES
TERMINATED BY “\n”
IGNORE 1 LINES
(
@startIpNum,@endIpNum,@country,@region,
@city,@postalCode,@latitude,@longitude,
@dmaCode,@areaCode
)
SET
id := NULL,
ip_from := INET_ATON(@startIpNum),
ip_to := INET_ATON(@endIpNum),
ip_poly := GEOMFROMWKB(POLYGON(LINESTRING(
/* clockwise, 4 points and back to 0 */
POINT(INET_ATON(@startIpNum), -1), /* 0, top left */
POINT(INET_ATON(@endIpNum), -1), /* 1, top right */
POINT(INET_ATON(@endIpNum), 1), /* 2, bottom right */
POINT(INET_ATON(@startIpNum), 1), /* 3, bottom left */
POINT(INET_ATON(@startIpNum), -1) /* 0, back to start */
))),
country_code := @country,
city := @city,
zipcode := @postalCode,
lat := @latitude,
lon := @longitude,
areacode := @areaCode
;
Then use it like this:
SELECT city,country_code FROM ip_country WHERE MBRCONTAINS(ip_poly, POINTFROMWKB(POINT(INET_ATON(’12.199.160.34′), 0)));
January 10th, 2008 at 12:58:12
I forgot to mention, that while this was rather fast assuming you already have a connection pool or something similar, when you get to an installation like ours (1500+ servers) the time taken to create a MySQL connection, run the query and return the recordset turned out to be longer than the time to use the binary file locally.
I had figured it would be close to on par if not faster. But because of the way Linux caches files on local file systems it’s faster to use the binary file for us.
February 6th, 2008 at 02:13:32
Another *very* fast solution is to use a hash based on the class B network of the IP address your looking for.
First add a hash field ‘bucket’ and create a key for it:
ALTER TABLE ip_country ADD bucket smallint unsigned NOT NULL AFTER ip_to;
ALTER TABLE ip_country ADD KEY bucket(bucket);
Then fill the hash (this took 90 seconds on my development machine):
UPDATE ip_country SET bucket=ip_from >>16;
This sets the new field to the first 16 bits of the IP address of the ip_from field.
And from now on add the following to the WHERE clause of every query. Here $ip is the dotted notation of the IP address your are looking for.
… AND bucket=INET_ATON($ip)>>16
Example:
SELECT * FROM ip_country
WHERE bucket=INET_ATON(’72.14.207.99′)>>16
AND INET_ATON(’72.14.207.99′) BETWEEN ip_from AND ip_to;
Adding the extra condition brought down the time from 3-5 seconds to 0.0
Would be great if you could run the performance tests on this one too, so we can see some stats on this one.
- Stan
PS: The geo data (lat / long) in these databases is *really* bad.
February 6th, 2008 at 15:40:49
Converting IP Addresses to Integer Database Fields: Endian Match
=====================================================
There is a SQL/GIS POLYGON example, representing the IP range of [3.0.0.0; 4.17.135.31]. The unsigned integer versions are 50331648 & 68257567 respectively on an Intel architecture. This is confusing, since the “network” architecture is the opposite.
In order to store IP addresses in INTEGER database fields on Intel (x86), you *must* get the endianness right. See inet_aton(), ntohl(), etc.
February 7th, 2008 at 12:38:39
Why does this use a POLYGON (rectangle) rather than a LINESTRING? An IP Address range is a line-segment.
I’d expect a 1D point on a line-segment to be an easier calculation than an MBR around a 2D point. Is the MBR/R-Tree solution relly any faster than a GIS LINESTRING?
February 11th, 2008 at 16:06:34
Nothing to add, just want to say how much I appreciate this thread!! I am analyzing large volumes of data on people *trying* to access our site, but getting turned away b/c no access. We want to profile them geographically, and my big, fat Sun MySQL server was dragging.
After I converted into polygonal data, I was able to process, using only a dual-core MacBook Pro, almost 6 million IP addresses to identify country of origin in 5 minutes!!
Thanks y’all!
February 26th, 2008 at 14:15:36
This is a very helpful thread. We are using a database from IP2Location (www.ip2location.com). Some of the approaches discussed in this thread assume that the IP ranges in the database are not overlapping. Has anyone confirmed that this is actually the case (either for IP2Location or any other IP database)? Thanks.
February 28th, 2008 at 11:07:40
Well, I have tried some of the suggestions in this tread but the results were not what I expected. Here are my results:
QUERY 1
This was my initial query, which I am trying to optimize.
mysql> select SQL_NO_CACHE * from IP_TO_LOCATION where 1234567890 between IP_FROM and IP_TO LIMIT 1;
1 row in set (1.62 sec)
________________
QUERY 2
This is my first attempt at optimizing the query. However, this did not work so well… see the results below. Can anyone explain why this is the case? Based on this thread, I would not have expected the ORDER BY clause to be a performance hit.
mysql> select SQL_NO_CACHE * from IP_TO_LOCATION where IP_TO >= 1234567890 order by IP_TO asc LIMIT 1;
1 row in set (3.84 sec)
________________
QUERY 3
Removing the ORDER BY improved performance. Now the performance is slightly better then the initial query…but still not as fast as I was expecting. Again, can anyone help explain what is happening here and why I a not seeing they type of performance suggested by the comments on this thread?
mysql> select SQL_NO_CACHE * from IP_TO_LOCATION where IP_TO >= 1234567890 LIMIT 1;
1 row in set (1.52 sec)
Note: In the example queries above, the 1234567890 is a placeholder for the actual IP address used in these queries.
March 11th, 2008 at 02:51:50
Useful thread indeed. I got lots of valued information from it.
Similar to Andrew McLetchie, our company wanted to profile people visiting our website and I was the guy to make a solution for it. After some research I found this blog and it gave me a few ideas.
I ended up using the free data from MaxMind (they update it monthly), but I used all their free products (country, city, region and timezone) instead of just their Country or City data. When doing that there were of course duplicates, but other then that no major problems.
I combined it all into a single database giving me access to not only look-up IPs but also to reverse look-up based on one or more of the information stored.
Data is found usually within 0.1sec.
Thus we can find users who are in a certain timezone, country, city or region. Pretty neat.
When MaxMind updates their data I only have to download the new csv files, load up my installation script and about 15-30mins later (when run on my laptop) the database is updated.
I made it so easy to install, that its ready to be published for any to use (sorry, not my call). Written in PHP5 to MySQL5.
March 17th, 2008 at 10:57:40
I’ve actually used this technique for a couple years now after realizing not only does it speed up the search but reduces the db table to just over 1mb which can be cached far better (and dropping the unneeded columns)
However I discovered it’s a little more accurate and gives better “missing” results if you do it backwards using the ENDING column and descend - searching backwards essentially. MySQL does it just as fast, and if not found, the next lower result is better.
I’d had to manually patch the maxmind ranges about two dozen times now. The free db has several holes and inaccuracies, especially with ISPs like AOL. It also lists EU for several spots that should be more country specific. We should group together to share the patches for holes.
April 15th, 2008 at 09:21:16
@ Dave K.
Query 2: You need an index(key) on field IP_TO
April 23rd, 2008 at 14:08:42
Thank you for the great thread. We have a GPLed tool you can use to make plots of GeoIP data called Entrance. We were at the MySQL Conference and I was wowed by the world map Jonathan Schwartz used to show MySQL and Solaris downloads in his keynote. So I came home and added something similar to Entrance.
Its based on world map images by David Pape (which probably what the Sun guys used). Once you have lat, long calculated you do this:
PLOT EarthChart
x, very small filled yellow circle
WITH
gray gridlines
SELECT lon, lat
from ACCESS_IPS;
The details are on my page: http://todlandis.com/ and the Entrance downloads are on http://dbentrance.com/ To get EarthCharts you’ll need version 1.2.70 or greater, and either the GPL or IDE version.
To get a black background do this:
PLOT EarthChart
x, very small filled yellow circle
WITH
gray gridlines
no bitmap
background black
SELECT lon, lat
FROM ACCESS_IPS;
… then you can flip between them with Window | Go back… Window | Go forward.
The black background is pretty handy to have.
May 6th, 2008 at 03:04:21
Previou post was truncated, seemc to be the < problem…
SELECT SQL_NO_CACHE country_code FROM ip_country WHERE INET_ATON(”4.2.2.1″)>=ip_from AND INET_ATON(”4.2.2.2″)<=ip_to;
with primary key on (ip_from, ip_to).
It gives the same result (queryTime = 0.0003 s) for me as Andy’s method and it wont’t give you incorrect results when ip not in range.
It’s also much faster than BETWEEN condition.
August 18th, 2008 at 12:24:31
Curious to know if anyone has tried these out with myisampack’ed tables. while the country database isn’t too big, I wonder what affect this has with the city database - quite a bit larger and the overlapping numbers again.
On another note (slightly off-topic): have any of you guys tried out the apache mod_geoip API way of doing things? Just installed it (pretty painless) and it works pretty darn quickly. Haven’t done any serious load testing with it, but with that API its pretty easy and PHP gives you variables through $_SERVER you can use (heck just about any language you use with Apache)… Food for thought to save a little pain and database wear and tear. No 30 min. data build, etc, etc, etc. Once installed (took me about 5 minutes, maybe less) you just have to download once a month their compressed data file and (perhaps) restart apache. There may be a better way to update it than that, and it surely can be automated. Food for thought….
http://www.maxmind.com/app/mod_geoip
October 1st, 2008 at 10:34:43
It gives the same result (queryTime = 0.0003 s) for me as Andy’s method and it wont’t give you incorrect results when ip not in range.
Per query it may be faster but when doing an updte on a table with 977k rows, this took 25seconds. Using BETWEEN ran for over 12 hours before I stopped it, I let the primary key method run for several minutes. Spatial Indexing FTW
October 16th, 2008 at 03:38:22
[...] format and querying is not very efficient at all, and there are two excellent posts (here and here) which I intend to one day impliment myself, but for now the above method works fine for low to [...]
December 8th, 2008 at 19:08:55
I’ve tried to expand on Andy’s method above with regards to having to check the ip_from at the software level.
I haven’t exactly benchmarked this yet, but it seems as though it would not hurt performance. Basically I needed a reliable way to do the test for ip_from <= result_row to determine if the IP was in the range that was returned. With PHP being loosely typed and lack of support for unsigned int’s I tried to return a Boolean value from MySQL to check for instead of doing a mildly more complex comparison at the software level.
[CODE]
SELECT country_code, ip_from, IF(INET_ATON(%s)>=ip_from,1,0) AS bool_inrange
FROM ip_country
WHERE ip_to >= INET_ATON(’%s’)
ORDER BY ip_to ASC
LIMIT 1;
[/CODE]
This allows me to simply check if ($result_array['bool_inrange']) { … }
instead of having to worry about the signed/unsigned oddities; additionally, it ensures all LONG numbers are in correct endianness by using MySQL’s INET_ATON.
In short, you still need to check at the software level against the ‘bool_inrange’, but this should make it easier for some languages and require less code to do those checks at the software/app level.
[This is a correction post due to the blog assuming a < and > combination in a post is an HTML element, heh.]