DISQUS

DISQUS Hello! 20bits is using DISQUS, a powerful comment system, to manage its comments. Learn more.

Community Page

Jump to original thread »
Author

10 Tips for Optimizing MySQL Queries (That don’t suck) | 20bits

Started by Jesse Farmer · 7 months ago

No excerpt available. Jump to website »

60 comments

  • Hey Jesse,

    I just scanned your post (it's past 2AM and I have an early flight tomorrow), but it looks like a _huge_ improvement over the Jaslabs list. In fact, it looks like the kind of post I would have written had I had more time! Great job.
  • Oh yea, one other thing... preincrement is faster than postincrement b/c postincrement requires the creation of a temporary copy of the variable to return as the result of the statement. Unless of course your compiler optimizes away the difference :).
  • On point 7, I disagree that you should change subqueries to joins. You should use subqueries where they make the SQL easier to understand, and joins when they make it easier to understand. If the two methods of writing the SQL produce the same results, then the query optimiser should optimise them to do the same thing internally anyway. If one method is slower than the other, that is a bug in the database engine.
  • Your point in #5 is all fine and dandy until you use some type of ORM framework in your application. Bring in something like Hibernate to your application and try to run a code-generation on that table. It will create it's own primary key for you in the classes it creates, which is a real pain to work with in the code.
  • WOW your response to optimizing MySQL queries is basically to set up your database right. Genius!

    What if you don't control the schema? I guess then you don't have much.
  • Hmm. If it's on duggmirror, can I still post a comment?
  • I attended a presentation on performance optimization by Jay Pipes from MySQL a couple weeks back, and his advice was to always use auto_increment primary keys unless you have a very good reason not to. I'm a little fuzzy on the details, but I believe what he said was when you use an auto_increment PK the engine inherently knows what the next available value is, while if you use another unique column for the PK it needs to do a little more work when inserting to ensure the value you're providing is unique. The end result according to him was better performance when dealing with high insert volume when using a synthetic auto_increment key. I believe he was talking about InnoDB at the time, so I'm not sure if the same concept applies to MyISAM.

    In the example you gave I agree, keep the auto_increment out of there, but in other cases where there's natural key you may actually be hurting performance.
  • Robert,

    Unfortunately that problem is hard. I personally wouldn't trust a database to reliably translate subqueries into joins where possible. I also don't think joins are frequently more obscure than subselects, especially if you're (1) familiar with SQL idioms and (2) leave comments about what data the query is fetching.
  • Bobby,

    That is true. Rails, for example, doesn't support composite primary keys out of the box. Unfortunately that is a problem since often composite primary keys are really the best solution. I know there's a mixin for Rails which adds support for composite keys, although I don't know how well it performs.
  • Andrew,

    If you don't have any control over the layout of your database or the database configuration itself then there's not much you can do. Use the profiling tools I described to isolate the expensive queries and try to rewrite them so that they're not so expensive. Since you can't alter indices or change the schema this might prove to be very difficult.
  • Nick,

    I think he was talking about using auto_increment versus incrementing via some other mechanism. Drupal, which I've written about elsewhere, does just this and it's really annoying — it has its own sequence table. This makes multiple inserts difficult if not impossible. But if you have a natural primary key then adding an auto incrementing artificial key doesn't get you anything. IOW, if you have an artificial key, it's best to make it an auto_increment integer column, in MySQL at least.
  • Actually, for #7, you'd want a LEFT JOIN to ensure you got all authors even if they haven't posted yet.
  • Ryan,

    Well, it depends on what data you're actually trying to fetch. It was meant as an illustration of the subquery vs. join issue, though. I think it suffices for that. :)
  • Robert (Jamie) Munro: You're joking, right?
  • Good list. I've attended a presentation given by Jay Pipes from MySQL, and he addresses the same points, I suspect you may have been to a similar presentation or read his book. :)

    As for the sub-queries being changed to joins, the MySQL query optimizer simply does not optimize these well, Oracle or MSSQL on the other hand convert those sub-queries to joins. The point is to realize the strengths and weaknesses of your database, and develop accordingly. Whether or not it should be done by the optimizer is a moot point when you're writing code. Since the optimizer isn't going to help you out until a later version, you should account for this if you're going to be using the DB. If you don't, the result will be extremely inefficient queries.
  • Tom,

    Good to know. The environments in which I've used Oracle and MSSQL have never approached the level of activity of the environments in which I've used MySQL (LAMP stack applications generating 30M+ pageviews a month, etc.).

    As for Jay Pipes, I've never heard of the guy. These all come from my own experience, particularly making Drupal perform well, and reading Peter Zeitsev's great weblog. Drupal basically rides right on top of the database so I've applied all these things many times over scaling Drupal applications. Have any good Pipes-related links?
  • Jay is the North American community relations manager for MySQL AB, he's also the co-author of Pro MySQL. You can find his blog at jaypipes.com and the book here http://www.amazon.com/MySQL-Experts-Voice-Open-...

    A local university was lucky enough to get him to give a nice (and free) presentation on optimizing MySQL, and I must say, it was the most informative 4 hour session I've been to, regardless of price. He's got a great understanding on how the database works, and what we as developers can and conversely shouldn't do to get the best performance out of it.

    Looks like they just did a few of these in the Midwest, and are now done... but if you get a chance to go to one of these in the future I'd highly recommend it. I've been to MS and Oracle's events like these, and while those are marketing driven, this was highly valuable.
  • Thanks, Tom! :)
  • For once I make a reply.
    Good post, and thanks for the tools in #1, I have been looking around for some and was actually seeking opinions on a few. Everyone has their opinions and different practices work better in different environments based on so many factors, so lets all be friends. Again, thank you for the input.
  • Well written and very on target, much better than the other list. I just wanted to add another point regarding normalization and I'll refer back to where you said "However, proper normalization results in a minimization of redundant data. Fundamentally that means a smaller overall footprint at the cost of performance". From a storage point of view that's a good argument but I think there is an even more important reason for normalization and that's management of redundant data. If the same data is stored in more than one place in your database, which one is currently most correct. Just something for new developers to the database world.
    --G
  • Thanks for an interesting post. Nothing much for the seasoned MySql user, I suspect, but for an application programmer like myself, who merely uses the database as a place to store data, there were a couple of eye-openers.

    Your site looks crappy in firefox btw.
  • troels,

    Hmm, what version of Firefox are you using? It looks alright in 2.0.
  • #4 is actually an example of vertical partitioning. You're moving "columns" into another table -- think of it as slicing the table vertically.

    An example of horizontal partitioning would be let's say if you have a table full of names, and you made a file group for last names A-L, and a file group for M-Z. Then, commonly, you would put each filegroup on a separate [set of] disk[s], speeding up performance.
  • Yep, you're right, jon. My mistake. :)
  • Thanks for the tips!

    I actually also found a Google Tech Talk Video on MySQL perofmance Tips and found it to be perfect. Check it out: http://blog.sherifmansour.com/?p=72 "Performance Tuning Best Practices for MySQL"
  • Sherif,

    That presentation is way more comprehensive than my list. Thanks. :)
  • "Andrew,

    If you don’t have any control over the layout of your database or the database configuration itself then there’s not much you can do. Use the profiling tools I described to isolate the expensive queries and try to rewrite them so that they’re not so expensive. Since you can’t alter indices or change the schema this might prove to be very difficult
    "

    in other words, use the tips at my site that jesse claimed were bad.

    "The rule in any situation where you want to opimize some code is that you first profile it and then find the bottlenecks. Mr. Silverton, however, aims right for the tippy top of the trees. I'd say 60% of database optimization is properly understanding SQL and the basics of databases. "

    If you had bothered to even read (or understand) my article, you would have known that it was titled "10 tips for optimizing mysql queries" IE: Specific things you can do to your queries (not the databse, engine, or software) that can help in optimization.

    Your list is very generic and can be found in almost any book about databases.
  • Justin,

    I was wondering when you'd find your way over here. I still stand by my statement that your tips suck, or at the very least aren't anything better than content created solely to generate traffic. What's more, you didn't address my critiques. So, here they are again:
    1) You are approaching the problem of performance from the wrong end
    2) Some of the problems you identify have solutions, but you give the wrong one.
    3) Some of your "tips" don't even correspond to actual problems.

    And, I'd add a fourth: since you offer no analysis it's impossible to judge the merits of your tips. Let's take the SQL_SMALL_RESULT "tip" again. What are the downsides to using it? What happens if I use SQL_SMALL_RESULT on a query that accidentally returns a large result set? How does SQL_SMALL_RESULT affect the memory usage of MySQL? Do its effects vary between MySQL engines?

    These are concerns someone seriously interested in database performance would need to know, but you give none of it. My suggestions may be more general but they are, I hope, better supported.

    Cheers.
  • This is really useful!
  • > Hmm, what version of Firefox are you using? It looks alright in 2.0.

    Using 2.0.0.3

    The div class=ch_code_container sections get a big grey block at the bottom. It happens because of the height=100%.
  • Ah ... I just realized that I was looking at a mirrored site (http://duggmirror.com/programming/10_Tips_for_O...). This probably changes the doctype, which makes the height=100% behave differently.
  • Looks like Digg needs a 'This is a Response to...' ala YouTube!
  • I normally simply use apache's ab for benchmark testing.
  • Very nice article.

    I've got a question though.. I'm making a site that uses cron jobs to update every entry in a table every 15mins.

    If my table has thousands's of rows, its gona kill my server every 15mins right?

    Is there a way to spread the load out? Maybe an UPDATE equivalent of INSERT_DELAYED?

    Thanks
    Louis
  • Louis,

    It depends on the structure of your table and what you're doing every 15 minutes. It also depends on how heavily used the database is. But "thousands" of rows is not a lot. Hundreds of thousands isn't even that bad, presuming your table isn't designed horribly.
  • Thanks for sharing!
  • I was trying to rewrite a sub-query into a join query, but it wouldn't run any faster (actually a lot slower). I'm tryign to fetch number of comments for each blog entry. It goes like this:

    SELECT title, COUNT(com.id) comments FROM entries LEFT JOIN com ON entries.id = com.entry GROUP BY entries.id ORDER BY entries.time DESC LIMIT 10

    The subquery just selected COUNT(*) from comments where id = entries.id.
  • According to Jay Pipes, your example (by splitting columns) is horizontal. Making more tables (splitting and merging) is vertical.
  • Zerd,

    I'm pretty sure horizontal partitioning is when you split according to rows and vertical is when you split by column. So, e.g., if you have an accounting system and all accounts in a given year are in their own table, that'd be horizontal partitioning. What I've described is vertical partitioning.

    As for your query, I can take a guess. I'd wager that you're using MyISAM tables, in which case count(*) is opimized away. Your indices might also be messed up. That is, if com.entry isn't an index then joining becomes more expensive. I presume entries.id and com.id are primary keys, so they're automatically indexed.

    It might even be faster to do: SELECT title, c.comments FROM entries JOIN (SELECT entry, COUNT(*) as comments FROM comments GROUP BY (comments.entry)) c ON c.entry = entries.id;
  • Thank you for sharing!
  • Very nice article Jesse. Sound, practical advice well presented and argued. The advanced topics such as benchmarking and profiling are areas I haven't really delved into yet so thanks alot for the pointers. p.s. I don't think Justin Silverton has indexed his `retort` column as his comeback query is taking a long time to run.
  • The missing thing still is how to make sql queries faster
  • Jesse:

    Rails doesn't support composite keys for model classes, but posts_tags here is just a join table, and it's fine with composite keys here -- in fact I'm pretty sure it's the default. We do precisely this in several places, and I don't remember needing to do anything special.
  • ken,

    I think when I wrote this Rails would flip out if even the join table didn't have its own auto incrementing primary key. This might have changed, but it's also possible I was wrong.
  • OK, I guess I have to give you that. We do pass :id=>false (which I think has been around basically forever) to the join table, but it's true that some Rails features flip out on :id=>false tables (even with 2.0.2). I don't think I've seen any outflipping for join tables in particular, but I wouldn't bet against it.
  • ken,

    Yeah. I don't know what version what around when I wrote this article -- but that was 11 months ago. It'd also be unfortunate if Rails flipped out by default (i.e., required :id => false), since, AFAIK, Rails is supposed to be about "convention no configuration."
  • I've read a lot of the "MySQL" Optimization guides and yours is the best so far as a single post. Most of them have gone over "Indexing" or totally skewed the guide you end up screwing yourself.

    I use parenthesis when I'm trying to do two compares on a column ie: (`date` >= 'xxxx' AND `date` <= 'yyyy'). Just make me feel comfortable with grouping it like that.

    As for the other things - I think you have given me a lot of stuff to help me in my job that I start in 2 weeks. I leave the current job tomorrow (friday), where even I know my read calls are really quick, but i can see it being at least 2x faster with these tips.
  • Hello, How write sql query for mysql, selet top 10 ?
  • I know the post is like a year old. But I think that it is the single best post on mysql optimization.
    Concise and to the point and practical. Good job.
  • These are some fantastic tips, thanks man!
    Just ordered myself a copy of the book High Performance MySQL (Arjen Lentz, Peter Zaitsev, Vadim Tkachenko), can't wait, working on a project that needs massive amounts of data output from MySQL and it seems I'll be using this new knowledge to save the company from having to buy bigger servers.
  • Hey Jesse-- I just wanted to let you know that I've been doing a fair deal of schema-ing (and scheming) at my new job and I've been using this article like a bible and sharing it with my colleagues. Thanks! ^^
  • Thanks for the list. Great perspective.
  • great tips!
  • Nice tips. I too have written some tips in not so recent past on my blog covering how to handle frequent inserts on a database maximized for reading like a blog using MyISAM.
    BTW: Your tip on eliminating artificial primary key has issues with most ORM frameworks like Hibernate.
  • Angsuman,

    Cool. I'll check it out.

    As for the ORM stuff, I know, but I don't care that much. My interests skew towards large, denormalized data storage, anyhow. Building the next great Rails app ain't my thing.
  • Thanks for your list! Also, I was so free to do my own post on some basic and advanced mysql tuning tips!
  • Great article. I'm trying to take my MySql skills to the next level and this is the kind of stuff I want to learn.
  • good post. One way I find make a big difference is using group by rather than distinct when it's possible
  • I liked a lot your post but wish to as one question, about a problem I need to solve.
    I am facing a situation wher I have to query a database for the countries names, then, when a client select a country, I show the states (regions) of this country and, finally, when he selects one of these regions, I need to show all the cities of this region.
    I ask:
    What should I use:
    one table for the countries with countries names and countries codes;
    one table for the regions with regions names and codes, for each country;
    one table for the cities for that country.
    or, it's best to have only one table with all that data and perform all queries in that table (more than 2,000,000 records)?

    Thanks
  • Cool, Thanks!

Add New Comment

Returning? Login