DISQUS

20bits: 10 Tips for Optimizing MySQL Queries (That don’t suck) | 20bits

  • Mike · 2 years ago
    Hey Jesse,

    I just scanned your post (it's past 2AM and I have an early flight tomorrow), but it looks like a _huge_ improvement over the Jaslabs list. In fact, it looks like the kind of post I would have written had I had more time! Great job.
  • Mike · 2 years ago
    Oh yea, one other thing... preincrement is faster than postincrement b/c postincrement requires the creation of a temporary copy of the variable to return as the result of the statement. Unless of course your compiler optimizes away the difference :).
  • Robert (Jamie) Munro · 2 years ago
    On point 7, I disagree that you should change subqueries to joins. You should use subqueries where they make the SQL easier to understand, and joins when they make it easier to understand. If the two methods of writing the SQL produce the same results, then the query optimiser should optimise them to do the same thing internally anyway. If one method is slower than the other, that is a bug in the database engine.
  • Bobby Voliva · 2 years ago
    Your point in #5 is all fine and dandy until you use some type of ORM framework in your application. Bring in something like Hibernate to your application and try to run a code-generation on that table. It will create it's own primary key for you in the classes it creates, which is a real pain to work with in the code.
  • Andrew · 2 years ago
    WOW your response to optimizing MySQL queries is basically to set up your database right. Genius!

    What if you don't control the schema? I guess then you don't have much.
  • Yitz · 2 years ago
    Hmm. If it's on duggmirror, can I still post a comment?
  • Nick · 2 years ago
    I attended a presentation on performance optimization by Jay Pipes from MySQL a couple weeks back, and his advice was to always use auto_increment primary keys unless you have a very good reason not to. I'm a little fuzzy on the details, but I believe what he said was when you use an auto_increment PK the engine inherently knows what the next available value is, while if you use another unique column for the PK it needs to do a little more work when inserting to ensure the value you're providing is unique. The end result according to him was better performance when dealing with high insert volume when using a synthetic auto_increment key. I believe he was talking about InnoDB at the time, so I'm not sure if the same concept applies to MyISAM.

    In the example you gave I agree, keep the auto_increment out of there, but in other cases where there's natural key you may actually be hurting performance.
  • Jesse · 2 years ago
    Robert,

    Unfortunately that problem is hard. I personally wouldn't trust a database to reliably translate subqueries into joins where possible. I also don't think joins are frequently more obscure than subselects, especially if you're (1) familiar with SQL idioms and (2) leave comments about what data the query is fetching.
  • Jesse · 2 years ago
    Bobby,

    That is true. Rails, for example, doesn't support composite primary keys out of the box. Unfortunately that is a problem since often composite primary keys are really the best solution. I know there's a mixin for Rails which adds support for composite keys, although I don't know how well it performs.
  • Jesse · 2 years ago
    Andrew,

    If you don't have any control over the layout of your database or the database configuration itself then there's not much you can do. Use the profiling tools I described to isolate the expensive queries and try to rewrite them so that they're not so expensive. Since you can't alter indices or change the schema this might prove to be very difficult.
  • Jesse · 2 years ago
    Nick,

    I think he was talking about using auto_increment versus incrementing via some other mechanism. Drupal, which I've written about elsewhere, does just this and it's really annoying — it has its own sequence table. This makes multiple inserts difficult if not impossible. But if you have a natural primary key then adding an auto incrementing artificial key doesn't get you anything. IOW, if you have an artificial key, it's best to make it an auto_increment integer column, in MySQL at least.
  • Ryan · 2 years ago
    Actually, for #7, you'd want a LEFT JOIN to ensure you got all authors even if they haven't posted yet.
  • Jesse · 2 years ago
    Ryan,

    Well, it depends on what data you're actually trying to fetch. It was meant as an illustration of the subquery vs. join issue, though. I think it suffices for that. :)
  • myshpa · 2 years ago
    Robert (Jamie) Munro: You're joking, right?
  • Tom · 2 years ago
    Good list. I've attended a presentation given by Jay Pipes from MySQL, and he addresses the same points, I suspect you may have been to a similar presentation or read his book. :)

    As for the sub-queries being changed to joins, the MySQL query optimizer simply does not optimize these well, Oracle or MSSQL on the other hand convert those sub-queries to joins. The point is to realize the strengths and weaknesses of your database, and develop accordingly. Whether or not it should be done by the optimizer is a moot point when you're writing code. Since the optimizer isn't going to help you out until a later version, you should account for this if you're going to be using the DB. If you don't, the result will be extremely inefficient queries.
  • Jesse · 2 years ago
    Tom,

    Good to know. The environments in which I've used Oracle and MSSQL have never approached the level of activity of the environments in which I've used MySQL (LAMP stack applications generating 30M+ pageviews a month, etc.).

    As for Jay Pipes, I've never heard of the guy. These all come from my own experience, particularly making Drupal perform well, and reading Peter Zeitsev's great weblog. Drupal basically rides right on top of the database so I've applied all these things many times over scaling Drupal applications. Have any good Pipes-related links?
  • Tom · 2 years ago
    Jay is the North American community relations manager for MySQL AB, he's also the co-author of Pro MySQL. You can find his blog at jaypipes.com and the book here http://www.amazon.com/MySQL-Experts-Voice-Open-...

    A local university was lucky enough to get him to give a nice (and free) presentation on optimizing MySQL, and I must say, it was the most informative 4 hour session I've been to, regardless of price. He's got a great understanding on how the database works, and what we as developers can and conversely shouldn't do to get the best performance out of it.

    Looks like they just did a few of these in the Midwest, and are now done... but if you get a chance to go to one of these in the future I'd highly recommend it. I've been to MS and Oracle's events like these, and while those are marketing driven, this was highly valuable.
  • Jesse · 2 years ago
    Thanks, Tom! :)
  • Eric · 2 years ago
    For once I make a reply.
    Good post, and thanks for the tools in #1, I have been looking around for some and was actually seeking opinions on a few. Everyone has their opinions and different practices work better in different environments based on so many factors, so lets all be friends. Again, thank you for the input.
  • Greg · 2 years ago
    Well written and very on target, much better than the other list. I just wanted to add another point regarding normalization and I'll refer back to where you said "However, proper normalization results in a minimization of redundant data. Fundamentally that means a smaller overall footprint at the cost of performance". From a storage point of view that's a good argument but I think there is an even more important reason for normalization and that's management of redundant data. If the same data is stored in more than one place in your database, which one is currently most correct. Just something for new developers to the database world.
    --G
  • troels · 2 years ago
    Thanks for an interesting post. Nothing much for the seasoned MySql user, I suspect, but for an application programmer like myself, who merely uses the database as a place to store data, there were a couple of eye-openers.

    Your site looks crappy in firefox btw.
  • Jesse · 2 years ago
    troels,

    Hmm, what version of Firefox are you using? It looks alright in 2.0.
  • jon · 2 years ago
    #4 is actually an example of vertical partitioning. You're moving "columns" into another table -- think of it as slicing the table vertically.

    An example of horizontal partitioning would be let's say if you have a table full of names, and you made a file group for last names A-L, and a file group for M-Z. Then, commonly, you would put each filegroup on a separate [set of] disk[s], speeding up performance.
  • Jesse · 2 years ago
    Yep, you're right, jon. My mistake. :)
  • Sherif Mansour · 2 years ago
    Thanks for the tips!

    I actually also found a Google Tech Talk Video on MySQL perofmance Tips and found it to be perfect. Check it out: http://blog.sherifmansour.com/?p=72 "Performance Tuning Best Practices for MySQL"
  • Jesse · 2 years ago
    Sherif,

    That presentation is way more comprehensive than my list. Thanks. :)
  • Justin Silverton · 2 years ago
    "Andrew,

    If you don’t have any control over the layout of your database or the database configuration itself then there’s not much you can do. Use the profiling tools I described to isolate the expensive queries and try to rewrite them so that they’re not so expensive. Since you can’t alter indices or change the schema this might prove to be very difficult
    "

    in other words, use the tips at my site that jesse claimed were bad.

    "The rule in any situation where you want to opimize some code is that you first profile it and then find the bottlenecks. Mr. Silverton, however, aims right for the tippy top of the trees. I'd say 60% of database optimization is properly understanding SQL and the basics of databases. "

    If you had bothered to even read (or understand) my article, you would have known that it was titled "10 tips for optimizing mysql queries" IE: Specific things you can do to your queries (not the databse, engine, or software) that can help in optimization.

    Your list is very generic and can be found in almost any book about databases.
  • Jesse · 2 years ago
    Justin,

    I was wondering when you'd find your way over here. I still stand by my statement that your tips suck, or at the very least aren't anything better than content created solely to generate traffic. What's more, you didn't address my critiques. So, here they are again:
    1) You are approaching the problem of performance from the wrong end
    2) Some of the problems you identify have solutions, but you give the wrong one.
    3) Some of your "tips" don't even correspond to actual problems.

    And, I'd add a fourth: since you offer no analysis it's impossible to judge the merits of your tips. Let's take the SQL_SMALL_RESULT "tip" again. What are the downsides to using it? What happens if I use SQL_SMALL_RESULT on a query that accidentally returns a large result set? How does SQL_SMALL_RESULT affect the memory usage of MySQL? Do its effects vary between MySQL engines?

    These are concerns someone seriously interested in database performance would need to know, but you give none of it. My suggestions may be more general but they are, I hope, better supported.

    Cheers.
  • Yong · 2 years ago
    This is really useful!
  • troels · 2 years ago
    > Hmm, what version of Firefox are you using? It looks alright in 2.0.

    Using 2.0.0.3

    The div class=ch_code_container sections get a big grey block at the bottom. It happens because of the height=100%.
  • troels · 2 years ago
    Ah ... I just realized that I was looking at a mirrored site (http://duggmirror.com/programming/10_Tips_for_O...). This probably changes the doctype, which makes the height=100% behave differently.
  • Yuzle! · 2 years ago
    Looks like Digg needs a 'This is a Response to...' ala YouTube!
  • Motorcycle Guy · 2 years ago
    I normally simply use apache's ab for benchmark testing.
  • louis · 2 years ago
    Very nice article.

    I've got a question though.. I'm making a site that uses cron jobs to update every entry in a table every 15mins.

    If my table has thousands's of rows, its gona kill my server every 15mins right?

    Is there a way to spread the load out? Maybe an UPDATE equivalent of INSERT_DELAYED?

    Thanks
    Louis
  • Jesse · 2 years ago
    Louis,

    It depends on the structure of your table and what you're doing every 15 minutes. It also depends on how heavily used the database is. But "thousands" of rows is not a lot. Hundreds of thousands isn't even that bad, presuming your table isn't designed horribly.
  • freelancer · 2 years ago
    Thanks for sharing!
  • Zerd · 2 years ago
    I was trying to rewrite a sub-query into a join query, but it wouldn't run any faster (actually a lot slower). I'm tryign to fetch number of comments for each blog entry. It goes like this:

    SELECT title, COUNT(com.id) comments FROM entries LEFT JOIN com ON entries.id = com.entry GROUP BY entries.id ORDER BY entries.time DESC LIMIT 10

    The subquery just selected COUNT(*) from comments where id = entries.id.
  • Zerd · 2 years ago
    According to Jay Pipes, your example (by splitting columns) is horizontal. Making more tables (splitting and merging) is vertical.
  • Jesse · 2 years ago
    Zerd,

    I'm pretty sure horizontal partitioning is when you split according to rows and vertical is when you split by column. So, e.g., if you have an accounting system and all accounts in a given year are in their own table, that'd be horizontal partitioning. What I've described is vertical partitioning.

    As for your query, I can take a guess. I'd wager that you're using MyISAM tables, in which case count(*) is opimized away. Your indices might also be messed up. That is, if com.entry isn't an index then joining becomes more expensive. I presume entries.id and com.id are primary keys, so they're automatically indexed.

    It might even be faster to do: SELECT title, c.comments FROM entries JOIN (SELECT entry, COUNT(*) as comments FROM comments GROUP BY (comments.entry)) c ON c.entry = entries.id;
  • Wahoo · 2 years ago
    Thank you for sharing!
  • Matt Carpenter · 2 years ago
    Very nice article Jesse. Sound, practical advice well presented and argued. The advanced topics such as benchmarking and profiling are areas I haven't really delved into yet so thanks alot for the pointers. p.s. I don't think Justin Silverton has indexed his `retort` column as his comeback query is taking a long time to run.
  • Mysql freak · 1 year ago
    The missing thing still is how to make sql queries faster
  • ken · 1 year ago
    Jesse:

    Rails doesn't support composite keys for model classes, but posts_tags here is just a join table, and it's fine with composite keys here -- in fact I'm pretty sure it's the default. We do precisely this in several places, and I don't remember needing to do anything special.
  • Jesse · 1 year ago
    ken,

    I think when I wrote this Rails would flip out if even the join table didn't have its own auto incrementing primary key. This might have changed, but it's also possible I was wrong.
  • ken · 1 year ago
    OK, I guess I have to give you that. We do pass :id=>false (which I think has been around basically forever) to the join table, but it's true that some Rails features flip out on :id=>false tables (even with 2.0.2). I don't think I've seen any outflipping for join tables in particular, but I wouldn't bet against it.
  • Jesse · 1 year ago
    ken,

    Yeah. I don't know what version what around when I wrote this article -- but that was 11 months ago. It'd also be unfortunate if Rails flipped out by default (i.e., required :id => false), since, AFAIK, Rails is supposed to be about "convention no configuration."
  • ellisgl · 1 year ago
    I've read a lot of the "MySQL" Optimization guides and yours is the best so far as a single post. Most of them have gone over "Indexing" or totally skewed the guide you end up screwing yourself.

    I use parenthesis when I'm trying to do two compares on a column ie: (`date` >= 'xxxx' AND `date` <= 'yyyy'). Just make me feel comfortable with grouping it like that.

    As for the other things - I think you have given me a lot of stuff to help me in my job that I start in 2 weeks. I leave the current job tomorrow (friday), where even I know my read calls are really quick, but i can see it being at least 2x faster with these tips.
  • Hoteles Mar del Plata · 1 year ago
    Hello, How write sql query for mysql, selet top 10 ?
  • paan · 1 year ago
    I know the post is like a year old. But I think that it is the single best post on mysql optimization.
    Concise and to the point and practical. Good job.
  • Re@PeR · 1 year ago
    These are some fantastic tips, thanks man!
    Just ordered myself a copy of the book High Performance MySQL (Arjen Lentz, Peter Zaitsev, Vadim Tkachenko), can't wait, working on a project that needs massive amounts of data output from MySQL and it seems I'll be using this new knowledge to save the company from having to buy bigger servers.
  • mitcho · 1 year ago
    Hey Jesse-- I just wanted to let you know that I've been doing a fair deal of schema-ing (and scheming) at my new job and I've been using this article like a bible and sharing it with my colleagues. Thanks! ^^
  • Calgary Web Design · 1 year ago
    Thanks for the list. Great perspective.
  • Werner · 1 year ago
    great tips!
  • angsuman · 1 year ago
    Nice tips. I too have written some tips in not so recent past on my blog covering how to handle frequent inserts on a database maximized for reading like a blog using MyISAM.
    BTW: Your tip on eliminating artificial primary key has issues with most ORM frameworks like Hibernate.
  • Jesse · 1 year ago
    Angsuman,

    Cool. I'll check it out.

    As for the ORM stuff, I know, but I don't care that much. My interests skew towards large, denormalized data storage, anyhow. Building the next great Rails app ain't my thing.
  • Zen of Linux · 11 months ago
    Thanks for your list! Also, I was so free to do my own post on some basic and advanced mysql tuning tips!
  • Nate · 10 months ago
    Great article. I'm trying to take my MySql skills to the next level and this is the kind of stuff I want to learn.
  • Henry · 8 months ago
    good post. One way I find make a big difference is using group by rather than distinct when it's possible
  • Sergio · 7 months ago
    I liked a lot your post but wish to as one question, about a problem I need to solve.
    I am facing a situation wher I have to query a database for the countries names, then, when a client select a country, I show the states (regions) of this country and, finally, when he selects one of these regions, I need to show all the cities of this region.
    I ask:
    What should I use:
    one table for the countries with countries names and countries codes;
    one table for the regions with regions names and codes, for each country;
    one table for the cities for that country.
    or, it's best to have only one table with all that data and perform all queries in that table (more than 2,000,000 records)?

    Thanks
  • Nick Yeoman · 6 months ago
    Cool, Thanks!
  • Dimas · 5 months ago
    Great article, very useful insights!
  • brightemeka · 5 months ago
    Thank you a lot for the tips.
    Please how can I introduce optimization algorithm in mysql.
    The will help me to complete my thesis.

    Thanks.

    Sincerely,

    Bright Emeka
  • Jamie Dexter · 4 months ago
    Excellent article and well-written. Glad I found this gem relatively early in my MySQL career!
  • Sky cards · 4 months ago
    cool article, just keep your database strucutre logical and simple, like a real library and everything will fall in to place.

    cheers,

    sky cards
  • joypt · 3 months ago
    I got some of mysql optimiztion tips from here.
    Is it correct?

    I have tried some of them and feel good but few of them were give me good sound
    Please give your feedback
    http://cookphp.blogspot.com/2009/09/top-100-per...
  • santoshb · 1 month ago
    Yo.. Thanks
  • staffing555 · 2 weeks ago
    Interesting thoughts well expressed.
    For more jobs visit http://www.staffingpower.com