10 Tips for Optimizing MySQL Queries (That don’t suck) | 20bits
Started by Jesse Farmer · 7 months ago
2 years ago
I just scanned your post (it's past 2AM and I have an early flight tomorrow), but it looks like a _huge_ improvement over the Jaslabs list. In fact, it looks like the kind of post I would have written had I had more time! Great job.
2 years ago
What if you don't control the schema? I guess then you don't have much to work with.
2 years ago
In the example you gave I agree, keep the auto_increment out of there, but in other cases where there's a natural key you may actually be hurting performance.
2 years ago
Unfortunately that problem is hard. I personally wouldn't trust a database to reliably translate subqueries into joins where possible. I also don't think joins are frequently more obscure than subselects, especially if you're (1) familiar with SQL idioms and (2) leave comments about what data the query is fetching.
2 years ago
That is true. Rails, for example, doesn't support composite primary keys out of the box. Unfortunately that is a problem since often composite primary keys are really the best solution. I know there's a mixin for Rails which adds support for composite keys, although I don't know how well it performs.
2 years ago
If you don't have any control over the layout of your database or the database configuration itself then there's not much you can do. Use the profiling tools I described to isolate the expensive queries and try to rewrite them so that they're not so expensive. Since you can't alter indices or change the schema this might prove to be very difficult.
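The profiling tools the author refers to are not named in what survives of this page (in MySQL itself, the slow query log and `EXPLAIN` are the usual starting points). As a rough client-side stand-in, here is a minimal Python sketch that times a list of queries and ranks them slowest first. It uses SQLite's in-memory engine only so that it runs anywhere; the `hits` table and the `profile_queries` helper are hypothetical.

```python
import sqlite3
import time


def profile_queries(conn, queries, repeats=3):
    """Time each query and return (best_seconds, sql) pairs, slowest first.

    Keeping the best of several runs reduces noise from caches and scheduling.
    """
    timings = []
    for sql in queries:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            conn.execute(sql).fetchall()  # fetch all rows so the full cost is measured
            best = min(best, time.perf_counter() - start)
        timings.append((best, sql))
    return sorted(timings, reverse=True)


# Toy data standing in for a database whose schema you cannot change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, page TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO hits (page) VALUES (?)",
    [(f"/page/{i % 50}",) for i in range(20000)],
)

report = profile_queries(conn, [
    "SELECT COUNT(*) FROM hits",
    "SELECT page, COUNT(*) FROM hits GROUP BY page ORDER BY COUNT(*) DESC",
])
for seconds, sql in report:
    print(f"{seconds * 1000:8.2f} ms  {sql}")
```

The slowest query at the top of the report is the first candidate for a rewrite.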
2 years ago
I think he was talking about using auto_increment versus incrementing via some other mechanism. Drupal, which I've written about elsewhere, does just this and it's really annoying — it has its own sequence table. This makes multiple inserts difficult if not impossible. But if you have a natural primary key then adding an auto incrementing artificial key doesn't get you anything. IOW, if you have an artificial key, it's best to make it an auto_increment integer column, in MySQL at least.
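To make the sequence-table complaint concrete, here is a minimal sketch in Python with SQLite standing in for MySQL (SQLite's `INTEGER PRIMARY KEY` plays the role of an `AUTO_INCREMENT` column). The `node` and `id_sequence` tables and the `reserve_ids` helper are hypothetical; they are not Drupal's actual schema or API. With an auto-assigned key, one multi-row `INSERT` is enough. With a sequence table, the application has to reserve a block of ids first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Auto-increment style: SQLite assigns an INTEGER PRIMARY KEY when none is given,
# roughly like an AUTO_INCREMENT column in MySQL.
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO node (title) VALUES ('first'), ('second'), ('third')")
auto_ids = [row[0] for row in conn.execute("SELECT nid FROM node ORDER BY nid")]

# Sequence-table style (hypothetical table): the application must reserve ids
# itself before it can insert, which costs extra statements per batch.
conn.execute("CREATE TABLE id_sequence (name TEXT PRIMARY KEY, last_id INTEGER NOT NULL)")
conn.execute("INSERT INTO id_sequence VALUES ('node_nid', 3)")


def reserve_ids(conn, name, count):
    """Bump the counter by `count` and return the ids that were reserved."""
    with conn:  # commit the reservation as one transaction
        conn.execute(
            "UPDATE id_sequence SET last_id = last_id + ? WHERE name = ?", (count, name)
        )
        (last,) = conn.execute(
            "SELECT last_id FROM id_sequence WHERE name = ?", (name,)
        ).fetchone()
    return list(range(last - count + 1, last + 1))


new_ids = reserve_ids(conn, "node_nid", 2)
conn.executemany(
    "INSERT INTO node (nid, title) VALUES (?, ?)",
    zip(new_ids, ["fourth", "fifth"]),
)
```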
2 years ago
Well, it depends on what data you're actually trying to fetch. It was meant as an illustration of the subquery vs. join issue, though. I think it suffices for that. :)
2 years ago
As for the sub-queries being changed to joins: the MySQL query optimizer simply does not optimize these well, whereas Oracle and MSSQL convert those sub-queries to joins. The point is to realize the strengths and weaknesses of your database and develop accordingly. Whether or not this should be done by the optimizer is a moot point when you're writing code. Since the optimizer isn't going to help you out until a later version, you should account for this if you're going to be using the DB. If you don't, the result will be extremely inefficient queries.
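Here is a small illustration of the rewrite being discussed. It assumes a hypothetical `customers`/`orders` schema and uses SQLite so the sketch is self-contained. It only shows that the two forms return the same rows. It says nothing about how any particular MySQL version plans them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
CREATE INDEX orders_customer_idx ON orders (customer_id);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders (customer_id) VALUES (1), (1), (3);
""")

# Subquery form: "customers who have placed at least one order".
subquery_sql = """
SELECT id, name FROM customers
WHERE id IN (SELECT customer_id FROM orders)
ORDER BY id
"""

# Join form. Ann has two orders, so a plain join would return her twice;
# DISTINCT restores the set semantics that IN provides automatically.
join_sql = """
SELECT DISTINCT c.id, c.name FROM customers c
JOIN orders o ON o.customer_id = c.id
ORDER BY c.id
"""

via_subquery = conn.execute(subquery_sql).fetchall()
via_join = conn.execute(join_sql).fetchall()
```

The `DISTINCT` is the detail that is easy to forget when translating `IN (subquery)` into a join by hand.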
2 years ago
Good to know. The environments in which I've used Oracle and MSSQL have never approached the level of activity of the environments in which I've used MySQL (LAMP stack applications generating 30M+ pageviews a month, etc.).
As for Jay Pipes, I've never heard of the guy. These all come from my own experience, particularly making Drupal perform well, and from reading Peter Zaitsev's great weblog. Drupal basically rides right on top of the database, so I've applied all these things many times over while scaling Drupal applications. Have any good Pipes-related links?
2 years ago
A local university was lucky enough to get him to give a nice (and free) presentation on optimizing MySQL, and I must say, it was the most informative 4-hour session I've been to, regardless of price. He's got a great understanding of how the database works, and of what we as developers can and conversely shouldn't do to get the best performance out of it.
Looks like they just did a few of these in the Midwest, and are now done... but if you get a chance to go to one of these in the future I'd highly recommend it. I've been to MS and Oracle's events like these, and while those are marketing driven, this was highly valuable.
2 years ago
Good post, and thanks for the tools in #1. I have been looking around for some and was actually seeking opinions on a few. Everyone has their opinions, and different practices work better in different environments based on so many factors, so let's all be friends. Again, thank you for the input.
2 years ago
--G
2 years ago
Your site looks crappy in Firefox, btw.
2 years ago
Hmm, what version of Firefox are you using? It looks alright in 2.0.
2 years ago
An example of horizontal partitioning: say you have a table full of names, and you make one filegroup for last names A-L and another for M-Z. Then, commonly, you would put each filegroup on a separate disk (or set of disks), speeding up performance.
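Here is a minimal sketch of the same idea done in application code, with two ordinary tables standing in for the two filegroups. Filegroups themselves are a SQL Server storage feature and are not modeled here. The table names `names_a_l` and `names_m_z` and the helper functions are hypothetical, and SQLite stands in for the real server.

```python
import sqlite3

# Hypothetical partition layout: one table per range of last-name initials.
PARTITIONS = {"names_a_l": ("A", "L"), "names_m_z": ("M", "Z")}


def partition_for(last_name):
    """Pick the table whose initial-letter range covers `last_name`."""
    first = last_name[:1].upper()
    for table, (lo, hi) in PARTITIONS.items():
        if lo <= first <= hi:
            return table
    raise ValueError(f"no partition for {last_name!r}")


conn = sqlite3.connect(":memory:")
for table in PARTITIONS:
    # Table names come from the fixed dict above, so formatting them in is safe.
    conn.execute(f"CREATE TABLE {table} (last_name TEXT NOT NULL, first_name TEXT NOT NULL)")


def insert_name(conn, first_name, last_name):
    conn.execute(
        f"INSERT INTO {partition_for(last_name)} VALUES (?, ?)", (last_name, first_name)
    )


def find_by_last_name(conn, last_name):
    # Only the one partition that can hold this name is searched.
    return conn.execute(
        f"SELECT first_name, last_name FROM {partition_for(last_name)} WHERE last_name = ?",
        (last_name,),
    ).fetchall()


for first, last in [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper")]:
    insert_name(conn, first, last)
```

Putting each table on separate storage, as the comment describes, is an operational step outside this sketch.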
2 years ago
I actually also found a Google Tech Talk video on MySQL performance tips and found it to be perfect. Check it out: http://blog.sherifmansour.com/?p=72 "Performance Tuning Best Practices for MySQL"
2 years ago
That presentation is way more comprehensive than my list. Thanks. :)
2 years ago
"If you don’t have any control over the layout of your database or the database configuration itself then there’s not much you can do. Use the profiling tools I described to isolate the expensive queries and try to rewrite them so that they’re not so expensive. Since you can’t alter indices or change the schema this might prove to be very difficult."
In other words, use the tips at my site that Jesse claimed were bad.
"The rule in any situation where you want to optimize some code is that you first profile it and then find the bottlenecks. Mr. Silverton, however, aims right for the tippy top of the trees. I'd say 60% of database optimization is properly understanding SQL and the basics of databases. "
If you had bothered to even read (or understand) my article, you would have known that it was titled "10 tips for optimizing mysql queries", i.e., specific things you can do to your queries (not the database, engine, or software) that can help in optimization.
Your list is very generic and can be found in almost any book about databases.
2 years ago
I was wondering when you'd find your way over here. I still stand by my statement that your tips suck, or at the very least aren't anything better than content created solely to generate traffic. What's more, you didn't address my critiques. So, here they are again:
1) You are approaching the problem of performance from the wrong end
2) Some of the problems you identify have solutions, but you give the wrong one.
3) Some of your "tips" don't even correspond to actual problems.
And, I'd add a fourth: since you offer no analysis it's impossible to judge the merits of your tips. Let's take the SQL_SMALL_RESULT "tip" again. What are the downsides to using it? What happens if I use SQL_SMALL_RESULT on a query that accidentally returns a large result set? How does SQL_SMALL_RESULT affect the memory usage of MySQL? Do its effects vary between MySQL engines?
These are concerns someone seriously interested in database performance would need to know, but you give none of it. My suggestions may be more general but they are, I hope, better supported.
Cheers.
2 years ago
Using 2.0.0.3
The div class=ch_code_container sections get a big grey block at the bottom. It happens because of the height=100%.
2 years ago
I've got a question, though. I'm making a site that uses cron jobs to update every entry in a table every 15 minutes.
If my table has thousands of rows, it's going to kill my server every 15 minutes, right?
Is there a way to spread the load out? Maybe an UPDATE equivalent of INSERT DELAYED?
Thanks
Louis
2 years ago
It depends on the structure of your table and what you're doing every 15 minutes. It also depends on how heavily used the database is. But "thousands" of rows is not a lot. Hundreds of thousands isn't even that bad, presuming your table isn't designed horribly.
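One common way to spread such a job out is to update the table in primary-key ranges and commit after each range. This technique is not suggested in the thread itself. Here is a minimal sketch that uses SQLite in place of MySQL. It assumes a hypothetical `entries` table whose integer ids start at 1 and are roughly dense, plus a hypothetical `refresh_count` column standing in for whatever the cron job actually changes.

```python
import sqlite3
import time


def update_in_batches(conn, batch_size=500, pause=0.0):
    """Touch every row of `entries`, one primary-key range per transaction.

    Committing between ranges keeps each transaction short, and `pause`
    gives other queries a chance to run between batches.
    """
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM entries").fetchone()
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        with conn:  # commits this batch before the next one starts
            conn.execute(
                "UPDATE entries SET refresh_count = refresh_count + 1 "
                "WHERE id BETWEEN ? AND ?",
                (start, start + batch_size - 1),
            )
        batches += 1
        if pause:
            time.sleep(pause)
    return batches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, refresh_count INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany("INSERT INTO entries (id) VALUES (?)", [(i,) for i in range(1, 2501)])
conn.commit()

batches = update_in_batches(conn, batch_size=500)
```

Because each range is bounded by the primary key, every batch is an indexed range scan rather than a full-table pass.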
2 years ago
SELECT title, COUNT(com.id) comments FROM entries LEFT JOIN com ON entries.id = com.entry GROUP BY entries.id ORDER BY entries.time DESC LIMIT 10
The subquery just selected COUNT(*) from comments where id = entries.id.
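The two forms can be checked side by side, assuming the subquery counted rows where `comments.entry = entries.id` (the comment says `id = entries.id`, which is probably shorthand). Here is a minimal sketch in Python with SQLite and made-up data. The query above joins a table called `com`; here that is assumed to be a `comments` table aliased as `com`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT NOT NULL, time INTEGER NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY, entry INTEGER NOT NULL);
CREATE INDEX comments_entry_idx ON comments (entry);
INSERT INTO entries VALUES (1, 'First', 100), (2, 'Second', 200), (3, 'Third', 300);
INSERT INTO comments (entry) VALUES (1), (1), (3);
""")

# Correlated subquery: the count is evaluated per entry row.
subquery_sql = """
SELECT title,
       (SELECT COUNT(*) FROM comments WHERE comments.entry = entries.id) AS comments
FROM entries
ORDER BY time DESC
LIMIT 10
"""

# LEFT JOIN + GROUP BY, as in the query above. COUNT(com.id) skips the NULLs
# produced for entries without comments, so those still report 0.
join_sql = """
SELECT entries.title, COUNT(com.id) AS comments
FROM entries
LEFT JOIN comments com ON entries.id = com.entry
GROUP BY entries.id
ORDER BY entries.time DESC
LIMIT 10
"""

via_subquery = conn.execute(subquery_sql).fetchall()
via_join = conn.execute(join_sql).fetchall()
```

Counting `com.id` rather than `*` is what keeps the zero-comment entry at 0 instead of 1.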
2 years ago
I'm pretty sure horizontal partitioning is when you split according to rows and vertical is when you split by column. So, e.g., if you have an accounting system and all accounts in a given year are in their own table, that'd be horizontal partitioning. What I've described is vertical partitioning.
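Here is a minimal sketch of vertical partitioning as defined above: the bulky column moves into a side table that shares the primary key. The table and column names (`entries`, `entry_bodies`, `body`) are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Narrow, frequently scanned columns stay in the main table...
CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT NOT NULL, time INTEGER NOT NULL);
-- ...while the bulky, rarely listed column moves to a 1:1 side table.
CREATE TABLE entry_bodies (
    entry_id INTEGER PRIMARY KEY REFERENCES entries(id),
    body TEXT NOT NULL
);
INSERT INTO entries VALUES (1, 'Hello', 100), (2, 'Indexes', 200);
INSERT INTO entry_bodies VALUES (1, 'A long post...'), (2, 'Another long post...');
""")

# Listing pages read only the narrow table.
listing = conn.execute("SELECT id, title FROM entries ORDER BY time DESC").fetchall()

# The full article joins the two halves back together on the shared key.
article = conn.execute("""
SELECT e.title, b.body FROM entries e
JOIN entry_bodies b ON b.entry_id = e.id
WHERE e.id = ?
""", (2,)).fetchone()
```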
As for your query, I can take a guess. I'd wager that you're using MyISAM tables, in which case count(*) is optimized away. Your indices might also be messed up. That is, if com.entry isn't indexed, then joining becomes more expensive. I presume entries.id and com.id are primary keys, so they're automatically indexed.
It might even be faster to do: SELECT title, c.comments FROM entries JOIN (SELECT entry, COUNT(*) as comments FROM comments GROUP BY (comments.entry)) c ON c.entry = entries.id;
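One thing worth checking with the derived-table version: because it uses an inner `JOIN`, entries with no comments drop out of the result, whereas the `LEFT JOIN` query earlier in the thread reports them with a count of 0. Here is a minimal sketch in Python with SQLite and made-up data. It shows the difference, plus a `LEFT JOIN` with `COALESCE` variant that keeps those entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY, entry INTEGER NOT NULL);
INSERT INTO entries VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
INSERT INTO comments (entry) VALUES (1), (1), (3);
""")

# The suggested query: aggregate comments once, then join the totals.
inner_sql = """
SELECT title, c.comments FROM entries
JOIN (SELECT entry, COUNT(*) AS comments FROM comments GROUP BY comments.entry) c
  ON c.entry = entries.id
ORDER BY entries.id
"""

# Same derived table with LEFT JOIN; COALESCE turns "no comments" into 0.
left_sql = """
SELECT title, COALESCE(c.comments, 0) FROM entries
LEFT JOIN (SELECT entry, COUNT(*) AS comments FROM comments GROUP BY comments.entry) c
  ON c.entry = entries.id
ORDER BY entries.id
"""

inner_rows = conn.execute(inner_sql).fetchall()
left_rows = conn.execute(left_sql).fetchall()
```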
1 year ago
Rails doesn't support composite keys for model classes, but posts_tags here is just a join table, and it's fine with composite keys here -- in fact I'm pretty sure it's the default. We do precisely this in several places, and I don't remember needing to do anything special.
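Here is a minimal sketch of a join table keyed only on its natural composite key. It uses SQLite so it runs standalone. The schema is hypothetical apart from the `posts_tags` name mentioned above, and nothing here exercises Rails itself. The composite primary key also makes the database reject a duplicate tag on the same post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- The natural key (post_id, tag_id) is the primary key; no surrogate id column.
CREATE TABLE posts_tags (
    post_id INTEGER NOT NULL REFERENCES posts(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (post_id, tag_id)
);
INSERT INTO posts VALUES (1, 'Optimizing MySQL');
INSERT INTO tags VALUES (1, 'mysql'), (2, 'performance'), (3, 'sql');
INSERT INTO posts_tags VALUES (1, 1), (1, 2);
""")


def tag_post(conn, post_id, tag_id):
    """Return True if the pair was new, False if the composite key rejected it."""
    try:
        conn.execute("INSERT INTO posts_tags VALUES (?, ?)", (post_id, tag_id))
        return True
    except sqlite3.IntegrityError:
        return False


added_new = tag_post(conn, 1, 3)        # (1, 3) is a new pair
added_duplicate = tag_post(conn, 1, 1)  # (1, 1) already exists
```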
1 year ago
I think when I wrote this Rails would flip out if even the join table didn't have its own auto incrementing primary key. This might have changed, but it's also possible I was wrong.
1 year ago
Yeah. I don't know what version was around when I wrote this article, but that was 11 months ago. It'd also be unfortunate if Rails flipped out by default (i.e., required :id => false), since, AFAIK, Rails is supposed to be about "convention over configuration."
1 year ago
I use parentheses when I'm doing two comparisons on a column, i.e., (`date` >= 'xxxx' AND `date` <= 'yyyy'). Grouping it like that just makes me feel comfortable.
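Parentheses around a pair of `AND`ed comparisons do not change the result. They start to matter only when `OR` is mixed in, because `AND` binds tighter than `OR`. Here is a minimal sketch with SQLite and a hypothetical `entries` table, storing dates as ISO-8601 text so that string comparison orders them correctly. `BETWEEN` is included because it is the inclusive equivalent of `>= ... AND <= ...`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, `date` TEXT NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        (1, "2008-01-01", "published"),
        (2, "2008-01-15", "draft"),
        (3, "2008-01-31", "published"),
        (4, "2008-02-01", "draft"),
        (5, "2007-12-31", "published"),
    ],
)
lo, hi = "2008-01-01", "2008-01-31"


def ids(where, params):
    """Return the ids matching a WHERE clause, in id order."""
    sql = f"SELECT id FROM entries WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# With only ANDs, the parentheses change nothing, and BETWEEN is the same inclusive range.
grouped = ids("(`date` >= ? AND `date` <= ?)", (lo, hi))
ungrouped = ids("`date` >= ? AND `date` <= ?", (lo, hi))
between = ids("`date` BETWEEN ? AND ?", (lo, hi))

# Once OR is involved they matter: AND binds tighter than OR.
or_first = ids("status = 'draft' OR `date` >= ? AND `date` <= ?", (lo, hi))
forced = ids("(status = 'draft' OR `date` >= ?) AND `date` <= ?", (lo, hi))
```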
As for the other things, I think you've given me a lot to help me in the job I start in 2 weeks. I leave my current job tomorrow (Friday). Even there, where I know my read calls are already really quick, I can see them being at least 2x faster with these tips.
1 year ago
Concise and to the point and practical. Good job.
11 months ago
Just ordered myself a copy of the book High Performance MySQL (Arjen Lentz, Peter Zaitsev, Vadim Tkachenko). Can't wait: I'm working on a project that needs massive amounts of data output from MySQL, and it seems I'll be using this new knowledge to save the company from having to buy bigger servers.
10 months ago
BTW: Your tip on eliminating artificial primary key has issues with most ORM frameworks like Hibernate.
8 months ago
Cool. I'll check it out.
As for the ORM stuff, I know, but I don't care that much. My interests skew towards large, denormalized data storage, anyhow. Building the next great Rails app ain't my thing.
2 months ago
I am facing a situation where I have to query a database for country names. When a client selects a country, I show the states (regions) of that country, and finally, when he selects one of those regions, I need to show all the cities of that region.
My question is which design I should use:
one table for the countries, with country names and country codes;
one table for the regions, with region names and codes for each country;
one table for the cities in each country;
or is it better to have a single table with all that data (more than 2,000,000 records) and run every query against that table?
Thanks
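The thread does not include an answer to this question. For reference, here is a minimal sketch of the three-table option, where each drop-down step is an indexed lookup on a foreign key. It uses SQLite, made-up sample rows, and hypothetical table and column names. It is not a benchmark against the single-table alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE regions (
    id INTEGER PRIMARY KEY,
    country_code TEXT NOT NULL REFERENCES countries(code),
    code TEXT NOT NULL,
    name TEXT NOT NULL,
    UNIQUE (country_code, code)  -- this index also serves lookups by country_code
);
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    region_id INTEGER NOT NULL REFERENCES regions(id),
    name TEXT NOT NULL
);
CREATE INDEX cities_region_idx ON cities (region_id);

INSERT INTO countries VALUES ('BR', 'Brazil'), ('US', 'United States');
INSERT INTO regions VALUES
    (1, 'BR', 'SP', 'Sao Paulo'), (2, 'BR', 'RJ', 'Rio de Janeiro'), (3, 'US', 'CA', 'California');
INSERT INTO cities VALUES
    (1, 1, 'Campinas'), (2, 1, 'Santos'), (3, 2, 'Niteroi'), (4, 3, 'San Jose');
""")


def regions_for(conn, country_code):
    """Second drop-down: the regions of the selected country."""
    return conn.execute(
        "SELECT id, name FROM regions WHERE country_code = ? ORDER BY name",
        (country_code,),
    ).fetchall()


def cities_for(conn, region_id):
    """Third drop-down: the cities of the selected region."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT name FROM cities WHERE region_id = ? ORDER BY name", (region_id,)
        )
    ]
```

Each step touches only the rows for the current selection, so the size of the cities table matters much less than whether `region_id` is indexed.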