[Cialug] mysql performance notes

Matthew Nuzum newz at bearfruit.org
Thu Aug 9 16:45:25 CDT 2007


My colleague, Robert Collins, was recently at the Euro Python
conference. He listed in on the MySQL optimization talk (he's working
on Bzr optimazation) and sent these notes along. Thought others might
find them interesting.

By "dense" I'm sure he means "packed full of information." :-)

> MySQL optimisation talk. Very interesting talk from David Axmark - I
> attended largely because of the performance drive in bzr - alternative
> ideas for performance are always useful. His talk was very dense, I'll
> try to get the key points here.
>  - its all about minimising disk IO
>  - ORM hints to db is an interesting thing because its harder to control
> than raw SQL.
>  - indexes - B-Tree or Hash based
>    - hash based indexes are really fast for simple 'get me this exact
> key' queries.
>  - Avoiding seeks *really* matters - and it matters more with newer
> disks.
>  - there is a built in heuristic - accessing more than 30% of THING do a
> scan not seek.
>  - multiple columns in an index is often better than multiple indexes of
> each column because less data is needed to query on those columns.
>  - aim for selectivity of the index - order fields in the index by
> selectivity. (if you have three fields indexed, and one is boolean, one
> is an enum with 10 values, and one is a wide range of freeform strings,
> then put the strings first, the enum second and the boolean last as
> queries on the strings will limit the results the most)
>  - index size matters - consider partial indexes on the first X bytes of
> each field. I.e. rather than indexing 3x 200 byte strings, index the
> first 20 bytes of each - 3x 20 byte strings - this will let you process
> more index entries with each disk io, though mysql will sometimes need
> to go to disk to differentiate when you get multiple hits on an index
> row.
>  - splitting data out:
>   - smaller records allows less IO to get results back so shrink the
> records e.g.
>     - horizontal splitting of tables to separate hot data and rarely
>       used data
>  - use joins
>  - do not use correlated subqueries
>   - e.g. select ... , (select ....)
>   because each subquery is executed separately
>  - use derived tables
>   - e.g. select ... INNER JOIN (select ...)
>   this generates the subquery once and then extracts data from it for
> each row in the output
>  - think 'sets' not 'iterators' or sequential programming
>  - use indexed fields directly rather than functions on them.. modify
> the other side of expressions instead. e.g. do modified > NOW() - 10,
> rather than modified + 10 > NOW()
>  - avoid NOT IN and <> - proving negatives is expensive.
>

-- 
Matthew Nuzum
newz2000 on freenode


More information about the Cialug mailing list