Tuesday, July 7, 2009

Just do the math and keep statistics

Today I came across a post on the MySQL Performance Blog titled "Just do the math!", which covers something that should be second nature for every DBA and developer. Go read it; it's a helpful article.
The ability to estimate query performance from metrics and expected data volumes is very important, and the example given shows how the query and the reporting built on it will have a hard time scaling. Beyond redoing your hardware or optimizing the queries, there isn't much room left; at the end of the day, you cannot optimize away sheer bulk.
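To make the "do the math" part concrete, here is a back-of-envelope calculation with made-up figures (none of these numbers come from the original post):

-- Hypothetical: 10M log rows imported per day, and a report that
-- full-scans them at roughly 200K rows/sec.
SELECT 10000000 / 200000  AS report_seconds_today,   -- ~50s per run
       100000000 / 200000 AS report_seconds_at_10x;  -- ~500s at 10x traffic

Fifty seconds is tolerable; eight-plus minutes per run, before any concurrent load, is the scaling wall the post is warning about.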

So how else can you speed this example up?
Well, first of all, in the example the data is being imported from Apache logs. My general recommendation is to create smaller workloads: instead of importing the logs daily, import them hourly (or every 5 minutes). Do the same for the reports: run them automatically, and have them record their results in a statistics table, which the interactive queries can then read instead of re-running the batch work. A sketch of that setup follows.
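Assume a raw access_log table plus an hourly summary table; the table names, columns, and the cron-driven INSERT below are my own illustration, not from the post:

CREATE TABLE access_log (
  hit_time  DATETIME NOT NULL,
  url       VARCHAR(255) NOT NULL,
  status    SMALLINT NOT NULL,
  bytes     INT UNSIGNED NOT NULL,
  KEY (hit_time)
);

CREATE TABLE hourly_stats (
  stat_hour DATETIME NOT NULL,
  url       VARCHAR(255) NOT NULL,
  hits      INT UNSIGNED NOT NULL,
  bytes     BIGINT UNSIGNED NOT NULL,
  PRIMARY KEY (stat_hour, url)
);

-- Run from cron every hour (or every 5 minutes): aggregate only the
-- newly imported slice and fold it into the summary table.
-- (In practice you would track the last imported timestamp rather
-- than relying on NOW().)
INSERT INTO hourly_stats (stat_hour, url, hits, bytes)
SELECT DATE_FORMAT(hit_time, '%Y-%m-%d %H:00:00'), url, COUNT(*), SUM(bytes)
FROM access_log
WHERE hit_time >= NOW() - INTERVAL 1 HOUR
GROUP BY 1, 2
ON DUPLICATE KEY UPDATE
  hits  = hits + VALUES(hits),
  bytes = bytes + VALUES(bytes);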

This still requires the same kind of machine horsepower, as it does not make the processing disappear, but it does enable some optimizations: sorts, merges, and joins can be done in memory on smaller chunks of data, lowering the overall I/O requirement.
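The payoff shows up on the query side: an interactive report reads a handful of pre-aggregated rows per hour instead of re-scanning millions of raw hits. Continuing the hypothetical schema above:

-- Today's top URLs from the summary table: touches at most 24 rows
-- per URL instead of every raw log line.
SELECT url, SUM(hits) AS hits, SUM(bytes) AS bytes
FROM hourly_stats
WHERE stat_hour >= CURDATE()
GROUP BY url
ORDER BY hits DESC
LIMIT 20;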
