Re: 7.4 - basic tuning question

From: Richard Huxton <dev(at)archonet(dot)com>
To: Simon Waters <simonw(at)zynet(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: 7.4 - basic tuning question
Date: 2008-10-03 16:08:22
Message-ID: 48E64376.6070802@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Simon Waters wrote:

The best advice is to "upgrade at your earliest convenience" with
performance questions and 7.4 - you're missing a *lot* of improvements.
You say you're planning to anyway, and I'd recommend putting effort into
the upgrade rather than waste effort on tuning a system you're leaving.

> I assume that the histogram_bounds for strings are alphabetical in order, so
> that "DEMOSTART" falls between "DELETE" and "IDEMAIL". Even on a worst case
> of including both these common values, the planner ought to have assumed that
> less than <10% of records were likely covered by the value selected, so it
> seems unlikely to me that not using the index would be a good idea.

Well, the real question is how many blocks need to be read to find those
DEMOSTART rows. At some point around 5-10% of the table it's easier just
to read the whole table than go back and fore between index and table.
The precise point will depend on how much RAM you have, disk speeds etc.

> => SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
> (...lots of time passes...)
> count
> -------
> 1432
> (1 row)

OK, not many. The crucial bit is below though. These are the 10 values
it will hold stats on, and all it knows is that DEMOSTART has less than
57000 entries. OK, it's more complicated than that, but basically there
are values it tracks and everything else. So - it assumes that all other
values have the same chance of occuring.

> => SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count;
>
> count | event
> --------+-----------
[snip]
> 57022 | NEWUSR
> 64907 | PUBREC0
> 65449 | UNPUBLISH
> 92843 | LOGOUT
> 99018 | KILLSESS
> 128900 | UPLOAD
> 134994 | LOGIN
> 137608 | NEWPAGE
> 447556 | PUBREC1
> 489572 | PUBLISH

Which is why it guesses 20436 rows below. If you'd done "SET
enable_seqscan = off" then run the explain again it should have
estimated a cost for the index that was more than 54317.14

> => EXPLAIN SELECT * FROM log WHERE event='DEMOSTART';
> QUERY PLAN
> ------------------------------------------------------------
> Seq Scan on log (cost=0.00..54317.14 rows=20436 width=93)
> Filter: (event = 'DEMOSTART'::text)
> (2 rows)
>
>
> => ALTER TABLE log ALTER COLUMN events SET STATISTICS 50; ANALYSE
> LOG(event);
> ALTER TABLE
> ANALYZE
>
>
> => EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
> QUERY PLAN
> ----------------------------------------------------------------------------
> -------
> Aggregate (cost=5101.43..5101.43 rows=1 width=0)
> -> Index Scan using log_event on log (cost=0.00..5098.15 rows=1310
> width=0)
> Index Cond: (event = 'DEMOSTART'::text)
> (3 rows)

Not bad - now it knows how many rows it will find, and it sees that the
index is cheaper. It's not completely accurate - it uses a statistical
sampling (and of course it's out of date as soon as you update the table).

HTH

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Marc Mamin 2008-10-03 20:55:00 Re: Delete performance again
Previous Message Simon Waters 2008-10-03 13:37:35 7.4 - basic tuning question