7.4 - basic tuning question

From: Simon Waters <simonw(at)zynet(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: 7.4 - basic tuning question
Date: 2008-10-03 13:37:35
Message-ID: 200810031437.35440.simonw@zynet.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

we have a log table on one server with 1.9 million records.

One column "event" (type text) in that table is a string that (currently)
takes a small number of distinct values (~43) (hmm that could have been
normalised better).

We noted on querying for events of a specific type, that the queries were
slower than expected. It simply wasn't using the index (btree, default
settings) on this column on this server (the test server, with less records,
was fine).

Using "ALTER TABLE SET STATISTICS" to increase the number of buckets to 50
resolved the issue, we went pretty much straight there on discovering there
are no "HINTS".

However we aren't quite sure why this case was pathological, and my brain
doesn't grok the documentation quite.

I assume that the histogram_bounds for strings are alphabetical in order, so
that "DEMOSTART" falls between "DELETE" and "IDEMAIL". Even on a worst case
of including both these common values, the planner ought to have assumed that
less than <10% of records were likely covered by the value selected, so it
seems unlikely to me that not using the index would be a good idea.

What am I missing? (and yes there is a plan to upgrade!).

=> SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...lots of time passes...)
count
-------
1432
(1 row)

=> SELECT COUNT(*), event FROM log GROUP BY event ORDER BY count;

count | event
--------+-----------
6 | DNRFAIL
14 | ADMDNR
14 | UPGRADE
18 | FOCRENEW
21 | AUTOCN
25 | ADMCC
27 | TEMPIN
31 | DNRCANCEL
43 | EXPIRED
128 | DIRECTBUY
130 | CANCEL
130 | CANCELQ
154 | FOCBUY
173 | EXPCCWARN
179 | OFFER
209 | DNROK
214 | TEMPRE
356 | CCWARN
429 | ADMLOGIN
719 | SUBSCRIBE
787 | CCSUCCESS
988 | CCFAILURE
1217 | TEMPNEW
1298 | PAYPAL
1431 | DEMOSTART
1776 | CCREQUEST
2474 | ACCTUPD
15169 | SYSMAINT
42251 | IDEMAIL
46964 | DELETE
50764 | RELOGIN
57022 | NEWUSR
64907 | PUBREC0
65449 | UNPUBLISH
92843 | LOGOUT
99018 | KILLSESS
128900 | UPLOAD
134994 | LOGIN
137608 | NEWPAGE
447556 | PUBREC1
489572 | PUBLISH

=> EXPLAIN SELECT * FROM log WHERE event='DEMOSTART';
QUERY PLAN
------------------------------------------------------------
Seq Scan on log (cost=0.00..54317.14 rows=20436 width=93)
Filter: (event = 'DEMOSTART'::text)
(2 rows)

=> ALTER TABLE log ALTER COLUMN events SET STATISTICS 50; ANALYSE
LOG(event);
ALTER TABLE
ANALYZE

=> EXPLAIN SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
QUERY PLAN
----------------------------------------------------------------------------
-------
Aggregate (cost=5101.43..5101.43 rows=1 width=0)
-> Index Scan using log_event on log (cost=0.00..5098.15 rows=1310
width=0)
Index Cond: (event = 'DEMOSTART'::text)
(3 rows)

=> SELECT COUNT(*) FROM log WHERE event='DEMOSTART';
(...almost no time passes...)
count
-------
1432
(1 row)

BEFORE
pajax=> select * from pg_stats where tablename = 'log' and attname='event';
schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals |
most_common_freqs |
histogram_bounds | correlation
------------+-----------+---------+-----------+-----------+------------+--------------------------------------------------------+-------------------------------------------------------------------+---------------------------------------------------------------------------------------------+-------------
public | log | event | 0 | 10 | 25 |
{PUBLISH,PUBREC1,NEWPAGE,UPLOAD,LOGIN,KILLSESS,LOGOUT} |
{0.257333,0.248333,0.072,0.0696667,0.0613333,0.0543333,0.0506667} |
{ACCTUPD,DELETE,IDEMAIL,NEWUSR,NEWUSR,PUBREC0,PUBREC0,RELOGIN,SYSMAINT,UNPUBLISH,UNPUBLISH} |
0.120881
(1 row)

AFTER
pajax=> select * from pg_stats where tablename='log' and attname='event';
schemaname | tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals |
most_common_freqs |
histogram_bounds |
correlation
------------+-----------+---------+-----------+-----------+------------+--------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
public | log | event | 0 | 10 | 32 |
{PUBLISH,PUBREC1,NEWPAGE,LOGIN,UPLOAD,KILLSESS,LOGOUT,PUBREC0,UNPUBLISH,NEWUSR,RELOGIN,DELETE,IDEMAIL} |
{0.249067,0.248533,0.0761333,0.0719333,0.0685333,0.0526,0.045,0.0368,0.0348667,0.029,0.0255333,0.0254667,0.0238667} |
{ACCTUPD,ACCTUPD,ACCTUPD,ADMLOGIN,CCREQUEST,CCSUCCESS,DEMOSTART,FOCBUY,PAYPAL,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,SYSMAINT,TEMPNEW,TEMPRE} |
0.106671
(1 row)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2008-10-03 16:08:22 Re: 7.4 - basic tuning question
Previous Message Tom Lane 2008-10-03 12:19:51 Re: Slow Inserts on large tables