Reducing stats collection overhead

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Reducing stats collection overhead
Date: 2007-04-29 04:44:30
Message-ID: 18124.1177821870@sss.pgh.pa.us
Lists: pgsql-hackers

Arjen van der Meijden told me that according to the tweakers.net
benchmark, HEAD is noticeably slower than 8.2.4, and I soon confirmed
here that for small SELECT queries issued as separate transactions,
there's a significant difference. I think much of the difference stems
from the fact that we now have stats_row_level ON by default, and so
every transaction sends a stats message that wasn't there by default
in 8.2. When you're doing a few thousand transactions per second
(not hard for small read-only queries) that adds up.

It seems to me that this could be fixed fairly easily by allowing the
stats to accumulate across multiple small transactions before sending
a message. There's surely not much point in kicking stats out quickly
when the stats collector only reports them to the world every half
second anyway.

The first design that comes to mind is that at transaction end
(pgstat_report_tabstat() time) we send a stats message only if at least
X milliseconds have elapsed since we last sent one, where X is
PGSTAT_STAT_INTERVAL or closely related to it. We also make sure to
flush stats out before process exit. This approach ensures that in a
lots-of-short-transactions scenario, we only need to send one stats
message every X msec, not one per query. The cost is possible delay of
stats reports. I claim that any transaction that makes a really sizable
change in the stats will run longer than X msec and therefore will send
its stats immediately. Cases where a client does a small transaction
after sleeping for a while (more than X msec) will also send immediately.
You might get a delay in reporting the last few transactions of a burst
of short transactions, but how much does it matter? So I think that
complicating the design with, say, a timeout counter to force out the
stats after a sleep interval is not necessary. Doing so would add a
couple of kernel calls to every client interaction so I'd really rather
avoid that.
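
To make the proposal concrete, here is a minimal sketch of the
send-only-if-X-msec-have-elapsed rule. It is not the actual pgstat.c code:
the struct, the function names, and the caller-supplied millisecond clock
are all hypothetical, and X is assumed to be 500 msec to match the
half-second reporting interval mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of X: the collector reports every half second. */
#define STAT_INTERVAL_MS 500

/* Hypothetical per-backend state; not the real pgstat.c structures. */
typedef struct StatsThrottle
{
    bool    has_sent;       /* has any message been sent yet? */
    int64_t last_send_ms;   /* clock reading at the last send */
    long    pending_xacts;  /* transactions counted but not yet reported */
    long    messages_sent;  /* messages handed to the collector so far */
} StatsThrottle;

void
stats_throttle_init(StatsThrottle *st)
{
    assert(st != NULL);
    st->has_sent = false;
    st->last_send_ms = 0;
    st->pending_xacts = 0;
    st->messages_sent = 0;
}

/* Stand-in for actually transmitting the accumulated counters. */
static void
send_pending(StatsThrottle *st, int64_t now_ms)
{
    st->messages_sent++;
    st->pending_xacts = 0;
    st->has_sent = true;
    st->last_send_ms = now_ms;
}

/*
 * Call at transaction end (where pgstat_report_tabstat() runs today).
 * Accumulates the transaction's stats, and sends only if at least
 * STAT_INTERVAL_MS have elapsed since the previous send.
 * Returns true if a message was sent.
 */
bool
stats_at_xact_end(StatsThrottle *st, int64_t now_ms)
{
    assert(st != NULL);
    st->pending_xacts++;
    if (st->has_sent && now_ms - st->last_send_ms < STAT_INTERVAL_MS)
        return false;           /* too soon: keep accumulating */
    send_pending(st, now_ms);
    return true;
}

/*
 * Call before process exit so the tail of a burst is not lost.
 * Returns true if there was anything left to send.
 */
bool
stats_at_exit(StatsThrottle *st)
{
    assert(st != NULL);
    if (st->pending_xacts == 0)
        return false;
    send_pending(st, st->last_send_ms);
    return true;
}
```

In this sketch a transaction that itself runs longer than X msec always
sends at its end, because the previous send necessarily happened before it
started. Likewise, the first transaction after a sleep of more than X msec
sends immediately. Only the last few transactions of a burst wait, until
either the next qualifying transaction or process exit.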

Any thoughts, better ideas?

regards, tom lane
