Hashtable entry recycling algorithm in pg_stat_statements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Hashtable entry recycling algorithm in pg_stat_statements
Date: 2009-01-03 00:22:40
Message-ID: 14204.1230942160@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The pending contrib/pg_stat_statements patch has an interesting method
for dealing with the limited size of its shared-memory hash table
(which has got one entry per unique statement text, userid, and database
id, so it's not exactly likely to be small). When the hash table fills
up, it enumerates all the hash entries, sorts them by "usage" order,
decrements the usage counts, and then removes the ones with least usage
values. It does all this while holding exclusive lock on the hash
table. This is going to mean that every so often, your entire database
just freezes up for N milliseconds while this is going on --- no query
will be able to get through ExecutorEnd until that cleanup work is done.
I don't think this is a property that will go over well with the sort of
DBA who would have a use for this module. Unpredictable response times
are exactly what he's hoping to avoid.

A couple of other possibilities that seem a bit saner:

1. Use a self-organizing list: any time an entry is referenced,
move it to front, and when you need a new entry take the oldest
one off the back. I don't see a way to do that without a global
lock that protects the list links, but there could be a spinlock
that's held only long enough to manipulate the list links.

2. Use a clock sweep algorithm similar to bufmgr's.

Either of these trades off accuracy of deciding which existing cache
entries are "least interesting" in order to reduce the maintenance
overhead --- but it doesn't appear to me that the code implements usage
counts in a way that would justify treating them as sacrosanct
indicators of relative usefulness anyhow.

The first option seems attractively simple and predictable in
performance --- all operations are O(1).

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-01-03 00:51:39 Re: posix_fadvise v22
Previous Message Alvaro Herrera 2009-01-03 00:14:39 Re: Auto-updated fields