Rethinking stats communication mechanisms

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Rethinking stats communication mechanisms
Date: 2006-06-17 21:12:22
Message-ID: 23144.1150578742@sss.pgh.pa.us
Lists: pgsql-hackers

In view of my oprofile results
http://archives.postgresql.org/pgsql-hackers/2006-06/msg00859.php
I'm thinking we need some major surgery on the way that the stats
collection mechanism works.

It strikes me that we are using a single communication mechanism to
handle what are really two distinct kinds of data:

* Current-state information, e.g., what backends are alive and what
commands they are currently working on. Ideally we'd like this type of
info to be 100% up-to-date. But once a particular bit of information
(e.g., a command string) is obsolete, it's not of interest anymore.

* Event counts. These accumulate and so past information is still
important. On the other hand, it's not so critical that the info be
completely up-to-date --- the central counters can lag behind a bit,
so long as events eventually get counted.

I believe the stats code was designed with the second case in mind,
but we've abused it to handle the first case, and that's why we've
now got performance problems.

If we are willing to assume that the current-state information is of
fixed maximum size, we could store it in shared memory. (This
suggestion already came up in the recent thread about ps_status,
and I think it's been mentioned before too --- but my point here is
that we have to separate this case from the event-counting case.)
The only real restriction we'd be making is that we can only show the
first N characters of the current command string, but we're already
accepting that limitation in the existing stats code. (And we could
make N whatever we wanted, without worrying about UDP datagram limits.)
I'm envisioning either adding fields to the PGPROC array, or perhaps
better using a separate array with an entry for each backend ID.
Backends would write their status info into this array and any
interested backend could read it out again. The stats collector
process needn't be involved at all AFAICS. This eliminates any
process-dispatch overhead to report command start or command
termination. Instead we'd have some locking overhead, but contention
ought to be low enough that that's not a serious problem. I'm assuming
a separate lock for each array entry so that backends don't contend with
each other to update their entries; contention occurs only when someone
is actively reading the information. We should probably use LWLocks, not
spinlocks, because the time taken to copy a long command string into the
shared area would be longer than we ought to hold a spinlock (but this
seems a bit debatable given the expected low contention ... any
thoughts?)
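
As an illustration of the per-backend status array described above, here
is a minimal sketch in C. It is not PostgreSQL code: the names
(BackendStatusSlot, status_report_command, status_read_command), the
sizes MAX_BACKENDS and CMD_MAXLEN, and the use of a pthread mutex as a
stand-in for a per-entry LWLock are all assumptions made for the example.
A real implementation would place the array in shared memory rather than
in a static array inside one process.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_BACKENDS 8   /* hypothetical; a real server would size this at startup */
#define CMD_MAXLEN   64  /* the "N" from the text; the value here is arbitrary */

typedef struct BackendStatusSlot
{
    pthread_mutex_t lock;                 /* stand-in for a per-entry LWLock */
    int             in_use;               /* has a backend reported into this slot? */
    char            command[CMD_MAXLEN];  /* truncated current command string */
} BackendStatusSlot;

/* One slot per backend ID. Assumed to live in shared memory in a real system. */
static BackendStatusSlot status_array[MAX_BACKENDS];

/* Initialize every slot once, before any backend uses the array. */
void
status_array_init(void)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        pthread_mutex_init(&status_array[i].lock, NULL);
        status_array[i].in_use = 0;
        status_array[i].command[0] = '\0';
    }
}

/* Called by the owning backend at command start; keeps the first N-1 chars. */
void
status_report_command(int backend_id, const char *cmd)
{
    assert(backend_id >= 0 && backend_id < MAX_BACKENDS);
    BackendStatusSlot *slot = &status_array[backend_id];

    pthread_mutex_lock(&slot->lock);
    slot->in_use = 1;
    strncpy(slot->command, cmd, CMD_MAXLEN - 1);
    slot->command[CMD_MAXLEN - 1] = '\0';
    pthread_mutex_unlock(&slot->lock);
}

/*
 * Called by any interested backend. Copies the slot's command into buf
 * while holding only that slot's lock. Returns 1 if the slot is in use,
 * 0 otherwise (buf is then set to the empty string).
 */
int
status_read_command(int backend_id, char *buf, size_t buflen)
{
    assert(backend_id >= 0 && backend_id < MAX_BACKENDS);
    assert(buflen > 0);
    BackendStatusSlot *slot = &status_array[backend_id];
    int         in_use;

    pthread_mutex_lock(&slot->lock);
    in_use = slot->in_use;
    if (in_use)
    {
        strncpy(buf, slot->command, buflen - 1);
        buf[buflen - 1] = '\0';
    }
    else
        buf[0] = '\0';
    pthread_mutex_unlock(&slot->lock);

    return in_use;
}
```

Because each slot carries its own lock, a writer only ever waits when
some reader is copying out that same slot, which matches the low
contention expected in the text. The sketch does not settle the LWLock
versus spinlock question; it only shows where the lock would be held.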

The existing stats collection mechanism seems OK for event counts,
although I'd propose two changes: one, get rid of the separate
buffer process, and two, find a way to emit event reports in a
time-driven way rather than once per transaction commit. I'm a bit
vague about how to do the latter at the moment.
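
One possible reading of "time-driven" reporting, offered only as a
sketch since the message leaves the mechanism open: each backend
accumulates event counts locally and, when it reaches a commit, sends
them only if a minimum interval has passed since its last report. The
names (StatsReporter, reporter_maybe_flush), the two example counters,
and the REPORT_INTERVAL_MS value are hypothetical. The "central" totals
are kept in the same struct here purely to make the example
self-contained; in the design discussed above they would belong to the
stats collector.

```c
#include <assert.h>
#include <stdint.h>

#define REPORT_INTERVAL_MS 500  /* hypothetical minimum time between reports */

typedef struct EventCounters
{
    uint64_t    tuples_fetched;
    uint64_t    blocks_read;
} EventCounters;

typedef struct StatsReporter
{
    EventCounters pending;      /* counted locally, not yet reported */
    EventCounters central;      /* stand-in for the collector's running totals */
    int64_t     last_flush_ms;  /* time of the last report, in milliseconds */
} StatsReporter;

/* Start with empty counters; now_ms is supplied by the caller's clock. */
void
reporter_init(StatsReporter *r, int64_t now_ms)
{
    r->pending.tuples_fetched = 0;
    r->pending.blocks_read = 0;
    r->central.tuples_fetched = 0;
    r->central.blocks_read = 0;
    r->last_flush_ms = now_ms;
}

/* Cheap local accumulation; no communication happens here. */
void
reporter_count(StatsReporter *r, uint64_t tuples, uint64_t blocks)
{
    r->pending.tuples_fetched += tuples;
    r->pending.blocks_read += blocks;
}

/*
 * Called at transaction commit. Reports the pending counts only if at
 * least REPORT_INTERVAL_MS has elapsed since the previous report.
 * Returns 1 if a report was sent, 0 if it was deferred.
 */
int
reporter_maybe_flush(StatsReporter *r, int64_t now_ms)
{
    if (now_ms - r->last_flush_ms < REPORT_INTERVAL_MS)
        return 0;               /* too soon; counts stay pending */

    r->central.tuples_fetched += r->pending.tuples_fetched;
    r->central.blocks_read += r->pending.blocks_read;
    r->pending.tuples_fetched = 0;
    r->pending.blocks_read = 0;
    r->last_flush_ms = now_ms;
    return 1;
}
```

With this scheme a burst of short transactions produces at most one
report per interval, and the central counters lag but eventually catch
up. One gap the sketch leaves open: a backend that goes idle right
after a deferred report holds its pending counts until its next commit.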

Comments?

regards, tom lane
