Re: Access statistics

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, PostgreSQL HACKERS <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Access statistics
Date: 2001-06-01 13:31:05
Message-ID: 200106011331.f51DV5P02078@jupiter.us.greatbridge.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Eisentraut wrote:
> Jan Wieck writes:
>
> > In the next couple of hours (at least tomorrow) I would be
> > ready to commit the backend changes for table-/index-access
> > statistics and current backend activity views.
> >
> > Should I apply the patches or provide a separate patch for
> > review first?
>
> Maybe you could describe what it's going to do and how it's going to work.

Real programmers don't comment - if it was hard to write it
should be hard to read :-)

So outing myself not beeing a *real programmer*, this is what
I have so far:

* On startup the postmaster creates an INET domain UDP socket
and bind(2)'s it to localhost:0, meaning the kernel will
assign a yet unused, unprivileged port that could be seen
with getsockaddr(2).

It then starts two background processes of which one is
simply a wraparound buffer doing recvfrom(2) on the socket,
checking that the source address of the received packets is
the sockets own address (!) and forwarding approved ones
over a pipe to the second one, discribed later.

* Backends call some collector functions at various places
now (these will finally be macros), that count up table
scans, tuples returned by scans, buffer fetches/hits and
the like. At the beginning of a statement the backends send
a message telling the first couple of hundred bytes of the
querystring and after the statement is done (just before
getting ready for the next command) they send the collected
access numbers.

Tables, indexes etc. in these statistics are identified by
OID, so the data doesn't tell much so far.

* The second background process fired by the postmaster
collects these numbers into hashtables and as long as it
receives messages, it'll write out a summary file every 500
or so milliseconds, telling a snapshot of current stats.
Using select(2) with timeouts ensures that a complete idle
DB instance not to waste a single CPU cycle or IO to write
these snapshots.

On startup it tries to read the last snapshot file in, so
the collected statistics survive a postmaster restart.

Vacuum reads the file too and sends bulk delete messages
for objects that are gone. So the stats don't grow
infinitely.

* A bunch of new builtin functions gain access to the
snapshot file. At first call of one of these functions
during a transaction, the backend will read the current
file and return the numbers from in memory then.

Based on these functions a couple of views can tell these
collected stats. Information from the databases system
catalog is of course required to identify the objects in
the stats, but I think those informations should only be
visible to someone who identified herself as a valid DB
user anyway.

The visibility of querystrings (this info is available
cross DB) is restricted to DB superusers.

There has been discussion already about using an INET vs.
UNIX UDP socket for the communication. At least for Linux I
found INET to be the most effective way of communication. And
for security concerns: If someone else than root can really
send packets to that socket that show up with a source
address of 127.0.0.1:n, where n is a portnumber actually
occupied by your own socket, be sure you'll have more severe
problems than modified access statistics.

The views should be considered examples. The final naming and
layout is subject for discussion.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2001-06-01 13:49:25 Re: ERROR: cache lookup for proc 43030134 failed
Previous Message Michael Samuel 2001-06-01 13:11:13 Re: Re: Support for %TYPE in CREATE FUNCTION