Re: Stats collector performance improvement

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Fuhr <mike(at)fuhr(dot)org>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, Carlos Benkendorf <carlosbenkendorf(at)yahoo(dot)com(dot)br>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Stats collector performance improvement
Date: 2006-01-05 00:39:42
Message-ID: 200601050039.k050dgv22567@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches pgsql-performance

Bruce Momjian wrote:
> I did some research on this because the numbers Tom quotes indicate there
> is something wrong in the way we process stats_command_string
> statistics.
>
...
> This sends 10,000 "SELECT 1" queries to the backend, and reports the
> execution time. I found that without stats_command_string defined, it
> ran in 3.5 seconds. With stats_command_string defined, it took 5.5
> seconds, meaning the command string is causing a 57% slowdown. That is
> way too much considering that the SELECT 1 has to be send from psql to
> the backend, parsed, optimized, and executed, and the result returned to
> the psql, while stats_command_string only has to send a string to a
> backend collector. There is _no_ way that collector should take 57% of
> the time it takes to run the actual query.

I have updated information on this performance issue. It seems it is
the blocking activity of recv() that is slowing down the buffer process
and hence the backends. Basically, I found if I use select() or recv()
to block until data arrives, I see the huge performance loss reported
above. If I loop over the recv() call in non-blocking mode, I see
almost no performance hit from stats_command_string (no backend
slowdown), but of course that consumes all the CPU (bad). What I found
worked perfectly was to do a non-blocking recv(), and if no data was
returned, change the socket to blocking mode and loop back over the
recv(). This allowed for no performance loss, and prevented infinite
looping over the recv() call.

My theory is that the kernel blocking logic of select() or recv() is
somehow locking up the socket for a small amount of time, therefore
slowing down the backend. With the on/off blocking, the packets arrive
in groups, we get a few packets then block when nothing is available.

The test program:

TMPFILE=/tmp/pgstat.sql
export TMPFILE

if [ ! -f $TMPFILE ]
then i=0
while [ $i -lt 10000 ]
do
i=`expr $i + 1`
echo "SELECT 1;"
done > $TMPFILE
fi

time psql test < $TMPFILE >/dev/null

is basically sending 30k packets of roughly 26 bytes each, or roughly
800k in 3.5 seconds, meaning there is a packet every 0.0001 seconds. I
wouldn't have thought that was too much volume for a dual Xeon BSD
machine, but it seems it might be. Tom seeing 44% slowdown from pgbench
means Linux might have an issue too.

Two patches are attached. The first patch shows the use of the on/off
blocking method to have almost zero overhead for reading from the
socket. (The packets are discarded.) The second patch removes the
buffer process entirely and uses the on/off buffering to process the
incoming packets. I tried running two test scripts simultaneously and
saw almost no packet loss. Also keep in mind we are writing the stat
file twice a second, which might need to be pushed into a separate
process.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 1.9 KB
unknown_filename text/plain 16.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2006-01-05 00:57:01 Re: Improving N-Distinct estimation by ANALYZE
Previous Message Simon Riggs 2006-01-05 00:24:25 Re: Improving N-Distinct estimation by ANALYZE

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2006-01-05 01:51:20 Remove QNX port
Previous Message Josh Berkus 2006-01-04 19:41:36 Re: Inconsistent syntax in GRANT

Browse pgsql-performance by date

  From Date Subject
Next Message Markus Schaber 2006-01-05 11:29:59 Re: Invulnerable VACUUM process thrashing everything
Previous Message Steve Eckmann 2006-01-05 00:19:08 Re: improving write performance for logging application