Test request for Stats collector performance improvement

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Fuhr <mike(at)fuhr(dot)org>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, Carlos Benkendorf <carlosbenkendorf(at)yahoo(dot)com(dot)br>
Subject: Test request for Stats collector performance improvement
Date: 2006-06-15 04:05:38
Message-ID: 200606150405.k5F45cH11445@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches pgsql-performance


Would some people please run the attached test procedure and report back
the results? I basically need to know the patch is an improvement on
more platforms than just my own. Thanks

---------------------------------------------------------------------------

Run this script and record the time reported:

ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.script

Modify postgresql.conf:

stats_command_string = on

and reload the server. Do "SELECT * FROM pg_stat_activity;" to verify
the command string is enabled. You should see your query in the
"current query" column.

Rerun the stat.script again and record the time.

Apply this patch to CVS HEAD:

ftp://candle.pha.pa.us/pub/postgresql/mypatches/stat.nobuffer

Run the stat.script again and record the time.

Report via email your three times and your platform.

If the patch worked, the first and third times will be similar, and
the second time will be high.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Michael Fuhr <mike(at)fuhr(dot)org> writes:
> > > Further tests show that for this application
> > > the killer is stats_command_string, not stats_block_level or
> > > stats_row_level.
> >
> > I tried it with pgbench -c 10, and got these results:
> > 41% reduction in TPS rate for stats_command_string
> > 9% reduction in TPS rate for stats_block/row_level (any combination)
> >
> > strace'ing a backend confirms my belief that stats_block/row_level send
> > just one stats message per transaction (at least for the relatively
> > small number of tables touched per transaction by pgbench). However
> > stats_command_string sends 14(!) --- there are seven commands per
> > pgbench transaction and each results in sending a <command> message and
> > later an <IDLE> message.
> >
> > Given the rather lackadaisical way in which the stats collector makes
> > the data available, it seems like the backends are being much too
> > enthusiastic about posting their stats_command_string status
> > immediately. Might be worth thinking about how to cut back the
> > overhead by suppressing some of these messages.
>
> I did some research on this because the numbers Tom quotes indicate there
> is something wrong in the way we process stats_command_string
> statistics.
>
> I made a small test script:
>
> if [ ! -f /tmp/pgstat.sql ]
> then i=0
> while [ $i -lt 10000 ]
> do
> i=`expr $i + 1`
> echo "SELECT 1;"
> done > /tmp/pgstat.sql
> fi
>
> time psql test </tmp/pgstat.sql >/dev/null
>
> This sends 10,000 "SELECT 1" queries to the backend, and reports the
> execution time. I found that without stats_command_string defined, it
> ran in 3.5 seconds. With stats_command_string defined, it took 5.5
> seconds, meaning the command string is causing a 57% slowdown. That is
> way too much considering that the SELECT 1 has to be send from psql to
> the backend, parsed, optimized, and executed, and the result returned to
> the psql, while stats_command_string only has to send a string to a
> backend collector. There is _no_ way that collector should take 57% of
> the time it takes to run the actual query.
>
> With the test program, I tried various options. The basic code we have
> sends a UDP packet to a statistics buffer process, which recv()'s the
> packet, puts it into a memory queue buffer, and writes it to a pipe()
> that is read by the statistics collector process which processes the
> packet.
>
> I tried various ways of speeding up the buffer and collector processes.
> I found if I put a pg_usleep(100) in the buffer process the backend
> speed was good, but packets were lost. What I found worked well was to
> do multiple recv() calls in a loop. The previous code did a select(),
> then perhaps a recv() and pipe write() based on the results of the
> select(). This caused many small packets to be written to the pipe and
> the pipe write overhead seems fairly large. The best fix I found was to
> loop over the recv() call at most 25 times, collecting a group of
> packets that can then be sent to the collector in one pipe write. The
> recv() socket is non-blocking, so a zero return indicates there are no
> more packets available. Patch attached.
>
> This change reduced the stats_command_string time from 5.5 to 3.9, which
> is closer to the 3.5 seconds with stats_command_string off.
>
> A second improvement I discovered is that the statistics collector is
> calling gettimeofday() for every packet received, so it can determine
> the timeout for the select() call to write the flat file. I removed
> that behavior and instead used setitimer() to issue a SIGINT every
> 500ms, which was the original behavior. This eliminates the
> gettimeofday() call and makes the code cleaner. Second patch attached.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Qingqing Zhou 2006-06-15 05:02:20 Are we still interested in the master-slave scan patch
Previous Message Tom Lane 2006-06-15 03:04:52 Re: Alternative variable length structure

Browse pgsql-patches by date

  From Date Subject
Next Message Pavel Stehule 2006-06-15 04:45:27 Re: SQL/XML publishing function experimental patch II
Previous Message Bruce Momjian 2006-06-14 22:52:56 Re: Interval aggregate regression failure (expected seems

Browse pgsql-performance by date

  From Date Subject
Next Message Zoltan Boszormenyi 2006-06-15 04:31:02 Re: Precomputed constants?
Previous Message Greg Stark 2006-06-15 03:33:53 Re: Postgres fsync off (not needed) with NetApp