Skip site navigation (1) Skip section navigation (2)

Re: Stats collector performance improvement

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Fuhr <mike(at)fuhr(dot)org>, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, Carlos Benkendorf <carlosbenkendorf(at)yahoo(dot)com(dot)br>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Stats collector performance improvement
Date: 2006-01-05 00:39:42
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackerspgsql-patchespgsql-performance
Bruce Momjian wrote:
> I did some research on this because the numbers Tom quotes indicate there
> is something wrong in the way we process stats_command_string
> statistics.
> This sends 10,000 "SELECT 1" queries to the backend, and reports the
> execution time.  I found that without stats_command_string defined, it
> ran in 3.5 seconds.  With stats_command_string defined, it took 5.5
> seconds, meaning the command string is causing a 57% slowdown.  That is
> way too much considering that the SELECT 1 has to be send from psql to
> the backend, parsed, optimized, and executed, and the result returned to
> the psql, while stats_command_string only has to send a string to a
> backend collector.  There is _no_ way that collector should take 57% of
> the time it takes to run the actual query.

I have updated information on this performance issue.  It seems it is
the blocking activity of recv() that is slowing down the buffer process
and hence the backends.  Basically, I found if I use select() or recv()
to block until data arrives, I see the huge performance loss reported
above.  If I loop over the recv() call in non-blocking mode, I see
almost no performance hit from stats_command_string (no backend
slowdown), but of course that consumes all the CPU (bad).  What I found
worked perfectly was to do a non-blocking recv(), and if no data was
returned, change the socket to blocking mode and loop back over the
recv().  This allowed for no performance loss, and prevented infinite
looping over the recv() call.

My theory is that the kernel blocking logic of select() or recv() is
somehow locking up the socket for a small amount of time, therefore
slowing down the backend.  With the on/off blocking, the packets arrive
in groups, we get a few packets then block when nothing is available. 

The test program:

	export TMPFILE
	if [ ! -f $TMPFILE ]
	then	i=0
		while [ $i -lt 10000 ]
			i=`expr $i + 1`
			echo "SELECT 1;"
		done > $TMPFILE
	time psql test < $TMPFILE >/dev/null

is basically sending 30k packets of roughly 26 bytes each, or roughly
800k in 3.5 seconds, meaning there is a packet every 0.0001 seconds.  I
wouldn't have thought that was too much volume for a dual Xeon BSD
machine, but it seems it might be.  Tom seeing 44% slowdown from pgbench
means Linux might have an issue too.

Two patches are attached.  The first patch shows the use of the on/off
blocking method to have almost zero overhead for reading from the
socket.  (The packets are discarded.)  The second patch removes the
buffer process entirely and uses the on/off buffering to process the
incoming packets.  I tried running two test scripts simultaneously and
saw almost no packet loss.  Also keep in mind we are writing the stat
file twice a second, which might need to be pushed into a separate

  Bruce Momjian                        |
  pgman(at)candle(dot)pha(dot)pa(dot)us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

In response to

pgsql-performance by date

Next:From: Markus SchaberDate: 2006-01-05 11:29:59
Subject: Re: Invulnerable VACUUM process thrashing everything
Previous:From: Steve EckmannDate: 2006-01-05 00:19:08
Subject: Re: improving write performance for logging application

pgsql-hackers by date

Next:From: Simon RiggsDate: 2006-01-05 00:57:01
Subject: Re: Improving N-Distinct estimation by ANALYZE
Previous:From: Simon RiggsDate: 2006-01-05 00:24:25
Subject: Re: Improving N-Distinct estimation by ANALYZE

pgsql-patches by date

Next:From: Bruce MomjianDate: 2006-01-05 01:51:20
Subject: Remove QNX port
Previous:From: Josh BerkusDate: 2006-01-04 19:41:36
Subject: Re: Inconsistent syntax in GRANT

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group