Ragged latency log data in multi-threaded pgbench

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Ragged latency log data in multi-threaded pgbench
Date: 2010-03-15 15:41:16
Message-ID: 4B9E551C.5010900@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Just noticed a problem popping up sometimes with the new multi-threaded
pgbench. This is on a Linux RPM build (the alpha4 set) compiled with
'--disable-thread-safety'. Still trying to nail down whether that's a
requirement for this problem to appear or not. I did most of my review
of this feature with it turned on, and haven't been seeing this problem
on other systems that are thread safe. Not sure yet if that's cause and
effect or coincidence yet.

Here's a sample invocation that produces ragged output for me on my one
system:

pgbench -S -T 5 -c 4 -j 4 -l pgbench

The log file produced by this (pgbench_log.<pid>) is supposed to consist
of a series of lines in the following format:

client,trans,latency,filenum,sec,usec

It looks like the switch between clients running on separate workers can
lead to a mix of their respective lines showing up though. Here's a
couple of typical samples, with the bad line in the middle of each set:

1 138 178 0 1268665788 607559
1 139 182 0 1268665788 607751
1 1402 0 2491 0 1268665788 586135
2 1 264 0 1268665788 586463
2 2 192 0 1268665788 586665

1 274 160 0 1268665788 632966
1 275 178 0 1268665788 633154
1 276 184 0 126866578 178 0 1268665788 614015
2 141 190 0 1268665788 614252
2 142 169 0 1268665788 614430

2 274 178 0 1268665788 639218
2 275 175 0 1268665788 639402
2 276 169 0 126866578 171 0 1268665788 626933
0 141 185 0 1268665788 627165
0 142 202 0 1268665788 627377

Looks like sometimes a client is only getting part of its line written
out before getting stomped on by the next one. I think one of the
assumptions being made about how to safely write to this log file may be
broken by the multi-process implementation, which is what you get when
thread-safety is not available.

Since there should only be 6 fields here, I think you can find whether a
given log file has this problem or not like this:

cat pgbench_log.xxxxx | cut -d " " -f 7 | sort | uniq

If anything comes out of that, the latency log file has at least one bad
line in it.

Similarly, this:

cat pgbench_log.xxxxx | cut -d " " -f 1 | sort | uniq

Should only show the client numbers; here there's some first columns
with much bigger numbers too.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2010-03-15 15:49:50 Re: walreceiver is uninterruptible on win32
Previous Message Tom Lane 2010-03-15 15:37:55 Re: Dyamic updates of NEW with pl/pgsql