pgbench logging broken by time logic changes

From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: david(dot)christensen(at)crunchydata(dot)com
Subject: pgbench logging broken by time logic changes
Date: 2021-06-16 14:49:36
Message-ID: CAHLJuCW_8Vpcr0=t6O_gozrg3wXXWXZXDioYJd3NhvKriqgpfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Back on March 10 Thomas Munro committed and wrestled multiple reworks of
the pgbench code from Fabien and the crew. The feature to synchronize
startup I'm looking forward to testing now that I have a packaged beta.
Variations on that problem have bit me so many times I added code last year
to my pgbench processing pipeline to just throw out the first and last 10%
of every data set.

Before I could get to startup timing I noticed the pgbench logging output
was broken via commit 547f04e7 "Improve time logic":
https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org

A lot of things are timed in pgbench now so I appreciate the idea. Y'all
started that whole big thread about sync on my birthday though and I didn't
follow the details of what that was reviewed against. For the logging use
case I suspect it's just broken everywhere. The two platforms I tested
were PGDG Ubuntu beta1 apt install and Mac git build. Example:

$ createdb pgbench
$ pgbench -i -s 1 pgbench
$ pgbench -S -T 1 -l pgbench
$ head pgbench_log.*
0 1 1730 0 1537380 70911
0 2 541 0 1537380 71474

The epoch time is the 5th column in the output, and this week it should
look like this:

0 1 1411 0 1623767029 732926
0 2 711 0 1623767029 733660

If you're not an epoch guru who recognizes what's wrong already, you might
grab https://github.com/gregs1104/pgbench-tools/ and party like it's 1970
to see it:

$ ~/repos/pgbench-tools/log-to-csv 1 local < pgbench_log* | head
1970-01-18 14:03:00.070911,0,1.73,1,local
1970-01-18 14:03:00.071474,0,0.541,1,local

I have a lot of community oriented work backed up behind this right now, so
I'm gonna be really honest. This time rework commit in its current form
makes me uncomfortable at this point in the release schedule. The commit
has already fought through two rounds of platform specific bug fixes. But
since the buildfarm doesn't test the logging feature, that whole process is
suspect.

My take on the PostgreSQL way to proceed: this bug exposes that pgbench
logging is a feature we finally need to design testing for. We need a new
buildfarm test and then a march through a full release phase to see how it
goes. Only then should we start messing with the time logic. Even if we
fixed the source today on both my test platforms, I'd still be nervous that
beta 2 could ship and more performance testing could fall over from this
modification. And that's cutting things a little close.

The fastest way to get me back to comfortable would be to unwind 547f04e7
and its associated fixes and take it back to review. I understand the
intent and value; I appreciate the work so far. The big industry
architecture shift from Intel to ARM has me worried about time overhead
again, the old code is wonky, and in the PG15 release cycle I already have
resources planned around this area.

# PG15 Plans

I didn't intend to roll back in after time away and go right to a revert
review. But I also really don't want to start my public PG14 story
documenting the reality that I had to use PG13's pgbench to generate my
examples either. I can't fight much with this logging problem while also
doing my planned public performance testing of PG14. I already had to push
back a solid bit of Beta 1 PR from this week, some "community PG is great!"
promotional blogging.

Let me offer what I can commit to from Crunchy corporate. I'm about to
submit multiple pgbench feature changes to the open CF starting July, with
David Christiansen. We and the rest of Crunchy will happily help re-review
this time change idea, its logging issues, testing, rejoin the study of
platform time call overhead, and bash the whole mess into shape for PG15.
I personally am looking forward to it.

The commit made a functional change to the way connection time is
displayed; that I can take or leave as committed. I'm not sure it can be
decoupled from the rest of the changes. It did cause a small breaking
pgbench output parsing problem for me, just trivial regex adjustment. That
break would fit in fine with my upcoming round of submissions.

--
Greg Smith greg(dot)smith(at)crunchydata(dot)com
Director of Open Source Strategy, Crunchy Data

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-06-16 15:14:55 Re: Improving isolationtester's data output
Previous Message Tom Lane 2021-06-16 14:48:24 Re: Decoding speculative insert with toast leaks memory