Re: pgbench logging broken by time logic changes

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, david(dot)christensen(at)crunchydata(dot)com
Subject: Re: pgbench logging broken by time logic changes
Date: 2021-06-17 00:36:10
Message-ID: CA+hUKG+jhatozihmqy4=Ufd0OHwZadEd_vg-_+26hE3f4C3TgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Greg,

On Thu, Jun 17, 2021 at 2:49 AM Gregory Smith <gregsmithpgsql(at)gmail(dot)com> wrote:
> Back on March 10 Thomas Munro committed and wrestled multiple reworks of the pgbench code from Fabien and the crew. The feature to synchronize startup I'm looking forward to testing now that I have a packaged beta. Variations on that problem have bit me so many times I added code last year to my pgbench processing pipeline to just throw out the first and last 10% of every data set.

Yeah, commit aeb57af8 is a nice improvement and was the main thing I
wanted to get into the tree for 14 in this area, because it was
measuring the wrong thing.

> Before I could get to startup timing I noticed the pgbench logging output was broken via commit 547f04e7 "Improve time logic": https://www.postgresql.org/message-id/E1lJqpF-00064e-C6%40gemulon.postgresql.org

It does suck that we broke the logging and that it took 3 months for
anyone to notice and report it to the list. Seems like it should be
straightforward to fix, though, with fixes already proposed (though I
haven't studied them yet, will do).

> I have a lot of community oriented work backed up behind this right now, so I'm gonna be really honest. This time rework commit in its current form makes me uncomfortable at this point in the release schedule. The commit has already fought through two rounds of platform specific bug fixes. But since the buildfarm doesn't test the logging feature, that whole process is suspect.

It's true that this work produced a few rounds of small portability
follow-ups: c427de42 (work around strange hacks elsewhere in the tree
for AIX), 68b34b23 (missing calling convention specifier on Windows),
and de91c3b9 (adjust pthread missing-function code for threadless
builds). These were problems that didn't show up on developer or CI
systems (including threadless and Windows), and IMHO are typical sorts
of problems you expect to have to work through when stuff hits the
build farm, especially when using new system interfaces. So I don't
think any of that, on its own, supports reverting anything here.

> My take on the PostgreSQL way to proceed: this bug exposes that pgbench logging is a feature we finally need to design testing for. We need a new buildfarm test and then a march through a full release phase to see how it goes. Only then should we start messing with the time logic. Even if we fixed the source today on both my test platforms, I'd still be nervous that beta 2 could ship and more performance testing could fall over from this modification. And that's cutting things a little close.
>
> The fastest way to get me back to comfortable would be to unwind 547f04e7 and its associated fixes and take it back to review. I understand the intent and value; I appreciate the work so far. The big industry architecture shift from Intel to ARM has me worried about time overhead again, the old code is wonky, and in the PG15 release cycle I already have resources planned around this area.

Let me study the proposed fixes on this and the other thread about
pgbench logging for a bit.

Glad to hear that you're working on this area. I guess you might be
researching stuff along the same sorts of lines as in the thread
"Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?" (though
that's about the executor). As I already expressed in that thread, if
the backend's instrumentation code is improved as proposed there,
we'll probably want to rip some of these pgbench changes out anyway
and go back to common instrumentation code.

For that reason, I'm not super attached to that new pg_time_usec_t
stuff at all, and wouldn't be sad if we reverted that piece. I am
moderately attached to the sync changes, though. pgbench 13 is
objectively producing incorrect results in that respect.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-06-17 00:46:42 Re: pgbench logging broken by time logic changes
Previous Message Tomas Vondra 2021-06-17 00:23:17 Re: PoC: Using Count-Min Sketch for join cardinality estimation