Re: pgbench and timestamps (bounced)

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>, Jaime Soler <jaime(dot)soler(at)gmail(dot)com>
Subject: Re: pgbench and timestamps (bounced)
Date: 2020-09-11 13:59:12
Message-ID: alpine.DEB.2.22.394.2009111501410.3995562@pseudo
Lists: pgsql-hackers


Hello Tom,

>> It requires a mutex around the commands, I tried to do some windows
>> implementation which may or may not work.
>
> Ugh, I'd really rather not do that. Even disregarding the effects
> of a mutex, though, my initial idea for fixing this has a big problem:
> if we postpone PREPARE of the query until first execution, then it's
> happening during timed execution of the benchmark scenario and thus
> distorting the timing figures. (Maybe if we'd always done it like
> that, it'd be okay, but I'm quite against changing the behavior now
> that it's stood for a long time.)

Hmmm.

Prepare is done *once* per client; ISTM that the impact on any
statistically significant benchmark is nil in practice, or it would mean
that the benchmark settings are too low.

Second, the mutex is only taken when absolutely necessary, and only for
the substitution part of the query (replacing :stuff with ?), because
scripts are shared between threads. This happens just once, in an
unlikely case occurring at the beginning of the run.

> However, perhaps there's more than one way to fix this. Once we've
> scanned all of the script and seen all the \set commands, we know
> (in principle) the set of all variable names that are in use.
> So maybe we could fix this by
>
> (1) During the initial scan of the script, make variable-table
> entries for every \set argument, with the values shown as undefined
> for the moment. Do not try to parse SQL commands in this scan,
> just collect them.

The issue with this approach is

SELECT 1 AS one \gset pref_

which generates a "pref_one" variable, and such names cannot be
guessed without SQL parsing and possibly execution. That is why the
preparation is delayed until the variables are actually known.
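To make the difficulty concrete, here is a small (hypothetical) pgbench script fragment relying on standard \gset semantics, where the given prefix is prepended to each output column name:

```
SELECT 1 AS one \gset pref_
SELECT :pref_one + 1;
```

The name "pref_one" is derived from the column alias at run time, so a static scan of \set commands would never discover it.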

> (2) Make another scan in which we identify variable references
> in the SQL commands and issue PREPAREs (if enabled).

> (3) Perform the timed run.
>
> This avoids any impact of this bug fix on the semantics or timing
> of the benchmark proper. I'm not sure offhand whether this
> approach makes any difference for the concerns you had about
> identifying/suppressing variable references inside quotes.

I do not think this plan is workable, because of the \gset issue.

I do not see that the conditional mutex and delayed PREPARE would have any
significant (measurable) impact on an actual (reasonable) benchmark run.

A workable solution would be for each client to actually execute each
script once before starting the actual benchmark. It would still need a
mutex and also a sync barrier (which I'm proposing in some other thread).
However, this may raise other issues, because some operations would then
be triggered outside the benchmarking run, which may or may not be
desirable.

So I'm not too keen to go that way, and I think the proposed solution is
reasonable from a benchmarking point of view, as the impact is minimal,
although not zero.

--
Fabien.
