Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marina Polyakova <m(dot)polyakova(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors
Date: 2021-06-26 10:15:38
Message-ID: alpine.DEB.2.22.394.2106231053090.1117163@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Yugo-san,

# About v12.2

## Compilation

Patch seems to apply cleanly with "git apply", but does not compile on my
host: "undefined reference to `conditional_stack_reset'".

However it works better when using the "patch". I'm wondering why git
apply fails silently…

When compiling there are warnings about "pg_log_fatal", which does not
expect a FILE* on pgbench.c:4453. Remove the "stderr" argument.

Global and local checks ok.

> number of transactions failed: 324 (3.240%)
> ...
> number of transactions retried: 5629 (56.290%)
> number of total retries: 103299

I'd suggest: "number of failed transactions". "total number of retries" or
just "number of retries"?

## Feature

The overall code structure changes to implements the feature seems
reasonable to me, as we are at the 12th iteration of the patch.

Comments below are somehow about details and asking questions
about choices, and commenting…

## Documentation

There is a lot of documentation, which is good. I'll review these
separatly. It looks good, but having a native English speaker/writer
would really help!

Some output examples do not correspond to actual output for
the current version. In particular, there is always one TPS figure
given now, instead of the confusing two shown before.

## Comments

transactinos -> transactions.

## Code

By default max_tries = 0. Should not the initialization be 1,
as the documentation argues that it is the default?

Counter comments, missing + in the formula on the skipped line.

Given that we manage errors, ISTM that we should not necessarily
stop on other not retried errors, but rather count/report them and
possibly proceed. Eg with something like:

-- server side random fail
DO LANGUAGE plpgsql $$
BEGIN
IF RANDOM() < 0.1 THEN
RAISE EXCEPTION 'unlucky!';
END IF;
END;
$$;

Or:

-- client side random fail
BEGIN;
\if random(1, 10) <= 1
SELECT 1 +;
\else
SELECT 2;
\endif
COMMIT;

We could count the fail, rollback if necessary, and go on. What do you think?
Maybe such behavior would deserve an option.

--report-latencies -> --report-per-command: should we keep supporting
the previous option?

--failures-detailed: if we bother to run with handling failures, should
it always be on?

--debug-errors: I'm not sure we should want a special debug mode for that,
I'd consider integrating it with the standard debug, or just for development.
Also, should it use pg_log_debug?

doRetry: I'd separate the 3 no retries options instead of mixing max_tries and
timer_exceeeded, for clarity.

Tries vs retries: I'm at odds with having tries & retries and + 1 here
and there to handle that, which is a little bit confusing. I'm wondering whether
we could only count "tries" and adjust to report what we want later?

advanceConnectionState: ISTM that ERROR should logically be before others which
lead to it.

Variables management: it looks expensive, with copying and freeing variable arrays.
I'm wondering whether we should think of something more clever. Well, that would be
for some other patch.

"Accumulate the retries" -> "Count (re)tries"?

Currently, ISTM that the retry on error mode is implicitely always on.
Do we want that? I'd say yes, but maybe people could disagree.

## Tests

There are tests, good!

I'm wondering whether something simpler could be devised to trigger
serialization or deadlock errors, eg with a SEQUENCE and an \if.

See the attached files for generating deadlocks reliably (start with 2 clients).
What do you think? The PL/pgSQL minimal, it is really client-code
oriented.

Given that deadlocks are detected about every seconds, the test runs
would take some time. Let it be for now.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2021-06-26 10:48:43 Re: Using each rel as both outer and inner for JOIN_ANTI
Previous Message Amit Kapila 2021-06-26 10:13:52 Re: subscription/t/010_truncate.pl failure on desmoxytes in REL_13_STABLE