Re: pgbench bug candidate: negative "initial connection time"

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: coelho(at)cri(dot)ensmp(dot)fr
Cc: nagata(at)sraoss(dot)co(dot)jp, kuroda(dot)hayato(at)fujitsu(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgbench bug candidate: negative "initial connection time"
Date: 2021-06-18 08:26:03
Message-ID: 20210618.172603.409047378314636142.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 17 Jun 2021 11:52:04 +0200 (CEST), Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote in
> > Ok. I gave up to change the state in threadRun. Instead, I changed the
> > condition at the end of bench, which enables to report abortion due to
> > socket errors.
> >
> > +@@ -6480,7 +6490,7 @@ main(int argc, char **argv)
> > + #endif /* ENABLE_THREAD_SAFETY */
> > +
> > + for (int j = 0; j < thread->nstate; j++)
> > +- if (thread->state[j].state == CSTATE_ABORTED)
> > ++ if (thread->state[j].state != CSTATE_FINISHED)
> > + exit_code = 2;
> > +
> > + /* aggregate thread level stats */
> >
> > Does this make sense?
>
> Yes, definitely.

I sought for a simple way to enforce all client finishes with the
states abort or finished but I didn't find. So +1 for the
change. However, as a matter of style. if we touch the code maybe we
want to enclose the if statement.

Doing this means we regard any state other than CSTATE_FINISHED as
aborted. So, the current goto's to done in threadRun are effectively
aborting a part or the all clients running on the thread. So for
example the following place:

pgbench.c:6713
> /* must be something wrong */
> pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
> goto done;

Should say such like "thread %d aborted: %s() failed: ...".

I'm not sure what is the consensus here about the case where aborted
client can recoonect to the same server. This patch doesn't that. However, I think causing reconnection needs more work than accepted as a fix while beta.

====

+ /* as the bench is already running, we do not abort */
pg_log_error("client %d aborted while establishing connection", st->id);
st->state = CSTATE_ABORTED;

The comment looks strange that it is saying "we don't abort" while
setting the state to CSTATE_ABORT then showing "client %d aborted".

====
if ((con = doConnect()) == NULL)
+ {
+ pg_log_fatal("connection for initialization failed");
exit(1);

doConnect() prints an error emssage given from libpq. So The
additional messaget is redundant.

====
errno = THREAD_BARRIER_INIT(&barrier, nthreads);
if (errno != 0)
+ {
pg_log_fatal("could not initialize barrier: %m");
+ exit(1);

This is a run-time error. Maybe we should return 2 in that case.

===
if (thread->logfile == NULL)
{
pg_log_fatal("could not open logfile \"%s\": %m", logpath);
- goto done;
+ exit(1);

Maybe we should exit with 2 this case. If we exit this case, we might
also want to exit when fclose() fails. (Currently the error of
fclose() is ignored.)

===
+ /* coldly abort on connection failure */
+ pg_log_fatal("cannot create connection for thread %d client %d",
+ thread->tid, i);
+ exit(1);

It seems to me that the "thread %d client %d(not clinent id but the
client index within the thread)" doesn't make sense to users. Even if
we showed a message like that, it should show only the global clinent
id (cstate->id).

I think that we should return with 2 here but we return with 1
in another place for the same reason..

===
/* must be something wrong */
- pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
+ pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
goto done;

Why doesn't a fatal error cause an immediate exit? (And if we change
this to fatal, we also need to change similar errors in the same
function to fatal.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2021-06-18 08:31:00 Re: [PoC] Federated Authn/z with OAUTHBEARER
Previous Message Michael Paquier 2021-06-18 06:59:32 Re: Refactor ECPGconnect and allow IPv6 connection