Re: pgbench bug candidate: negative "initial connection time"

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: nagata(at)sraoss(dot)co(dot)jp, kuroda(dot)hayato(at)fujitsu(dot)com, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench bug candidate: negative "initial connection time"
Date: 2021-06-18 12:54:27
Message-ID: alpine.DEB.2.22.394.2106181434110.3146194@pseudo
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello,

> Doing this means we regard any state other than CSTATE_FINISHED as
> aborted. So, the current goto's to done in threadRun are effectively
> aborting a part or the all clients running on the thread. So for
> example the following place:
>
> pgbench.c:6713
>> /* must be something wrong */
>> pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
>> goto done;
>
> Should say such like "thread %d aborted: %s() failed: ...".

Yep, possibly.

> I'm not sure what is the consensus here about the case where aborted
> client can recoonect to the same server. This patch doesn't that.

Non trivial future work.

> However, I think causing reconnection needs more work than accepted as
> a fix while beta.

It is an entire project which requires some thinking about.

> + /* as the bench is already running, we do not abort */
> pg_log_error("client %d aborted while establishing connection", st->id);
> st->state = CSTATE_ABORTED;
>
> The comment looks strange that it is saying "we don't abort" while
> setting the state to CSTATE_ABORT then showing "client %d aborted".

Indeed. There is abort from the client, which just means that it stops
sending transaction, and abort for the process, which is basically
"exit(1)".

> ====
> if ((con = doConnect()) == NULL)
> + {
> + pg_log_fatal("connection for initialization failed");
> exit(1);
>
> doConnect() prints an error emssage given from libpq. So The
> additional messaget is redundant.

This is not the same for me: doConnect may fail but we may decide to go
retry the connection later, or just one client may be disconnected but
others are going on, which is different from deciding to stop the whole
program, which deserves a message on its own.

> ====
> errno = THREAD_BARRIER_INIT(&barrier, nthreads);
> if (errno != 0)
> + {
> pg_log_fatal("could not initialize barrier: %m");
> + exit(1);
>
> This is a run-time error. Maybe we should return 2 in that case.

Hmmm. Yep.

> ===
> if (thread->logfile == NULL)
> {
> pg_log_fatal("could not open logfile \"%s\": %m", logpath);
> - goto done;
> + exit(1);
>
> Maybe we should exit with 2 this case.

Yep.

> If we exit this case, we might also want to exit when fclose() fails.
> (Currently the error of fclose() is ignored.)

Not sure. I'd let it at that for now.

> ===
> + /* coldly abort on connection failure */
> + pg_log_fatal("cannot create connection for thread %d client %d",
> + thread->tid, i);
> + exit(1);
>
> It seems to me that the "thread %d client %d(not clinent id but the
> client index within the thread)" doesn't make sense to users. Even if
> we showed a message like that, it should show only the global clinent
> id (cstate->id).

This is not obvious to me. I think that we should be homogeneous with what
is already done around.

> I think that we should return with 2 here but we return with 1
> in another place for the same reason..

Possibly.

> /* must be something wrong */
> - pg_log_fatal("%s() failed: %m", SOCKET_WAIT_METHOD);
> + pg_log_error("%s() failed: %m", SOCKET_WAIT_METHOD);
> goto done;
>
> Why doesn't a fatal error cause an immediate exit?

Good point. I do not know, but I would expect it to be the case, and
AFAICR it does not.

> (And if we change this to fatal, we also need to change similar errors
> in the same function to fatal.)

Possibly.

I'll look into it over the week-end.

Thanks for the feedback!

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-06-18 13:05:28 Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
Previous Message Alexander Pyhalov 2021-06-18 12:02:39 Re: Asymmetric partition-wise JOIN