Re: pgbnech: allow to cancel queries during benchmark

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbnech: allow to cancel queries during benchmark
Date: 2023-08-09 09:06:24
Message-ID: a58388ac-5411-4760-ea46-71324d8324cb@mines-paristech.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Yugo-san,

Some feedback about v2.

There is some dead code (&& false) which should be removed.

>>>> Maybe it should check that cancel is not NULL before calling PQcancel?
>>>
>>> I think this is already checked as below, but am I missing something?
>>>
>>> + if (all_state[i].cancel != NULL)
>>> + (void) PQcancel(all_state[i].cancel, errbuf, sizeof(errbuf));
>>>
>>>> After looking at the code, I'm very unclear whether they may be some
>>>> underlying race conditions, or not, depending on when the cancel is
>>>> triggered. I think that some race conditions are still likely given the
>>>> current thread 0 implementation, and dealing with them with a barrier or
>>>> whatever is not desirable at all.
>>>>
>>>> In order to work around this issue, ISTM that we should go away from the
>>>> simple and straightforward thread 0 approach, and the only clean way is
>>>> that the cancelation should be managed by each thread for its own client.
>>>>
>>>> I'd suggest to have the advanceState to call PQcancel when CancelRequested
>>>> is set and switch to CSTATE_ABORTED to end properly. This means that there
>>>> would be no need for the global client states pointer, so the patch should
>>>> be smaller and simpler. Possibly there would be some shortcuts added here
>>>> and there to avoid lingering after the control-C, in threadRun.
>>>
>>> I am not sure this approach is simpler than mine.

My argument is more about latent race conditions and inter-thread
interference than code simplicity.

>>> In multi-threads, only one thread can catches the signal and other threads
>>> continue to run.

Yep. This why I see plenty uncontrolled race conditions if thread 0
cancels clients which are managed in parallel by other threads and may be
active. I'm not really motivated to delve into libpq internals to check
whether there are possibly bad issues or not, but if two threads write
message at the same time in the same socket, I assume that this can be
bad if you are unlucky.

ISTM that the rational convention should be that each thread cancels its
own clients, which ensures that there is no bad interaction between
threads.

>>> Therefore, if Ctrl+C is pressed while threads are waiting
>>> responses from the backend in wait_on_socket_set, only one thread can be
>>> interrupted and return, but other threads will continue to wait and cannot
>>> check CancelRequested.

>>> So, for implementing your suggestion, we need any hack
>>> to make all threads return from wait_on_socket_set when the event occurs, but
>>> I don't have idea to do this in simpler way.

>>> In my patch, all threads can return from wait_on_socket_set at Ctrl+C
>>> because when thread #0 cancels all connections, the following error is
>>> sent to all sessions:
>>>
>>> ERROR: canceling statement due to user request
>>>
>>> and all threads will receive the response from the backend.

Hmmm.

I understand that the underlying issue you are raising is that other
threads may be stuck while waiting on socket events and that with your
approach they will be cleared somehow by socket 0.

I'll say that (1) this point does not address potential race condition
issues with several thread interacting directly on the same client ;
(2) thread 0 may also be stuck waiting for events so the cancel is only
taken into account when it is woken up.

If we accept that each thread cancels its clients when it is woken up,
which may imply some (usually small) delay, then it is not so different
from the current version because the code must wait for 0 to wake up
anyway, and it solves (1). The current version does not address potential
thread interactions.

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2023-08-09 09:11:55 Re: pgsql: Ignore BRIN indexes when checking for HOT udpates
Previous Message Michael Paquier 2023-08-09 08:44:49 Re: Incorrect handling of OOM in WAL replay leading to data loss