Re: Add client connection check during the execution of the query

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Maksim Milyutin <milyutinma(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Add client connection check during the execution of the query
Date: 2021-03-29 21:00:13
Message-ID: CA+hUKGLyj5Aqt6ojYfSc+qSeB1x=3RbU61hnus5sL0BKqEBsLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 30, 2021 at 6:25 AM Maksim Milyutin <milyutinma(at)gmail(dot)com> wrote:
>
> Hi Thomas! Thanks for working on this patch.
>
> I have attached a new version with some typo corrections of doc entry,
> removing of redundant `include` entries and trailing whitespaces. Also I
> added in doc the case when single query transaction with disconnected
> client might be eventually commited upon completion in autocommit mode
> if it doesn't return any rows (doesn't communicate with user) before
> sending final commit message. This behavior might be unexpected for
> clients and hence IMO it's worth noticing.

Thanks!

> > + * pq_recvbuf_ext - load some bytes into the input buffer

> AFAICS, the above fragment is not related with primary fix directly.

Right, that was some stuff left over from the v7 patch that I forgot to remove.

> AFAICS, there are the following open items that don't allow to treat the
> current patch completed:
>
> * Absence of tap tests emulating some scenarios of client disconnection.
> IIRC, you wanted to rewrite the test case posted by Sergey.

Yeah, it's a bit tricky to write a test that won't fail randomly on
slow animals, but I think I see how now (I think you need to send
SELECT pg_sleep(60), then in another connection, wait until that shows
as running in pg_stat_activity, and then kill the first process, and
then wait until the log contains a disconnected-for-this-reason
message). If we can answer the big question below, I'll invest the
time to do this.

> * Concerns about portability of `pq_check_connection()`A implementation.

Yes. This is the main problem for me. I just did a bit of testing to
refresh my memory of what doesn't work. On FreeBSD, with TCP
keepalives enabled (I tested with tcp_keepalives_idle=1s,
tcp_keepalives_count=5, tcp_keepalives_interval=1s to make it
aggressive), the POLLUP patch works if you pull out the ethernet
cable. But... it doesn't work if you kill -9 the remote psql process
(the remote kernel sends a FIN). So this is a patch that only works
the way we need it to work on Linux. Other OSes can only tell you
about this condition when you try to read or write.

> BTW, on windows postgres with this patch have not been built [1].

Huh, the manual says they have struct pollfd, and we include
winsock2.h. But I guess it doesn't work reliably on Windows anyway.

https://docs.microsoft.com/en-us/windows/win32/api/winsock2/ns-winsock2-wsapollfd

> * Absence of benchmark results to show lack of noticeable performance
> regression after applying non-zero timeout for checking client liveness.

True, we should do that, but I'll be very surprised if it's still a
problem: the regression was caused by calling setitimer() for every
query, but we fixed that in commit 09cf1d52.

If we want to ship this in v14 we have to make a decision ASAP:

1. Ship the POLLHUP patch (like v9) that only works reliably on
Linux. Maybe disable the feature completely on other OSes?
2. Ship the patch that tries to read (like v7). It should work on
all systems, but it can be fooled by pipelined commands (though it can
detect a pipelined 'X').

Personally, I lean towards #2.

A third option was mentioned, but we have no patch:

3. We could try to *send* a message. This should work on all
systems. Unfortunately our protocol doesn't seem to have an existing
"no-op" or "heartbeat" message we could use for this, and sending a
NOTICE would be noisy. I suspect this would require some protocol
evolution, and there is no time for that in v14.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-03-29 21:05:46 Re: psql lacking clearerr()
Previous Message Joel Jacobson 2021-03-29 20:59:38 Re: Idea: Avoid JOINs by using path expressions to follow FKs