Re: Add client connection check during the execution of the query

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: thomas(dot)munro(at)gmail(dot)com
Cc: s(dot)kelvich(at)postgrespro(dot)ru, ishii(at)sraoss(dot)co(dot)jp, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add client connection check during the execution of the query
Date: 2019-07-18 03:19:06
Message-ID: 20190718.121906.432767475415875714.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Yeah.
>
> +1 for this patch, with a few adjustments including making the test
> use pg_sleep() as mentioned. It does something useful, namely
> cancelling very long running queries sooner if the client has gone
> away instead of discovering that potentially much later when sending a
> response. It does so with a portable kernel interface (though we
> haven't heard from a Windows tester), and I think it's using it in a
> safe way (we're not doing the various bad things you can do with
> MSG_PEEK, and the fd is expected to be valid for the process's
> lifetime, and the socket is always in non-blocking mode*, so I don't
> think there is any bad time for pg_check_client_connection() to run).
> It reuses the existing timer infrastructure so there isn't really any
> new overhead. One syscall every 10 seconds or whatever at the next
> available CFI is basically nothing. On its own, this patch will
> reliably detect clients that closed abruptly or exited/crashed (so
> they client kernel sends a FIN packet). In combination with TCP
> keepalive, it'll also detect clients that went away because the
> network or client kernel ceased to exist.
>
> *There are apparently no callers of pg_set_block(), so if you survived
> pq_init() you have a non-blocking socket. If you're on Windows, the
> code always sets the magic pgwin32_noblock global flag before trying
> to peek. I wondered if it's OK that the CFI would effectively clobber
> that with 0 on its way out, but that seems to be OK because every
> place in the code that sets that flag does so immediately before an IO
> operationg without a CFI in between. As the comment in pgstat.c says
> "This is extremely broken and should be fixed someday.". I wonder if
> we even need that flag at all now that all socket IO is non-blocking.

I have looked into the patch and tested a little bit.

First of all, I had to grab February snapshot to test the patch
because it does not apply to the current HEAD. I noticed that there
are some confusions in the doc and code regarding what the new
configuration parameter means. According to the doc:

+ Default value is <literal>zero</literal> - it disables connection
+ checks, so the backend will detect client disconnection only when trying
+ to send a response to the query.

But guc.c comment says:

+ gettext_noop("Sets the time interval for checking connection with the client."),
+ gettext_noop("A value of -1 disables this feature. Zero selects a suitable default value."),

Probably the doc is correct since the actual code does so.

Also I found this in postgresql.conf default:

#client_connection_check_interval = 1000 # set time interval between

So here the default value seems to be be 1000. If so, guc.c should be
adjusted and the doc should be changed accordingly. I am not sure.

Next I have tested the patch using standard pgbench.

With the feature enabled with 1000ms check interval:
$ pgbench -c 10 -T 300 -S test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 19347995
latency average = 0.155 ms
tps = 64493.278581 (including connections establishing)
tps = 64493.811945 (excluding connections establishing)

Without the feature (client-connection-check-interval = 0)
$ pgbench -c 10 -T 300 -S test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 1
duration: 300 s
number of transactions actually processed: 20314812
latency average = 0.148 ms
tps = 67715.993428 (including connections establishing)
tps = 67717.251843 (excluding connections establishing)

So the performance is about 5% down with the feature enabled in this
case. For me, 5% down is not subtle. Probably we should warn this in
the doc.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ryan Lambert 2019-07-18 03:26:11 Re: dropdb --force
Previous Message Masahiko Sawada 2019-07-18 03:04:25 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)