Re: statement_timeout is not working as expected with postgres_fdw

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: suraj(dot)kharage(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: statement_timeout is not working as expected with postgres_fdw
Date: 2017-04-25 08:01:03
Message-ID: 20170425.170103.200825676.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

At Thu, 20 Apr 2017 19:57:30 +0530, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote in <CAFjFpRdMD25Lia_J0P1eiKzNXAXtH9AGNx2vCZ2dBPNffNOaog(at)mail(dot)gmail(dot)com>
> On Thu, Apr 20, 2017 at 4:08 PM, Suraj Kharage
> <suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
> > Hello All,
> >
> > The query on foreign table hangs due to network down of remote server for
> > near about 16 minutes before exiting.
> > statement_timeout is expected to work in this case as well but when i tried
> > statement_timeout, it is not working as expected.
> >
> > Below is test case to reproduce the issue: [I am testing this on two
> > different systems]
> >
> > 1: Set the postgres_fdw
> >
> > postgres=# CREATE SERVER myserver FOREIGN DATA WRAPPER postgres_fdw OPTIONS
> > (host '172.16.173.237', dbname 'postgres', port '5432',keepalives '1',
> > keepalives_interval '3',keepalives_idle '3', keepalives_count '1');
> > CREATE SERVER
> > postgres=# CREATE USER MAPPING FOR postgres SERVER myserver OPTIONS (user
> > 'postgres', password 'edb');
> > CREATE USER MAPPING
> > postgres=# CREATE FOREIGN TABLE test(id int) SERVER myserver;
> > CREATE FOREIGN TABLE
> > postgres=# select * from test;
> > id
> > ----
> > 1
> > 10
> > 2
> > 1
> > (4 rows)
> >
> > postgres=# \d
> > List of relations
> > Schema | Name | Type | Owner
> > --------+------+---------------+----------
> > public | test | foreign table | postgres
> > (1 row)
> > postgres=#
> > postgres=# select * from pg_foreign_server ;
> > srvname | srvowner | srvfdw | srvtype | srvversion | srvacl |
> > srvoptions
> >
> > ----------+----------+--------+---------+------------+--------+------------------------------------------------------------------------------------------------------------------
> > -------
> > myserver | 10 | 16387 | | | |
> > {host=172.16.173.237,dbname=postgres,port=5432,keepalives=1,keepalives_interval=3,keepalives_idle=3,keepalives_co
> > unt=1}
> > (1 row)
> >
> > 3. Execute the insert command.
> > E.g: insert into test values (generate_series(1,1000000));
> > * You need to do network down of remote server during insert command.
> >
> > postgres=# set statement_timeout to 6000;
> > SET
> > postgres=# insert into test values (generate_series(1,1000000));
> > WARNING: server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> >
> > CONTEXT: Remote SQL command: ABORT TRANSACTION
> > ERROR: could not receive data from server: No route to host
> >
> > CONTEXT: Remote SQL command: INSERT INTO public.test(id) VALUES ($1)
> > Time: 931427.002 ms
> >
> > It was in hang state for approx 15-16 minute before exit.
> >
> > In pg_log, i see below error for above query:
> > =========
> > 2017-04-20 15:22:02 IST ERROR: could not receive data from server: No route
> > to host
> > 2017-04-20 15:22:02 IST CONTEXT: Remote SQL command: INSERT INTO
> > public.test(id) VALUES ($1)
> > 2017-04-20 15:22:02 IST STATEMENT: insert into test values
> > (generate_series(1,1000000));
> > 2017-04-20 15:22:02 IST WARNING: server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > 2017-04-20 15:22:02 IST CONTEXT: Remote SQL command: ABORT TRANSACTION
> > ==========
> >
> > I tested the patch submitted by Ashutosh Bapat on community
> > [https://www.postgresql.org/message-id/CAFjFpRdcWw4h0a-zrL-EiaekkPj8O0GR2M1FwZ1useSRfRm3-g%40mail.gmail.com]
> > to make the statement_timeout working and i can see statement_timeout is
> > working but it is taking some extra time as statement_timeout.
> >
>
> Thanks for testing the patch.
>
> > postgres=# set statement_timeout to 6000;
> > SET
> > postgres=#
> > postgres=# \timing on
> > Timing is on.
> > postgres=# insert into test values (generate_series(1,1000000));
> > -- [after executing query immediately disconnect the ethernet on remote
> > server ]
> > 2017-04-20 07:10:51.588 IST [10467] ERROR: canceling statement due to
> > statement timeout
> > 2017-04-20 07:10:51.588 IST [10467] STATEMENT: insert into test values
> > (generate_series(1,1000000));
> > 2017-04-20 07:11:25.590 IST [10467] WARNING: discarding connection 0xfe4260
> > because of failed cancel request: PQcancel() -- connect() failed: No route
> > to host
> > WARNING: discarding connection 0xfe4260 because of failed cancel request:
> > PQcancel() -- connect() failed: No route to host
> >
> > ERROR: canceling statement due to statement timeout
> > Time: 40001.765 ms (00:40.002)
> > postgres=#
> > postgres=#
> >
> > In above case, I got the error related to statement timeout after 6 seconds,
> > but it it taking more time (approx 34 sec and it is varing each time if you
> > see below) to terminate the connection and to exit from query. As per the
> > tcp keepavlies settings for foreign server, it should take max 6 sec to
> > terminate the connection.
>
> The logs above show that 34 seconds elapsed between starting to abort
> the transaction and knowing that the foreign server is unreachable. It
> looks like it took that much time for the local server to realise that
> the foreign server is not reachable. Looking at PQcancel code, it
> seems to be trying to connect to the foreign server to cancel the
> query. But somehow it doesn't seem to honor connect_timeout setting.
> Is that expected?

Yes, and No. I think PQcancel requires connection timeout, but I
think it is not necessariry the same with that of a foreign
server.

> Irrespective of what PQcancel does, it looks like postgres_fdw should
> just slam the connection if query is being aborted because of
> statement_timeout. But then pgfdw_xact_callback() doesn't seem to have
> a way to know whether this ABORT is because of user's request to
> cancel the query, statement timeout, an abort because of some other
> error or a user requested abort. Except statement timeout (may be
> user's request to cancel the query?), it should try to keep the
> connection around to avoid any future reconnection. But I am not able
> to see how can we provide that information to pgfdw_xact_callback().

Expiration of statement_timeout doesn't mean a stall of foreign
connections. If we are to keep connections by, for example, a
cancel request from client, we also should keep them on
statememt_timeout because it is not necessariry triggered by a
stall of foreign connection.

I think we can detect a stall of the channel where the foreign
connections are on by a cancel request with a very short timeout,
although it is a bit incorrect.

I reconsider this problem and my proposal for this issue is as
the follows.

- Foreign servers have a new options 'stall_detection_threshold'
in milliseconds, maybe defaults to connect_timeout of the
foreign server setting. For many foreign servers in a local
network, it could be lowered to several tens of milliseconds.

-+ Invoke PQcancel with the stall_detection_threashold. (It
| doesn't accept such parameter so far)
|
+- If PQcancel returns with timeout, slam the connection down.
|
+- Else we continue to send ABORT TRANSACTION to the connection.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-04-25 08:17:39 Re: some review comments on logical rep code
Previous Message Amit Langote 2017-04-25 07:21:53 Re: warning in twophase.c