Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date: 2020-09-21 03:44:30
Message-ID: CALj2ACXG0N4mH3EADHruoV1UfUHTcm2=8Q06ynJPNb3+zxmnLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 17, 2020 at 10:20 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
wrote:
>
> In 1st way, you may also need to call ReleaseExternalFD() when new
connection fails
> to be made, as connect_pg_server() does. Also we need to check that
> non-superuser has used password to make new connection,
> as connect_pg_server() does? I'm concerned about the case where
> pg_hba.conf is changed accidentally so that no password is necessary
> at the remote server and the existing connection is terminated. In this
case,
> if we connect to the local server as non-superuser, connection to
> the remote server should fail because the remote server doesn't
> require password. But with your patch, we can successfully reconnect
> to the remote server.
>
> Therefore I like 2nd option. Also maybe disconnect_ps_server() needs to
> be called before that. I'm not sure how much useful 1st option is.
>

Thanks. Above points look sensible. +1 for the 2nd option i.e. instead of
PQreset(entry->conn);, let's try to disconnect_pg_server() and
connect_pg_server().

>
> What if 2nd attempt happens with have_prep_stmt=true? I'm not sure
> if this case really happens, though. But if that can, it's strange to
start
> new connection with have_prep_stmt=true even when the caller of
> GetConnection() doesn't intend to create any prepared statements.
>
> I think it's safer to do 2nd attempt in the same way as 1st one. Maybe
> we can simplify the code by making them into common code block
> or function.
>

I don't think the have_prep_stmt will be set by the time we make 2nd
attempt because entry->have_prep_stmt |= will_prep_stmt; gets hit only
after we are successful in either 1st attempt or 2nd attempt. I think we
don't need to set all transient state. No other transient state except
changing_xact_state changes from 1st attempt to 2nd attempt(see below), so
let's set only entry->changing_xact_state to false before 2nd attempt.

1st attempt:
(gdb) p *entry
$3 = {key = 16389, conn = 0x55a896199990, xact_depth = 0, have_prep_stmt =
false,
have_error = false, changing_xact_state = false, invalidated = false,
server_hashvalue = 3905865521, mapping_hashvalue = 2617776010}

2nd attempt i.e. in retry block:
(gdb) p *entry
$4 = {key = 16389, conn = 0x55a896199990, xact_depth = 0, have_prep_stmt =
false,
have_error = false, changing_xact_state = true, invalidated = false,
server_hashvalue = 3905865521, mapping_hashvalue = 2617776010}

>>
> > If an error occurs in the first attempt, we return from
> > pgfdw_get_result()'s if (!PQconsumeInput(conn)) to the catch block we
> > added and pgfdw_report_error() will never get called. And the below
> > part of the code is reached only in scenarios as mentioned in the
> > comments. Removing this might have problems if we receive errors other
> > than CONNECTION_BAD or for subtxns. We could clear if any result and
> > just rethrow the error upstream. I think no problem having this code
> > here.
>
> But the orignal code works without this?
> Or you mean that the original code has the bug?
>

There's no bug in the original code. Sorry, I was wrong in saying
pgfdw_report_error() will never get called with this patch. Indeed it gets
called even when 1's attempt connection is failed. Since we added an extra
try-catch block, we will not be throwing the error to the user, instead we
make a 2nd attempt from the catch block.

I'm okay to remove below part of the code

> >> + PGresult *res = NULL;
> >> + res = PQgetResult(entry->conn);
> >> + PQclear(res);
> >> Are these really necessary? I was just thinking that's not because
> >> pgfdw_get_result() and pgfdw_report_error() seem to do that
> >> already in do_sql_command().

Please let me know if okay with the above agreed points, I will work on the
new patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-09-21 03:48:24 Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers
Previous Message Wang, Shenhao 2020-09-21 03:43:41 make MaxBackends available in _PG_init