Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date: 2020-10-02 18:00:38
Message-ID: de2e1392-b1c3-592a-34c0-d0925d5ab91f@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2020/10/02 0:46, Bharath Rupireddy wrote:
> On Thu, Oct 1, 2020 at 8:10 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>
>> pg_stat_clear_snapshot() can be used to reset the entry.
>>
>
> Thanks. I wasn't knowing it.
>
>>
>> + EXIT WHEN proccnt = 0;
>> + END LOOP;
>>
>> Isn't it better to sleep here, to avoid th busy loop?
>>
>
> +1.
>
>>
>> So what I thought was something like
>>
>> CREATE OR REPLACE PROCEDURE wait_for_backend_termination()
>> LANGUAGE plpgsql
>> AS $$
>> BEGIN
>> LOOP
>> PERFORM * FROM pg_stat_activity WHERE application_name = 'fdw_retry_check';
>> EXIT WHEN NOT FOUND;
>> PERFORM pg_sleep(1), pg_stat_clear_snapshot();
>> END LOOP;
>> END;
>> $$;
>>
>
> Changed.
>
> Attaching v8 patch, please review it.. Both make check and make
> check-world passes on v8.

Thanks for updating the patch! It basically looks good to me.
I tweaked the patch as follows.

+ if (!entry->conn ||
+ PQstatus(entry->conn) != CONNECTION_BAD ||

With the above change, if entry->conn is NULL, an error is thrown and no new
connection is reestablished. But why? IMO it's more natural to reestablish
new connection in that case. So I removed "!entry->conn" from the above
condition.

+ ereport(DEBUG3,
+ (errmsg("could not start remote transaction on connection %p",
+ entry->conn)),

I replaced errmsg() with errmsg_internal() because the translation of
this debug message is not necessary.

+SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
+backend_type = 'client backend' AND application_name = 'fdw_retry_check';
+CALL wait_for_backend_termination();

Since we always use pg_terminate_backend() and wait_for_backend_termination()
together, I merged them into one function.

I simplied the comments on the regression test.

Attached is the updated version of the patch. If this patch is ok,
I'd like to mark it as ready for committer.

> I have another question not related to this patch: though we have
> wait_pid() function, we are not able to use it like
> pg_terminate_backend() in other modules, wouldn't be nice if we can
> make it generic under the name pg_wait_pid() and usable across all pg
> modules?

I thought that, too. But I could not come up with good idea for *real* use case
of that function. At least that's useful for the regression test, though.
Anyway, IMO it's worth proposing that and hearing more opinions about that
from other hackers.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachment Content-Type Size
v9-Retry-Cached-Remote-Connections-For-postgres_fdw.patch text/plain 6.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-10-02 18:22:18 Re: buildfarm animal shoveler failing with "Illegal instruction"
Previous Message Kuntal Ghosh 2020-10-02 17:56:05 Incorrect assumption in heap_prepare_freeze_tuple