Re:Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up

From: jiye <jiye_sw(at)126(dot)com>
To: "Etsuro Fujita" <etsuro(dot)fujita(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re:Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up
Date: 2026-04-03 03:13:19
Message-ID: 705358bc.2ead.19d5154b82f.Coremail.jiye_sw@126.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

We have successfully reproduced this issue and gained a clearer understanding of its root cause. The application uses a cursor to fetch partial results in batches, with a delay between consecutive fetch operations. When the interval between two batches exceeds the tcp_user_timeout threshold, the connection is terminated unexpectedly.

In my analysis, during cursor-based queries, applications typically retrieve results in partial batches. If the number of rows fetched in a single batch is smaller than the number of rows scanned from the local table, the executor is unable to proceed with fetching rows from the foreign table. While we have attempted workarounds such as adjusting the fetch size, tuning TCP buffer parameters, and modifying the tcp_user_timeout value, these measures only mitigate the symptoms without addressing the underlying problem.

To achieve a fundamental resolution, I propose two potential solutions:

‌Alternate Row Fetching‌: Modify the executor to alternately retrieve rows from the local table and the foreign table, ensuring balanced data flow between the two data sources.
‌Asynchronous Tuple Storage‌: Implement a tuple storage mechanism to asynchronously cache results from the foreign table. This would allow the executor to fetch foreign table results into the storage buffer independently, preventing TCP window exhaustion and decoupling the dependency between local and foreign data retrieval.

At 2026-03-11 16:01:02, "Etsuro Fujita" <etsuro(dot)fujita(at)gmail(dot)com> wrote:
>On Wed, Mar 11, 2026 at 3:25 PM jiye <jiye_sw(at)126(dot)com> wrote:
>> Sorry, I made a mistake about the tcp_user_timeout configuration. Our app sets it to 9000 (9 seconds), but it still errors out even with 9000 - it just takes a little longer to error.
>> And about this point :
>> => I don’t actually know whether or if “buffer filling up” is accurate or relevant here. It doesn’t seem that way. You haven’t demonstrated that scenario here, just a timeout being reached.
>> Actually i have caputured tcp dump firstly, and "tcp buffer filling up" seem to be demonstrated by "TCP windows full" packet."
>> Secondly if data of fetch rows are not sufficiently wide, it does not reproduce this issue.
>>
>> So i suspect that the reason for this connection timeout is that the tcp buffer is full.
>
>I think this problem is not with async execution, but with your
>environment; if the root cause of it is “TCP windows full”, I think it
>might fix it to 1) retrieve only needed columns from the remote server
>and 2) decrease the fetch_size option for postgres_fdw.
>
>Best regards,
>Etsuro Fujita

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrei Lepikhov 2026-04-03 07:17:08 pg_plan_advice fails when NestLoop outer side is Sort over FunctionScan
Previous Message surya poondla 2026-04-02 23:14:00 Re: BUG #19382: Server crash at __nss_database_lookup