Re: Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up

From: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
To: jiye <jiye_sw(at)126(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up
Date: 2026-04-12 07:05:25
Message-ID: CAPmGK17zmpCP6NAWWuQ9tPQpvLBRs6FDCMKxd37Chw_81ZnueQ@mail.gmail.com
Lists: pgsql-bugs

On Fri, Apr 3, 2026 at 12:13 PM jiye <jiye_sw(at)126(dot)com> wrote:
> We have successfully reproduced this issue and gained a clearer understanding of its root cause. The application uses a cursor to fetch partial results in batches, with a delay between consecutive fetch operations. When the interval between two batches exceeds the tcp_user_timeout threshold, the connection is terminated unexpectedly.

I think that is *expected* behavior.
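
To illustrate why, here is a deliberately simplified Python model of the
condition described above. It is not PostgreSQL or kernel code; the function
name and the assumption that the sender stays blocked on a zero window for
the whole gap between fetches are mine, for illustration only.

```python
# Simplified model (assumption): while the client sits idle between FETCHes,
# the receiving side's TCP buffer stays full, so the sender is blocked on a
# zero window for the entire gap. TCP_USER_TIMEOUT bounds how long buffered
# data may remain untransmitted before the connection is closed (ETIMEDOUT).

def connection_survives(fetch_gaps_ms, tcp_user_timeout_ms):
    """Return True if no gap between fetches reaches the timeout.

    A tcp_user_timeout_ms of 0 is treated here as "no explicit limit",
    since 0 means the operating system default is used instead.
    """
    if tcp_user_timeout_ms == 0:
        return True
    return all(gap < tcp_user_timeout_ms for gap in fetch_gaps_ms)
```

Under this model, any single pause between batches that reaches
tcp_user_timeout is enough to lose the connection, no matter how fast the
other batches are consumed.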

> In my analysis, during cursor-based queries, applications typically retrieve results in partial batches. If the number of rows fetched in a single batch is smaller than the number of rows scanned from the local table, the executor is unable to proceed with fetching rows from the foreign table.

IIRC, Append in async mode has no such limitation; it chooses the next
partition to scan independently of the number of rows returned from
it. No?

> To achieve a fundamental resolution, I propose two potential solutions:
>
> 1. Alternate Row Fetching: Modify the executor to alternately retrieve rows from the local table and the foreign table, ensuring balanced data flow between the two data sources.
> 2. Asynchronous Tuple Storage: Implement a tuple storage mechanism to asynchronously cache results from the foreign table. This would allow the executor to fetch foreign table results into the storage buffer independently, preventing TCP window exhaustion and decoupling the dependency between local and foreign data retrieval.

I suppose these would be improvements, but I'm not sure they are
really worth complicating the code, as the case you are trying to
solve is not a normal one; in particular, it's far from normal to set
tcp_user_timeout so short that the query cannot finish.
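
To make the second proposal concrete, here is a rough Python sketch of the
idea: a background worker drains the remote source into a local store so the
sender is never held up by a slow consumer. This is not executor code and
not a proposed patch; buffered_rows, drain_into_store, and the use of an
in-memory queue are all hypothetical choices made for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the remote result


def drain_into_store(remote_rows, store):
    """Consume every remote row as soon as it becomes available."""
    for row in remote_rows:
        store.put(row)
    store.put(_DONE)


def buffered_rows(remote_rows):
    """Yield remote rows from a local store filled by a background thread."""
    # Unbounded queue: the drainer never blocks waiting for the consumer.
    store = queue.Queue()
    worker = threading.Thread(
        target=drain_into_store, args=(remote_rows, store), daemon=True
    )
    worker.start()
    while True:
        row = store.get()
        if row is _DONE:
            break
        yield row
    worker.join()
```

The sketch also shows where the extra complexity comes from: the store
grows with every row the consumer has not read yet, so a real
implementation would need a memory bound or a way to spill.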

Anyway, thanks for sharing the analysis and ideas! Sorry for the delay.

Best regards,
Etsuro Fujita
