Re: Bypassing cursors in postgres_fdw to enable parallel plans

From: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: KENAN YILMAZ <kenan(dot)yilmaz(at)localus(dot)com(dot)tr>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bypassing cursors in postgres_fdw to enable parallel plans
Date: 2026-05-11 11:03:09
Message-ID: CA+FpmFdZaWRc38R7M6KrEpFOprU=hmk2CYJUH4H6UiKWKRAgCA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 13 Apr 2026 at 21:30, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Apr 10, 2026 at 3:42 AM Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
> wrote:
> > Rebased version of the patch is attached.
> > It is a minor change from the earlier version.
>
> Hi,
>
> Some issues:
>
> - fetch_more_data() still gains a use_tuplestore parameter, but now
> it's always false and never used for anything. already_done is also
> dead code; it's never set to anything but false. There are also other
> places in the patch where you haven't cleaned up things that aren't
> needed any more.
>
> - I think that instead of add scan_init alongside cursor_exists, it
> might be better to just rename the existing cursor_exists flag and use
> it in both modes. Right now, you've got a bunch of places in the code
> where when disable_cursor has one value you look at cursor_exists and
> when it has the other value you look at scan_init; basically, the two
> flags seem to be the same thing, and only the name of the flag doesn't
> make sense after your patch. But I would suggest calling it
> scan_in_progress rather than scan_init, because it gets set back to
> false in postgresEndForeignScan() when the scan finishes. In fact, I
> would suggest creating a preparatory patch that does the cursor_exists
> -> scan_in_progress renaming and make that 0001, and then make your
> main patch 0002 on top of that. That patch could possibly be committed
> separately at some point.
>
> - postgresBeginForeignScan incorrectly does this:
>
> /* Initially, there is no active_scan */
> fsstate->conn_state->active_scan = NULL;
>
> I think this is compensating for the lack of anything to clear
> active_scan in postgresEndForeignScan; if you didn't do this, then
> probably the first query using non-cursor mode would work and the
> second one would fail. But the right fix is to remove this and make
> postgresEndForeignScan clean it up properly instead. Otherwise,
> imagine that query A begins using a postgres_fdw connection and is in
> the middle of streaming a result set over the wire when a call to a
> user-defined function triggers query B which accesses the same
> postgres_fdw connection. This logic now nulls out the active-scan
> pointer from the outer connection while in reality that outer
> connection is still active. I expect terrible things to happen after
> that. You might want to try to create a test case that exercises this
> scenario.
>
> - The logic in postgresReScanForeignScan() has multiple flaws. One of
> these is, as you probably know already, where it says /* TODO: Handle
> it in non-cursor mode as well */. Let's think through how this needs
> to work a little bit. When we're using a cursor, we have three
> possible strategies: we can close the cursor and re-execute the query,
> we can use MOVE BACKWARD ALL, or if we still have every tuple we've
> ever fetched from the remote side, we can just rescan them (the "Easy"
> case). When there's no cursor, only two strategies are possible --
> there's no equivalent of MOVE BACKWARD ALL. We can either throw away
> the tuplestore we've accumulated and any results that are still being
> received from the remote side (which is similar to "CLOSE") or we just
> rewind our internal scan position (the "Easy" case).
>
> So what I would suggest here is: set a local variable
> reinitialize_scan = false at the top. In the cases where we currently
> print CLOSE c%u, just set reinitialize_scan = true. In the MOVE
> BACKWARD ALL case (fetch_ct_2 > 1 and PQserverVersion(fsstate->conn) <
> 150000 and !disable_cursor), construct the query, execute it, and
> return from the function right there. Otherwise, fall through to the
> end of the function and deal with what reinitializing the scan means
> in the cursor case (CLOSE %u) and in the non-cursor case. But here we
> come to a second bug: right now, you try to reinitialize the scan in
> the disable_cursor case by just calling pgfdw_get_result(), but you do
> so without checking whether the scan that we're rescanning is the same
> as the active scan. If it is, then you need to do this, but if it
> isn't, then you must not. Either way, you might also need to discard
> any tuplestore that has been collected.
>
> There's also the case where you don't reinitialize the scan, which is
> the easy case: whether cursors are in use or not, you just need to
> reset your notion of what's been fetched. But here we come to another
> problem: fetch_ct_2 is, I would argue, not maintained properly in
> non-cursor mode. In cursor mode, 0 means we haven't yet read any
> tuples, 1 means we've read some tuples but not yet thrown any away,
> and 2 means that some have been discarded. If it had the same meaning
> in non-cursor mode, then the rescan code could use that value to
> decide whether to take the "Easy" path where it just resets
> next_tuple. But it doesn't: fetch_ct_2 in non-cursor mode just counts
> the number of calls to fetch_more_data(), which doesn't tell us
> anything about what rescan should do.
>
> - The logic in postgresEndForeignScan has one of the same problems
> that I mentioned in the previous point: it will call
> pgfdw_get_result() even when the scan being ended is not the active
> scan, and it doesn't deal with the associated tuplestore.
>
> - save_to_tuplestore() gets confused between the scan FROM which it's
> being called and the scan FOR which it's being called, or said
> differently, it confuses the currently-active scan with the new scan
> that we want to initiate. For instance, it tests fsscan->rescan to
> decide whether to resend active_fsstate->query. That doesn't really
> make any sense. I think a general problem with this patch is that it's
> not clear exactly what the rescan flag is actually intended to mean,
> but it certainly can't be right to decide that the query for plan node
> A needs to be re-sent because plan node B is being rescanned. A little
> further down, you switch to fsstate->batch_cxt to store tuples in
> active_fsstate->tuplestore, which is another example of the same
> confusion. You should probably have a function here that only gets the
> *active* scan as an argument, so that you can't get mixed up like this
> and mistakenly refer to the wrong one. We don't even know that fsstate
> and active_fsstate are from the same query: recall the case I
> mentioned earlier, where query A begins using a postgres_fdw
> connection and is in the middle of streaming a result set over the
> wire when a call to a user-defined function triggers query B which
> accesses the same postgres_fdw connection.
>
> - There are multiple points in save_to_tuplestore() where you can
> overwrite res without calling PQclear() on the previous value of res.
> You need to make sure that every place in the patch where you create a
> PGresult, you also free it.
>
> - You need to think about the fact that the GUC can be changed at any
> time. disable_cursor could be true when the scan starts and false by
> the time it ends, or false by the time it's rescanned, or it could be
> true for an outer query that starts running a query on the connection
> and then false by the time an inner level wants to use the same
> connection for something else. Right now, if anything like that
> happens, this code is going to fail terribly. I think you need every
> scan to remember whether it was started with or without a cursor, and
> do rescans and end-scans based on that, not the GUC. And even if the
> *current* scan is being started with a cursor, that doesn't mean there
> can't be a *previous* scan in progress that was started without using
> a cursor; you need to make sure, for example, that
> save_to_tuplestore() is still called in that case.
>
> - I know I suggested inverting the sense of the GUC (the feature
> should say what it does, not what it doesn't do) but I don't really
> like disable_cursor as the name of the feature, either, because
> defining the feature in terms of shutting off cursors is kind of
> confusing. I think we should come up with a name for this feature that
> doesn't have a negation in the name. Maybe something like
> postgres_fdw.streaming_fetch = { true | false }? Or
> postgres_fdw.fetch_mode = { cursor | stream }? That would be similar
> to the existing fetch_size option; one could even argue for allowing
> fetch_size = 0 to mean "don't use a cursor, just fetch everything,"
> although maybe overloading the setting that way is a little too clever
> (or maybe it's fine).
>
> - But also note that postgres_fdw.fetch_size -- which, again, is
> closely related to this feature -- is a table or server level option,
> not a GUC. That suggests that this feature should also be a table or
> server level option.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

Hello there fellow hackers,

Thanks Robert for the detailed review and some great suggestions for
improving this patch. I went through the comments and the patch and here is
the revised version. To summarize the changes done in this patch,
- as suggested, split it into two patches now, the first patch changes the
cursor_exists flag to scan_in_progress, the second patch is the one with
the rest of the changes to implement this fetch mechanism
- changed the GUC to server/tables level option called streaming_fetch.
Still a boolean type. Open to better naming suggestions.
- refactored postrgesReScanForeignScan for a more readable code
Following the suggestions above, now, it handles the case for the
backward cursor separately and exits the function straight from there. In
case of close cursor, it checks if cursor or cursor-free mode is used, and
executes different actions as per the mode. In cursor-free mode we end_scan
and tuplestore. In the third case when we start scanning the tuples already
fetched, things remain the same in both modes.
- added the changes to show the option in explain verbose output
- Added a lot more test cases to check the option with other options and
also to cover more code-paths, particularly rescans and ending the query in
between, error cases etc.

There is one problem that remains here, the use of ExprContext. We need
this when we are fetching the tuples for the active scan, but since this is
only available in node, we can not have the one which was there at the time
when active_scan was the fsstate, so at the moment it is using the econtext
from the node. This doesn't seem correct to me, but to have the correct
ExprContext we need the node, but it also doesn't seem totally right to
have the pointer to node in ScanState for this scenario. Please let me know
what could be a good way to handle this.
--
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH

Attachment Content-Type Size
v9-0001-Rename-cursor-exists-flag.patch application/octet-stream 3.9 KB
v9-0002-Fetch-without-cursors.patch application/octet-stream 122.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message solaimurugan vellaipandiyan 2026-05-11 11:04:20 Re: Review - Patch for pg_bsd_indent: improve formatting of multiline comments
Previous Message ChenhuiMo 2026-05-11 10:55:27 Re: [PATCH v2] Make NumericVar storage semantics explicit