Re: Bypassing cursors in postgres_fdw to enable parallel plans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
Cc: KENAN YILMAZ <kenan(dot)yilmaz(at)localus(dot)com(dot)tr>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bypassing cursors in postgres_fdw to enable parallel plans
Date: 2026-04-13 19:30:36
Message-ID: CA+TgmoZ-MZpDWHNZYLv9KGHPWb6RN5TZ8F0Xu2qk+Wa5UszUOQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 10, 2026 at 3:42 AM Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> wrote:
> Rebased version of the patch is attached.
> It is a minor change from the earlier version.

Hi,

Some issues:

- fetch_more_data() still gains a use_tuplestore parameter, but now
it's always false and never used for anything. already_done is also
dead code; it's never set to anything but false. There are also other
places in the patch where you haven't cleaned up things that aren't
needed any more.

- I think that instead of add scan_init alongside cursor_exists, it
might be better to just rename the existing cursor_exists flag and use
it in both modes. Right now, you've got a bunch of places in the code
where when disable_cursor has one value you look at cursor_exists and
when it has the other value you look at scan_init; basically, the two
flags seem to be the same thing, and only the name of the flag doesn't
make sense after your patch. But I would suggest calling it
scan_in_progress rather than scan_init, because it gets set back to
false in postgresEndForeignScan() when the scan finishes. In fact, I
would suggest creating a preparatory patch that does the cursor_exists
-> scan_in_progress renaming and make that 0001, and then make your
main patch 0002 on top of that. That patch could possibly be committed
separately at some point.

- postgresBeginForeignScan incorrectly does this:

/* Initially, there is no active_scan */
fsstate->conn_state->active_scan = NULL;

I think this is compensating for the lack of anything to clear
active_scan in postgresEndForeignScan; if you didn't do this, then
probably the first query using non-cursor mode would work and the
second one would fail. But the right fix is to remove this and make
postgresEndForeignScan clean it up properly instead. Otherwise,
imagine that query A begins using a postgres_fdw connection and is in
the middle of streaming a result set over the wire when a call to a
user-defined function triggers query B which accesses the same
postgres_fdw connection. This logic now nulls out the active-scan
pointer from the outer connection while in reality that outer
connection is still active. I expect terrible things to happen after
that. You might want to try to create a test case that exercises this
scenario.

- The logic in postgresReScanForeignScan() has multiple flaws. One of
these is, as you probably know already, where it says /* TODO: Handle
it in non-cursor mode as well */. Let's think through how this needs
to work a little bit. When we're using a cursor, we have three
possible strategies: we can close the cursor and re-execute the query,
we can use MOVE BACKWARD ALL, or if we still have every tuple we've
ever fetched from the remote side, we can just rescan them (the "Easy"
case). When there's no cursor, only two strategies are possible --
there's no equivalent of MOVE BACKWARD ALL. We can either throw away
the tuplestore we've accumulated and any results that are still being
received from the remote side (which is similar to "CLOSE") or we just
rewind our internal scan position (the "Easy" case).

So what I would suggest here is: set a local variable
reinitialize_scan = false at the top. In the cases where we currently
print CLOSE c%u, just set reinitialize_scan = true. In the MOVE
BACKWARD ALL case (fetch_ct_2 > 1 and PQserverVersion(fsstate->conn) <
150000 and !disable_cursor), construct the query, execute it, and
return from the function right there. Otherwise, fall through to the
end of the function and deal with what reinitializing the scan means
in the cursor case (CLOSE %u) and in the non-cursor case. But here we
come to a second bug: right now, you try to reinitialize the scan in
the disable_cursor case by just calling pgfdw_get_result(), but you do
so without checking whether the scan that we're rescanning is the same
as the active scan. If it is, then you need to do this, but if it
isn't, then you must not. Either way, you might also need to discard
any tuplestore that has been collected.

There's also the case where you don't reinitialize the scan, which is
the easy case: whether cursors are in use or not, you just need to
reset your notion of what's been fetched. But here we come to another
problem: fetch_ct_2 is, I would argue, not maintained properly in
non-cursor mode. In cursor mode, 0 means we haven't yet read any
tuples, 1 means we've read some tuples but not yet thrown any away,
and 2 means that some have been discarded. If it had the same meaning
in non-cursor mode, then the rescan code could use that value to
decide whether to take the "Easy" path where it just resets
next_tuple. But it doesn't: fetch_ct_2 in non-cursor mode just counts
the number of calls to fetch_more_data(), which doesn't tell us
anything about what rescan should do.

- The logic in postgresEndForeignScan has one of the same problems
that I mentioned in the previous point: it will call
pgfdw_get_result() even when the scan being ended is not the active
scan, and it doesn't deal with the associated tuplestore.

- save_to_tuplestore() gets confused between the scan FROM which it's
being called and the scan FOR which it's being called, or said
differently, it confuses the currently-active scan with the new scan
that we want to initiate. For instance, it tests fsscan->rescan to
decide whether to resend active_fsstate->query. That doesn't really
make any sense. I think a general problem with this patch is that it's
not clear exactly what the rescan flag is actually intended to mean,
but it certainly can't be right to decide that the query for plan node
A needs to be re-sent because plan node B is being rescanned. A little
further down, you switch to fsstate->batch_cxt to store tuples in
active_fsstate->tuplestore, which is another example of the same
confusion. You should probably have a function here that only gets the
*active* scan as an argument, so that you can't get mixed up like this
and mistakenly refer to the wrong one. We don't even know that fsstate
and active_fsstate are from the same query: recall the case I
mentioned earlier, where query A begins using a postgres_fdw
connection and is in the middle of streaming a result set over the
wire when a call to a user-defined function triggers query B which
accesses the same postgres_fdw connection.

- There are multiple points in save_to_tuplestore() where you can
overwrite res without calling PQclear() on the previous value of res.
You need to make sure that every place in the patch where you create a
PGresult, you also free it.

- You need to think about the fact that the GUC can be changed at any
time. disable_cursor could be true when the scan starts and false by
the time it ends, or false by the time it's rescanned, or it could be
true for an outer query that starts running a query on the connection
and then false by the time an inner level wants to use the same
connection for something else. Right now, if anything like that
happens, this code is going to fail terribly. I think you need every
scan to remember whether it was started with or without a cursor, and
do rescans and end-scans based on that, not the GUC. And even if the
*current* scan is being started with a cursor, that doesn't mean there
can't be a *previous* scan in progress that was started without using
a cursor; you need to make sure, for example, that
save_to_tuplestore() is still called in that case.

- I know I suggested inverting the sense of the GUC (the feature
should say what it does, not what it doesn't do) but I don't really
like disable_cursor as the name of the feature, either, because
defining the feature in terms of shutting off cursors is kind of
confusing. I think we should come up with a name for this feature that
doesn't have a negation in the name. Maybe something like
postgres_fdw.streaming_fetch = { true | false }? Or
postgres_fdw.fetch_mode = { cursor | stream }? That would be similar
to the existing fetch_size option; one could even argue for allowing
fetch_size = 0 to mean "don't use a cursor, just fetch everything,"
although maybe overloading the setting that way is a little too clever
(or maybe it's fine).

- But also note that postgres_fdw.fetch_size -- which, again, is
closely related to this feature -- is a table or server level option,
not a GUC. That suggests that this feature should also be a table or
server level option.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message SATYANARAYANA NARLAPURAM 2026-04-13 19:41:31 Re: pg_get__*_ddl consolidation
Previous Message Michael Paquier 2026-04-13 19:01:53 Re: test_compression, test module for low-level compression APIs (for 2b5ba2a0a141)