Re: Bypassing cursors in postgres_fdw to enable parallel plans

From: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: KENAN YILMAZ <kenan(dot)yilmaz(at)localus(dot)com(dot)tr>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Bypassing cursors in postgres_fdw to enable parallel plans
Date: 2026-04-02 18:00:25
Message-ID: CA+FpmFeowaWiXUYAGxDsoXcDHQWL9me+RxPtc+hz92NU+fCYpg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 6 Mar 2026 at 21:51, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Jan 26, 2026 at 6:44 AM Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
> wrote:
> > Thanks Robert for your time and attention to this patch.
> > Based on these review comments and an off list discussion about the
> design of the patch, I have reworked the patch significantly.
> > In this version, a tuplestore is added to the PgFdwScanState along with
> a flag. Now, in case of a cursor switch, this tuplestore is filled with the
> remaining tuples of the query. The corresponding flag is set to indicate
> that the tuplestore is ready to be fetched. To remember the last query that
> was executing, a pointer to its PgFdwScanState is maintained in the
> conn_state. Note that we only need to remember just the very last query and
> once tuples for it are fetched we can forget that pointer, because now its
> fsstate has the remaining tuples in the tuplestore.
> > So, this version of the patch looks simpler than before.
> > A few test cases are also added in the attached patch to cover the
> different scenarios in case of non-cursor mode.
>
> + DefineCustomBoolVariable("postgres_fdw.use_cursor",
>
> Usually I would expect the GUC for a few feature to be defined in such
> a way that a true values means to use the new feature and a false
> value means not to do that. This is reversed. Maybe consider renaming
> the GUC so that the sense is inverted.
>
> + /* To be used only in non-cursor mode */
> + Tuplestorestate *tuplestore; /* Tuplestore to save the tuples
> of the
> +
> * query for later fetc
> h. */
> + TupleTableSlot *slot; /* Slot to be used when
> reading the tuple from
> + * the
> tuplestore */
> + bool tuples_ready; /* To indicate when tuplestore
> is ready to be
> + * read. */
> + int total_tuples; /* total tuples in the
> tuplestore. */
>
> There's already a tuplestore_tuple_count() function, so it's not clear
> why you need total_tuples. It's also not exactly clear what
> tuples_ready means here: sure, it means that we're ready to read
> tuples, but when does that happen? I think the /* To be used only in
> non-cursor mode */ comment could use expansion, too. Maybe use this to
> explain the idea that we're going to use a Tuplestore in non-cursor
> mode when, and only when, there are overlapping scans.
>
> + fsstate->conn_state->last_query = NULL;
>
> I think this should be called something like active_scan. last_query
> makes it sound like the last query we ran, whether or not it is still
> in progress. But the value that it points to is an PgFdwScanState, so
> it's not a query, it's a scan, and we should only have this set to a
> non-NULL value for as long as that scan is actually active (i.e. has
> the exclusive use of the connection).
>
> + * This routine fetches all the tuples of a query and saves them in a
> tuplestore.
> + * This is required when the result of a query is not completely
> fetched but the control
> + * switches to a different query.
> + * A call to fetch_data is made from here, hence we need complete
> ForeignScanState here.
>
> This is a good explanation but the last sentence probably isn't
> needed. Also, you don't mention that this only used in non-cursor mode
> (or whatever we end up calling it).
>
> + if (conn->asyncStatus == PGASYNC_IDLE)
> + {
> + /* If the connection is not active then set up */
> + if (!PQsendQueryParams(conn, last_fsstate->query,
> last_fsstate->numParams,
> + NULL,
> values, NULL, NULL, 0))
> + pgfdw_report_error(NULL, conn,
> last_fsstate->query);
> +
> + if (!PQsetChunkedRowsMode(conn, last_fsstate->fetch_size))
> + pgfdw_report_error(NULL, conn,
> last_fsstate->query);
> + }
>
> I think this is a problem for multiple reasons. One problem is that
> the function header says that we use this "when the result of a query
> is not completely fetched". But if the status where PGASYNC_IDLE, the
> query has not even started yet. In that case, seems there is no reason
> to call this function in the first place. We can just start the new
> scan right away and wait to do anything about the other scan until
> it's needed. Additionally, asyncStatus is not part of libpq's exposed
> API. You're only able to access it here because you've included
> libpq-int.h into postgres_fdw.h, but libpq-int.h is so-called because
> it's "internal", so you should really be trying very hard not to use
> it, and especially not to include it in another header file. I suggest
> that there's probably just a bug here -- I don't think you should need
> to check asyncStatus here in the first place.
>
> + if (pgfdw_use_cursor)
> + snprintf(sql, sizeof(sql), "CLOSE c%u",
> + fsstate->cursor_number);
> + else
> + close_cursor = true;
>
> This is extremely confusing coding. If we're using a cursor, we write
> some SQL into a buffer but don't actually do anything with the buffer
> right away. But if we're not using a cursor, then we set a flag that
> tells us to close a cursor ... which we're not using? I think the
> logic updates in the function are not at all clear. You need to back
> up and think more carefully about what the function is supposed to do
> in each case. Look at how the function starts after your patch:
>
> postgresReScanForeignScan(ForeignScanState *node)
> {
> PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
> char sql[64];
> PGresult *res;
> bool close_cursor = false;
>
> /* If we haven't created the cursor yet, nothing to do. */
> if (!fsstate->cursor_exists)
> return;
>
> Well, if we're in non-cursor mode, that presumably means this function
> always returns without doing anything, since there should not be a
> cursor if we are not using with them. That can't be right.
> Alternatively, maybe it means that the cursor_exists flag is now being
> used to track something other than whether a cursor exists, which
> wouldn't be right either. I don't know off the top of my head exactly
> what this function should do in each case, but as the patch author,
> it's your job to go through, figure that out, and make sure that it is
> all correct and well-commented.
>
> Similarly, you have a bunch of changes to a function called
> create_cursor(), but why would a function that creates a cursor get
> changed to handle a mode where we don't create cursors? Either the
> function needs to be renamed, or there should be two separate
> functions, or these changes don't do anything in the first place.
>
> + while (res != NULL)
> + res = pgfdw_get_result(fsstate->conn);
>
> I don't see how this code can be right. We should know how many result
> sets we expect and read exactly that number, not just read over and
> over until there's nothing more. But even if we did want to read until
> there's nothing more, we would need to free those result sets rather
> than leaking them.
>
> -
> - /* Clean up */
> - pfree(buf.data);
>
> It's hard to understand why this would be a good idea.
>
> /*
> * We'll store the tuples in the batch_cxt. First, flush the previous
> * batch.
> */
> fsstate->tuples = NULL;
> - MemoryContextReset(fsstate->batch_cxt);
> oldcontext = MemoryContextSwitchTo(fsstate->batch_cxt);
>
> It's also hard to understand why this would be a good idea. It seems
> like it makes the comment false: we wouldn't be flushing the previous
> batch any more. We'd just be forgetting that it existed and leaking
> the memory.
>
> + /*
> + * In non-cursor mode, there are three options for the further
> + * processing: 1. If the tuplestore is already filled, retrieve the
> + * tuples from there. 2. Fetch the tuples till the end of the query
> + * and store them in tuplestore. 3. Perform a normal fetch and process
> + * the tuples.
>
> I find this comment pretty confusing. First of all, it's not really
> clear what performing a "normal fetch" means here. But also, this says
> that in non-cursor mode there are three options, and that's followed
> by an if-else block i.e. two options. Three isn't the same as two, so
> something isn't right here.
>
> The code that follows is quite involved. It's rather deeply nested in
> places, so possibly some of it needs to be broken out into separate
> functions. It contains another instance, of this, which has the same
> problems I pointed out earlier:
>
> + /* There are no more tuples to fetch */
> + while (res != NULL)
> + res = pgfdw_get_result(conn);
>
> If there are no more tuples to fetch, then why do we need to loop
> fetching result sets until we get a NULL? And if we ever get any, we
> leak them, which is also bad.
>
> + /*
> + * Reset cursor mode when in asynchronous mode as it is not supported in
> + * non-cursor mode
> + */
> + if (!pgfdw_use_cursor)
> + pgfdw_use_cursor = true;
>
> It's not allowed to change a GUC variable from outside of the GUC
> code. If you need to disable the optimization in certain cases, you
> need a separate flag for that, which is initially set based on
> pgfdw_use_cursor and then can be changed here or wherever is needed.
> You can't change the GUC value itself. This would cause a variety of
> problems, including the fact that the optimization would then stay
> disabled for later queries in the session, but probably also some more
> GUC-related stuff.
>
> -
> SELECT current_database() AS current_database,
> current_setting('port') AS current_port
> \gset
> -
>
> Please avoid including unnecessary cosmetic changes in the patch file.
>
> Overall, I think the test case changes are in much better shape than
> they were in earlier patch versions. I suspect there might still be a
> bit of room for improvement, but we can deal with that once some of
> these other issues are sorted out.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>

Thank you for the detailed review of the patch. Please find the attached
file for the updated patch.
To summarise the changes done in this patch following these review comments,
- the new GUC is called disable_cursor, so when we want to use this feature
we set it, otherwise it remains false by default. I hope this makes it
easier to understand as you mentioned.

- Simplified the code in fetch_more_data for this new mode. Now we have a
function called fetch_from_tuplestore, and it is called from
fetch_more_data when tuples from tuplestore are requested. Another function
called save_to_tuplestore is also introduced in this patch, it is called
from create_cursor code when all the tuples are to be fetched for the
active_scan. So things look better code readability wise.

- For handling the async mode, I went to one level up and modified
async_capable to false when no_cursor mode is used. So, now we do not
change any GUC as such.

- With regards to use of flag cursor_exists and functions create_cursor and
close_cursor, these are used in this no_cursor mode as well. For the flag
cursor_exists, I used it to check if we have already done the required
steps for when disale_cursor is set, like setting the ChunkedRowsMode,
saving the tuples to the tuplestore if required. Inherently there is no
nothing in fsstate or conn_state through which we can identify that all the
required steps are done or if this scan is coming for the first time, so
maybe we can have a different flag for the case when disable_cursor is set
and use it for such checks. But to me it feels like additional code when I
can use existing vars, but I agree on making it more readable that way. So,
I modified in this patch to add a separate flag called scan_init and
functions init_scan and end_scan for the setting and resetting of the
required values when cursors are not used. So, let me know what you think
is better using the existing flag or creating a new one for this mode as
done in this patch.

- For this change,
/*
* We'll store the tuples in the batch_cxt. First, flush the previous
* batch.
*/
fsstate->tuples = NULL;
- MemoryContextReset(fsstate->batch_cxt);
oldcontext = MemoryContextSwitchTo(fsstate->batch_cxt);
since, the tuplestore is now created in batch_cxt, so we can't reset it
here. But I now changed it to be done only when disable_cursor is set.

- Regarding the following change,
+ if (conn->asyncStatus == PGASYNC_IDLE)
+ {
+ /* If the connection is not active then set up */
When we are here after the call to the rescan, the connection is as per the
current fsstate which has not yet started, hence the idle status, but we
are here to fetch the tuples for the active scan which got reset after the
rescan so it is required to set it up again. But you are right in that
using asyncStatus is not appropriate here, so I added a new flag to check
for rescan, which is set in rescan and used here to decide if we need to
set the ChunkedRowsMode.

- Modified and added more comments
--
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH

Attachment Content-Type Size
v7-Fetch-without-cursors.patch application/octet-stream 55.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message KAZAR Ayoub 2026-04-02 18:07:38 Re: Speed up COPY TO text/CSV parsing using SIMD
Previous Message Tom Lane 2026-04-02 17:43:36 Re: pg_waldump: support decoding of WAL inside tarfile