| From: | Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | KENAN YILMAZ <kenan(dot)yilmaz(at)localus(dot)com(dot)tr>, Andy Fan <zhihuifan1213(at)163(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Bypassing cursors in postgres_fdw to enable parallel plans |
| Date: | 2026-04-10 07:41:48 |
| Message-ID: | CA+FpmFckg3xytZpMbz1d9Or4FyDE++5Tc6Qqo1Sgk-8F3RztsA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Rebased version of the patch is attached.
It is a minor change from the earlier version.
On Thu, 2 Apr 2026 at 20:00, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> wrote:
>
>
> On Fri, 6 Mar 2026 at 21:51, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> On Mon, Jan 26, 2026 at 6:44 AM Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
>> wrote:
>> > Thanks Robert for your time and attention to this patch.
>> > Based on these review comments and an off list discussion about the
>> design of the patch, I have reworked the patch significantly.
>> > In this version, a tuplestore is added to the PgFdwScanState along with
>> a flag. Now, in case of a cursor switch, this tuplestore is filled with the
>> remaining tuples of the query. The corresponding flag is set to indicate
>> that the tuplestore is ready to be fetched. To remember the last query that
>> was executing, a pointer to its PgFdwScanState is maintained in the
>> conn_state. Note that we only need to remember just the very last query and
>> once tuples for it are fetched we can forget that pointer, because now its
>> fsstate has the remaining tuples in the tuplestore.
>> > So, this version of the patch looks simpler than before.
>> > A few test cases are also added in the attached patch to cover the
>> different scenarios in case of non-cursor mode.
>>
>> + DefineCustomBoolVariable("postgres_fdw.use_cursor",
>>
>> Usually I would expect the GUC for a few feature to be defined in such
>> a way that a true values means to use the new feature and a false
>> value means not to do that. This is reversed. Maybe consider renaming
>> the GUC so that the sense is inverted.
>>
>> + /* To be used only in non-cursor mode */
>> + Tuplestorestate *tuplestore; /* Tuplestore to save the tuples
>> of the
>> +
>> * query for later fetc
>> h. */
>> + TupleTableSlot *slot; /* Slot to be used when
>> reading the tuple from
>> + * the
>> tuplestore */
>> + bool tuples_ready; /* To indicate when tuplestore
>> is ready to be
>> + * read.
>> */
>> + int total_tuples; /* total tuples in the
>> tuplestore. */
>>
>> There's already a tuplestore_tuple_count() function, so it's not clear
>> why you need total_tuples. It's also not exactly clear what
>> tuples_ready means here: sure, it means that we're ready to read
>> tuples, but when does that happen? I think the /* To be used only in
>> non-cursor mode */ comment could use expansion, too. Maybe use this to
>> explain the idea that we're going to use a Tuplestore in non-cursor
>> mode when, and only when, there are overlapping scans.
>>
>> + fsstate->conn_state->last_query = NULL;
>>
>> I think this should be called something like active_scan. last_query
>> makes it sound like the last query we ran, whether or not it is still
>> in progress. But the value that it points to is an PgFdwScanState, so
>> it's not a query, it's a scan, and we should only have this set to a
>> non-NULL value for as long as that scan is actually active (i.e. has
>> the exclusive use of the connection).
>>
>> + * This routine fetches all the tuples of a query and saves them in a
>> tuplestore.
>> + * This is required when the result of a query is not completely
>> fetched but the control
>> + * switches to a different query.
>> + * A call to fetch_data is made from here, hence we need complete
>> ForeignScanState here.
>>
>> This is a good explanation but the last sentence probably isn't
>> needed. Also, you don't mention that this only used in non-cursor mode
>> (or whatever we end up calling it).
>>
>> + if (conn->asyncStatus == PGASYNC_IDLE)
>> + {
>> + /* If the connection is not active then set up */
>> + if (!PQsendQueryParams(conn, last_fsstate->query,
>> last_fsstate->numParams,
>> + NULL,
>> values, NULL, NULL, 0))
>> + pgfdw_report_error(NULL, conn,
>> last_fsstate->query);
>> +
>> + if (!PQsetChunkedRowsMode(conn, last_fsstate->fetch_size))
>> + pgfdw_report_error(NULL, conn,
>> last_fsstate->query);
>> + }
>>
>> I think this is a problem for multiple reasons. One problem is that
>> the function header says that we use this "when the result of a query
>> is not completely fetched". But if the status where PGASYNC_IDLE, the
>> query has not even started yet. In that case, seems there is no reason
>> to call this function in the first place. We can just start the new
>> scan right away and wait to do anything about the other scan until
>> it's needed. Additionally, asyncStatus is not part of libpq's exposed
>> API. You're only able to access it here because you've included
>> libpq-int.h into postgres_fdw.h, but libpq-int.h is so-called because
>> it's "internal", so you should really be trying very hard not to use
>> it, and especially not to include it in another header file. I suggest
>> that there's probably just a bug here -- I don't think you should need
>> to check asyncStatus here in the first place.
>>
>> + if (pgfdw_use_cursor)
>> + snprintf(sql, sizeof(sql), "CLOSE c%u",
>> + fsstate->cursor_number);
>> + else
>> + close_cursor = true;
>>
>> This is extremely confusing coding. If we're using a cursor, we write
>> some SQL into a buffer but don't actually do anything with the buffer
>> right away. But if we're not using a cursor, then we set a flag that
>> tells us to close a cursor ... which we're not using? I think the
>> logic updates in the function are not at all clear. You need to back
>> up and think more carefully about what the function is supposed to do
>> in each case. Look at how the function starts after your patch:
>>
>> postgresReScanForeignScan(ForeignScanState *node)
>> {
>> PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
>> char sql[64];
>> PGresult *res;
>> bool close_cursor = false;
>>
>> /* If we haven't created the cursor yet, nothing to do. */
>> if (!fsstate->cursor_exists)
>> return;
>>
>> Well, if we're in non-cursor mode, that presumably means this function
>> always returns without doing anything, since there should not be a
>> cursor if we are not using with them. That can't be right.
>> Alternatively, maybe it means that the cursor_exists flag is now being
>> used to track something other than whether a cursor exists, which
>> wouldn't be right either. I don't know off the top of my head exactly
>> what this function should do in each case, but as the patch author,
>> it's your job to go through, figure that out, and make sure that it is
>> all correct and well-commented.
>>
>> Similarly, you have a bunch of changes to a function called
>> create_cursor(), but why would a function that creates a cursor get
>> changed to handle a mode where we don't create cursors? Either the
>> function needs to be renamed, or there should be two separate
>> functions, or these changes don't do anything in the first place.
>>
>> + while (res != NULL)
>> + res = pgfdw_get_result(fsstate->conn);
>>
>> I don't see how this code can be right. We should know how many result
>> sets we expect and read exactly that number, not just read over and
>> over until there's nothing more. But even if we did want to read until
>> there's nothing more, we would need to free those result sets rather
>> than leaking them.
>>
>> -
>> - /* Clean up */
>> - pfree(buf.data);
>>
>> It's hard to understand why this would be a good idea.
>>
>> /*
>> * We'll store the tuples in the batch_cxt. First, flush the previous
>> * batch.
>> */
>> fsstate->tuples = NULL;
>> - MemoryContextReset(fsstate->batch_cxt);
>> oldcontext = MemoryContextSwitchTo(fsstate->batch_cxt);
>>
>> It's also hard to understand why this would be a good idea. It seems
>> like it makes the comment false: we wouldn't be flushing the previous
>> batch any more. We'd just be forgetting that it existed and leaking
>> the memory.
>>
>> + /*
>> + * In non-cursor mode, there are three options for the further
>> + * processing: 1. If the tuplestore is already filled, retrieve the
>> + * tuples from there. 2. Fetch the tuples till the end of the query
>> + * and store them in tuplestore. 3. Perform a normal fetch and process
>> + * the tuples.
>>
>> I find this comment pretty confusing. First of all, it's not really
>> clear what performing a "normal fetch" means here. But also, this says
>> that in non-cursor mode there are three options, and that's followed
>> by an if-else block i.e. two options. Three isn't the same as two, so
>> something isn't right here.
>>
>> The code that follows is quite involved. It's rather deeply nested in
>> places, so possibly some of it needs to be broken out into separate
>> functions. It contains another instance, of this, which has the same
>> problems I pointed out earlier:
>>
>> + /* There are no more tuples to fetch */
>> + while (res != NULL)
>> + res = pgfdw_get_result(conn);
>>
>> If there are no more tuples to fetch, then why do we need to loop
>> fetching result sets until we get a NULL? And if we ever get any, we
>> leak them, which is also bad.
>>
>> + /*
>> + * Reset cursor mode when in asynchronous mode as it is not supported in
>> + * non-cursor mode
>> + */
>> + if (!pgfdw_use_cursor)
>> + pgfdw_use_cursor = true;
>>
>> It's not allowed to change a GUC variable from outside of the GUC
>> code. If you need to disable the optimization in certain cases, you
>> need a separate flag for that, which is initially set based on
>> pgfdw_use_cursor and then can be changed here or wherever is needed.
>> You can't change the GUC value itself. This would cause a variety of
>> problems, including the fact that the optimization would then stay
>> disabled for later queries in the session, but probably also some more
>> GUC-related stuff.
>>
>> -
>> SELECT current_database() AS current_database,
>> current_setting('port') AS current_port
>> \gset
>> -
>>
>> Please avoid including unnecessary cosmetic changes in the patch file.
>>
>> Overall, I think the test case changes are in much better shape than
>> they were in earlier patch versions. I suspect there might still be a
>> bit of room for improvement, but we can deal with that once some of
>> these other issues are sorted out.
>>
>> --
>> Robert Haas
>> EDB: http://www.enterprisedb.com
>>
>
> Thank you for the detailed review of the patch. Please find the attached
> file for the updated patch.
> To summarise the changes done in this patch following these review
> comments,
> - the new GUC is called disable_cursor, so when we want to use this
> feature we set it, otherwise it remains false by default. I hope this makes
> it easier to understand as you mentioned.
>
> - Simplified the code in fetch_more_data for this new mode. Now we have a
> function called fetch_from_tuplestore, and it is called from
> fetch_more_data when tuples from tuplestore are requested. Another function
> called save_to_tuplestore is also introduced in this patch, it is called
> from create_cursor code when all the tuples are to be fetched for the
> active_scan. So things look better code readability wise.
>
> - For handling the async mode, I went to one level up and modified
> async_capable to false when no_cursor mode is used. So, now we do not
> change any GUC as such.
>
> - With regards to use of flag cursor_exists and functions create_cursor
> and close_cursor, these are used in this no_cursor mode as well. For the
> flag cursor_exists, I used it to check if we have already done the required
> steps for when disale_cursor is set, like setting the ChunkedRowsMode,
> saving the tuples to the tuplestore if required. Inherently there is no
> nothing in fsstate or conn_state through which we can identify that all the
> required steps are done or if this scan is coming for the first time, so
> maybe we can have a different flag for the case when disable_cursor is set
> and use it for such checks. But to me it feels like additional code when I
> can use existing vars, but I agree on making it more readable that way. So,
> I modified in this patch to add a separate flag called scan_init and
> functions init_scan and end_scan for the setting and resetting of the
> required values when cursors are not used. So, let me know what you think
> is better using the existing flag or creating a new one for this mode as
> done in this patch.
>
> - For this change,
> /*
> * We'll store the tuples in the batch_cxt. First, flush the previous
> * batch.
> */
> fsstate->tuples = NULL;
> - MemoryContextReset(fsstate->batch_cxt);
> oldcontext = MemoryContextSwitchTo(fsstate->batch_cxt);
> since, the tuplestore is now created in batch_cxt, so we can't reset it
> here. But I now changed it to be done only when disable_cursor is set.
>
> - Regarding the following change,
> + if (conn->asyncStatus == PGASYNC_IDLE)
> + {
> + /* If the connection is not active then set up */
> When we are here after the call to the rescan, the connection is as per
> the current fsstate which has not yet started, hence the idle status, but
> we are here to fetch the tuples for the active scan which got reset after
> the rescan so it is required to set it up again. But you are right in that
> using asyncStatus is not appropriate here, so I added a new flag to check
> for rescan, which is set in rescan and used here to decide if we need to
> set the ChunkedRowsMode.
>
> - Modified and added more comments
> --
> Regards,
> Rafia Sabih
> CYBERTEC PostgreSQL International GmbH
>
--
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH
| Attachment | Content-Type | Size |
|---|---|---|
| v8-Fetch-without-cursors.patch | application/octet-stream | 54.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Dapeng Wang | 2026-04-10 07:42:02 | Re: Docs: Create table description for constraints markup fix and label tweaks |
| Previous Message | Chao Li | 2026-04-10 07:40:31 | Re: Add errdetail() with PID and UID about source of termination signal |