Re: [HACKERS] logical decoding of two-phase transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2021-03-07 04:00:24
Message-ID: CAA4eK1+dO07RrQwfHAK5jDP9qiXik4-MVzy+coEG09shWTJFGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 7, 2021 at 7:35 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Please find attached the latest patch set v51*
>

Few more comments on v51-0006-Fix-apply-worker-empty-prepare:
======================================================
1.
+/*
+ * A Prepare spoolfile hash entry. We create this entry in the
psf_hash. This is
+ * for maintaining a mapping between the name of the prepared
spoolfile, and the
+ * corresponding fileset handles of same.
+ */
+typedef struct PsfHashEntry
+{
+ char name[MAXPGPATH]; /* Hash key --- must be first */
+ bool allow_delete; /* ok to delete? */
+} PsfHashEntry;
+

IIUC, this has table is used for two purposes in the patch (a) to
check for existence of prepare spool file where we anyway to check it
on disk if not found in the hash table. (b) to allow the prepare spool
file to be removed on proc_exit.

I think we don't need the optimization provided by (a) because it will
be too rare a case to deserve any optimization, we might write a
comment in prepare_spoolfile_exists to indicate such an optimization.
For (b), we can use a simple list to track files to be removed on
proc_exit something like we do in CreateLockFile. I think avoiding
hash table usage will reduce the code and chances of bugs in this
area. It won't be easy to write a lot of automated tests to test this
functionality so it is better to avoid minor optimizations at this
stage.

2.
+ /*
+ * Replay/dispatch the spooled messages (including lastly, the PREPARE
+ * message).
+ */
+
+ ensure_transaction();

The part of the comment: "including lastly, the PREPARE message"
doesn't seem to fit here because in this part of the code you are not
doing anything special for Prepare message. Neither are we in someway
verifying that prepared message is replayed.

3.
+ /* create or find the prepare spoolfile entry in the psf_hash */
+ hentry = (PsfHashEntry *) hash_search(psf_hash,
+ path,
+ HASH_ENTER | HASH_FIND,
+ &found);
+
+ if (!found)
+ {
+ elog(DEBUG1, "Not found file \"%s\". Create it.", path);
+ psf_cur.vfd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY);
+ if (psf_cur.vfd < 0)
+ {
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not create file \"%s\": %m", path)));
+ }
+ memcpy(psf_cur.name, path, sizeof(psf_cur.name));
+ psf_cur.cur_offset = 0;
+ hentry->allow_delete = true;
+ }
+ else
+ {
+ /*
+ * Open the file and seek to the beginning because we always want to
+ * create/overwrite this file.
+ */
+ elog(DEBUG1, "Found file \"%s\". Overwrite it.", path);
+ psf_cur.vfd = PathNameOpenFile(path, O_RDWR | O_CREAT | O_TRUNC | PG_BINARY);
+ if (psf_cur.vfd < 0)
+ {
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open file \"%s\": %m", path)));
+ }
+ memcpy(psf_cur.name, path, sizeof(psf_cur.name));
+ psf_cur.cur_offset = 0;
+ hentry->allow_delete = true;
+ }

Is it sufficient to check if the prepared file exists in hash_table?
Isn't it possible that after restart even though the file exists but
it won't be there in the hash table? I guess this will change if you
address the first comment.

4.
@@ -754,9 +889,58 @@ apply_handle_prepare(StringInfo s)
{
LogicalRepPreparedTxnData prepare_data;

+ /*
+ * If we were using a psf spoolfile, then write the PREPARE as the final
+ * message. This prepare information will be used at commit_prepared time.
+ */
+ if (psf_cur.is_spooling)
+ {
+ PsfHashEntry *hentry;
+
+ /* Write the PREPARE info to the psf file. */
+ prepare_spoolfile_handler(LOGICAL_REP_MSG_PREPARE, s);
+
+ /*
+ * Flush the spoolfile, so changes can survive a restart.
+ */
+ FileSync(psf_cur.vfd, WAIT_EVENT_DATA_FILE_SYNC);

I think in an ideal world we only need to flush the spool file(s) when
the replication origin is advanced because at that stage after the
restart we won't get this data again. So, now, if the publisher sends
the data again after restart because the origin on the subscriber was
not moved past this prepare, you need to overwrite the existing file
which the patch is already doing but I think it is better to add some
comments explaining this.

5. Can you please test some subtransaction cases (by having savepoints
for the prepared transaction) which pass through the spool file logic?
Something like below with maybe more savepoints.
postgres=# begin;
BEGIN
postgres=*# insert into t1 values(1);
INSERT 0 1
postgres=*# savepoint s1;
SAVEPOINT
postgres=*# insert into t1 values(2);
INSERT 0 1
postgres=*# prepare transaction 'foo';
PREPARE TRANSACTION

I don't see any obvious problem in such cases but it is better to test.

6. Patch 0003 and 0006 can be merged to patch 0002 as that will enable
complete functionality for 0002. I understand that you have kept them
for easier review but I guess at this stage it is better to merge them
so that the complete functionality can be reviewed.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-03-07 04:12:43 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Zhihong Yu 2021-03-07 02:57:05 Re: Parallel INSERT (INTO ... SELECT ...)