Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-06-25 13:40:45
Message-ID: CAFiTN-uCVcUKJRBM5-F+KxTPFeEfaGk6h8x19CgytZzPozrb=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 24, 2020 at 4:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jun 23, 2020 at 7:00 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > Here is the POC patch to discuss the idea of a cleanup of shared
> > fileset on proc exit. As discussed offlist, here I am maintaining
> > the list of shared fileset. First time when the list is NULL I am
> > registering the cleanup function with on_proc_exit routine. After
> > that for subsequent fileset, I am just appending it to filesetlist.
> > There is also an interface to unregister the shared file set from the
> > cleanup list and that is done by the caller whenever we are deleting
> > the shared fileset manually. While explaining it here, I think there
> > could be one issue if we delete all the element from the list will
> > become NULL and on next SharedFileSetInit we will again register the
> > function. Maybe that is not a problem but we can avoid registering
> > multiple times by using some flag in the file
> >
>
> I don't understand what you mean by "using some flag in the file".
>
> Review comments on various patches.
>
> poc_shared_fileset_cleanup_on_procexit
> =================================
> 1.
> - ent->subxact_fileset =
> - MemoryContextAlloc(ApplyContext, sizeof(SharedFileSet));
> + MemoryContext oldctx;
>
> + /* Shared fileset handle must be allocated in the persistent context */
> + oldctx = MemoryContextSwitchTo(ApplyContext);
> + ent->subxact_fileset = palloc(sizeof(SharedFileSet));
> SharedFileSetInit(ent->subxact_fileset, NULL);
> + MemoryContextSwitchTo(oldctx);
> fd = BufFileCreateShared(ent->subxact_fileset, path);
>
> Why is this change required for this patch and why we only cover
> SharedFileSetInit in the Apply context and not BufFileCreateShared?
> The comment is also not very clear on this point.

Added the comments for the same.

> 2.
> +void
> +SharedFileSetUnregister(SharedFileSet *input_fileset)
> +{
> + bool found = false;
> + ListCell *l;
> +
> + Assert(filesetlist != NULL);
> +
> + /* Loop over all the pending shared fileset entry */
> + foreach (l, filesetlist)
> + {
> + SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
> +
> + /* remove the entry from the list and delete the underlying files */
> + if (input_fileset->number == fileset->number)
> + {
> + SharedFileSetDeleteAll(fileset);
> + filesetlist = list_delete_cell(filesetlist, l);
>
> Why are we calling SharedFileSetDeleteAll here when in the caller we
> have already deleted the fileset as per below code?
> BufFileDeleteShared(ent->stream_fileset, path);
> + SharedFileSetUnregister(ent->stream_fileset);

That's wrong I have removed this.

> I think it will be good if somehow we can remove the fileset from
> filesetlist during BufFileDeleteShared. If that is possible, then we
> don't need a separate API for SharedFileSetUnregister.

I have done as discussed on later replies, basically called
SharedFileSetUnregister from BufFileDeleteShared.

> 3.
> +static List * filesetlist = NULL;
> +
> static void SharedFileSetOnDetach(dsm_segment *segment, Datum datum);
> +static void SharedFileSetOnProcExit(int status, Datum arg);
> static void SharedFileSetPath(char *path, SharedFileSet *fileset, Oid
> tablespace);
> static void SharedFilePath(char *path, SharedFileSet *fileset, const
> char *name);
> static Oid ChooseTablespace(const SharedFileSet *fileset, const char *name);
> @@ -76,6 +80,13 @@ SharedFileSetInit(SharedFileSet *fileset, dsm_segment *seg)
> /* Register our cleanup callback. */
> if (seg)
> on_dsm_detach(seg, SharedFileSetOnDetach, PointerGetDatum(fileset));
> + else
> + {
> + if (filesetlist == NULL)
> + on_proc_exit(SharedFileSetOnProcExit, 0);
>
> We use NIL for list initialization and comparison. See lock_files usage.

Done

> 4.
> +SharedFileSetOnProcExit(int status, Datum arg)
> +{
> + ListCell *l;
> +
> + /* Loop over all the pending shared fileset entry */
> + foreach (l, filesetlist)
> + {
> + SharedFileSet *fileset = (SharedFileSet *) lfirst(l);
> + SharedFileSetDeleteAll(fileset);
> + }
>
> We can initialize filesetlist as NIL after the for loop as it will
> make the code look clean.

Right.

> Comments on other patches:
> =========================
> 5.
> > 3. On concurrent abort we are truncating all the changes including
> > some incomplete changes, so later when we get the complete changes we
> > don't have the previous changes, e.g, if we had specinsert in the
> > last stream and due to concurrent abort detection if we delete that
> > changes later we will get spec_confirm without spec insert. We could
> > have simply avoided deleting all the changes, but I think the better
> > fix is once we detect the concurrent abort for any transaction, then
> > why do we need to collect the changes for that, we can simply avoid
> > that. So I have put that fix. (0006)
> >
>
> On similar lines, I think we need to skip processing message, see else
> part of code in ReorderBufferQueueMessage.

Basically, ReorderBufferQueueMessage also calls the
ReorderBufferQueueChange internally for transactional changes. But,
having said that, I realize the idea of skipping the changes in
ReorderBufferQueueChange is not good, because by then we have already
allocated the memory for the change and the tuple and it's not a
correct to ReturnChanges because it will update the memory accounting.
So I think we can do it at a more centralized place and before we
process the change, maybe in LogicalDecodingProcessRecord, before
going to the switch we can call a function from the reorderbuffer.c
layer to see whether this transaction is detected as aborted or not.
But I have to think more on this line that can we skip all the
processing of that record or not.

Your other comments look fine to me so I will send in the next patch
set and reply on them individually.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
poc_shared_fileset_cleanup_on_procexit_v1.patch application/octet-stream 6.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-06-25 13:59:08 Re: xid wraparound danger due to INDEX_CLEANUP false
Previous Message Darafei Komяpa Praliaskouski 2020-06-25 13:31:36 Re: CUBE_MAX_DIM