Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-05-26 09:34:02
Message-ID: CAA4eK1+C=eqqUunV29N=8BcKvLk6wEx41zDQCWWVSx+zT2z-VQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 25, 2020 at 8:07 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, May 22, 2020 at 11:54 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > 4.
> > + * XXX Do we need to allocate it in TopMemoryContext?
> > + */
> > +static void
> > +subxact_info_add(TransactionId xid)
> > {
> > ..
> >
> > For this and other places in the patch, such as in function
> > stream_open_file(), instead of using TopMemoryContext, can we consider
> > using a new memory context, say LogicalStreamingContext, created under
> > TopMemoryContext? I don't see any need for using TopMemoryContext
> > here.
>
> But, when we will delete/reset the LogicalStreamingContext?
>

Why can't we reset it at each stream stop message?

> because
> we are planning to keep this memory until the worker is alive so that
> supposed to be the top memory context.
>

Which part of the allocation do we want to keep until the worker is
alive? Why would we need the memory related to subxacts for the
worker's entire lifetime? As the code stands, after reading the
subxact info (subxact_info_read), we have to ensure it is freed after
use, which forces us to remember the allocations and pfree them at
various places.

I think we should explore the possibility of switching to this new
context in the stream-start message and resetting it in the
stream-stop message. That might help us avoid the
MemoryContextSwitchTo(TopMemoryContext) calls at various places.

> If we create any other context
> with the same life span as TopMemoryContext then what is the point?
>

It is helpful for debugging. It is recommended not to use the top
memory context unless it is really required; see
src/backend/utils/mmgr/README.

>
> > 8.
> > + * XXX Maybe we should only include the checksum when the cluster is
> > + * initialized with checksums?
> > + */
> > +static void
> > +subxact_info_write(Oid subid, TransactionId xid)
> >
> > Do we really need to have the checksum for temporary files? I have
> > checked a few other similar cases like SharedFileSet stuff for
> > parallel hash join but didn't find them using checksums. Can you also
> > once see other usages of temporary files and then let us decide if we
> > see any reason to have checksums for this?
>
> Yeah, even I can see other places checksum is not used.
>

So, unless someone speaks up before you are ready for the next version
of the patch, can we remove it?

> >
> > Another point is we don't seem to be doing this for 'changes' file,
> > see stream_write_change. So, not sure, there is any sense to write
> > checksum for subxact file.
>
> I can see there is a comment atop this function:
>
> * XXX The subxact file includes CRC32C of the contents. Maybe we should
> * include something like that here too, but doing so will not be as
> * straightforward, because we write the file in chunks.
>

You can remove this comment as well. I don't know how advantageous it
is to checksum temporary files. We can anyway add it later if there
is a reason for doing so.

>
>
> > 12.
> > maybe_send_schema()
> > {
> > ..
> > + if (in_streaming)
> > + {
> > + /*
> > + * TOCHECK: We have to send schema after each catalog change and it may
> > + * occur when streaming already started, so we have to track new catalog
> > + * changes somehow.
> > + */
> > + schema_sent = get_schema_sent_in_streamed_txn(relentry, topxid);
> > ..
> > ..
> > }
> >
> > I think it is good to verify/test what this comment says, but as per
> > the code we should already be sending the schema after each catalog
> > change, since we invalidate the streamed_txns list in
> > rel_sync_cache_relation_cb, which must be called during relcache
> > invalidation. Do we see any problem with that mechanism?
>
> I have tested this; I think we are already sending the schema after
> each catalog change.
>

Then remove "TOCHECK" in the above comment.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
