Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-04-30 07:00:49
Message-ID: CAA4eK1KQUoFyxh0WO0NZJ2pQj+ze9M35VS7RZKU4wbJfk1Mv0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 29, 2020 at 3:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Apr 29, 2020 at 2:56 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Tue, Apr 28, 2020 at 3:55 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Apr 28, 2020 at 3:11 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Apr 27, 2020 at 4:05 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > [latest patches]
> > > >
> > > > v16-0004-Gracefully-handle-concurrent-aborts-of-uncommitt
> > > > - Any actions leading to transaction ID assignment are prohibited.
> > > > That, among others,
> > > > + Note that access to user catalog tables or regular system catalog tables
> > > > + in the output plugins has to be done via the
> > > > <literal>systable_*</literal> scan APIs only.
> > > > + Access via the <literal>heap_*</literal> scan APIs will error out.
> > > > + Additionally, any actions leading to transaction ID assignment
> > > > are prohibited. That, among others,
> > > > ..
> > > > @@ -1383,6 +1392,14 @@ heap_fetch(Relation relation,
> > > > bool valid;
> > > >
> > > > /*
> > > > + * We don't expect direct calls to heap_fetch with valid
> > > > + * CheckXidAlive for regular tables. Track that below.
> > > > + */
> > > > + if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
> > > > + !(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))))
> > > > + elog(ERROR, "unexpected heap_fetch call during logical decoding");
> > > > +
> > > >
> > > > I think comments and code don't match. In the comment, we are saying
> > > > that via output plugins access to user catalog tables or regular
> > > > system catalog tables won't be allowed via heap_* APIs but code
> > > > doesn't seem to reflect it. I feel only
> > > > TransactionIdIsValid(CheckXidAlive) is sufficient here. See, the
> > > > original discussion about this point [1] (Refer "I think it'd also be
> > > > good to add assertions to codepaths not going through systable_*
> > > > asserting that ...").
> > >
> > > Right, So I think we can just add an assert in these function that
> > > Assert(!TransactionIdIsValid(CheckXidAlive)) ?
> > >
> > > >
> > > > Isn't it better to block the scan to user catalog tables or regular
> > > > system catalog tables for tableam scan APIs rather than at the heap
> > > > level? There might be some APIs like heap_getnext where such a check
> > > > might still be required but I guess it is still better to block at
> > > > tableam level.
> > > >
> > > > [1] - https://www.postgresql.org/message-id/20180726200241.aje4dv4jsv25v4k2%40alap3.anarazel.de
> > >
> > > Okay, let me analyze this part. Because someplace we have to keep at
> > > heap level like heap_getnext and other places at tableam level so it
> > > seems a bit inconsistent. Also, I think the number of checks might
> > > going to increase because some of the heap functions like
> > > heap_hot_search_buffer are being called from multiple tableam calls,
> > > so we need to put check at every place.
> > >
> > > Another point is that I feel some of the checks what we have today
> > > might not be required like heap_finish_speculative, is not fetching
> > > any tuple for us so why do we need to care about this function?
> >
> > While testing these changes, I have noticed that the systable_* APIs
> > internally, calls tableam apis and so if we just put assert
> > Assert(!TransactionIdIsValid(CheckXidAlive)) then it will always hit
> > that assert. Whether we put these assert in heap APIs or the tableam
> > APIs because systable_ always access heap through tableam APIs.
> >
..
..
>
> Putting some more thought upon this, I am just wondering what do we
> really want any such check because, we are always getting relation
> description from the reorder buffer code, not from the pgoutput
> plugin.
>

But can't they access other catalogs like pg_publication*? I think
the basic thing we want to ensure here is that all historic accesses
always use systable* APIs to access catalogs. We can ensure that via
having Asserts (or elog(ERROR, ..) in heap/tableam APIs.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-04-30 07:18:57 Re: WAL usage calculation patch
Previous Message Amit Langote 2020-04-30 06:56:00 Re: Autovacuum on partitioned table (autoanalyze)