Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-05-01 15:10:51
Message-ID: CAFiTN-t7W_BTz4kESw5NA+F_9SwhPc5KMUhxTvswqQ4goz_fvw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 30, 2020 at 12:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Apr 29, 2020 at 3:19 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Apr 29, 2020 at 2:56 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Apr 28, 2020 at 3:55 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Apr 28, 2020 at 3:11 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, Apr 27, 2020 at 4:05 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > >
> > > > > [latest patches]
> > > > >
> > > > > v16-0004-Gracefully-handle-concurrent-aborts-of-uncommitt
> > > > > - Any actions leading to transaction ID assignment are prohibited.
> > > > > That, among others,
> > > > > + Note that access to user catalog tables or regular system catalog tables
> > > > > + in the output plugins has to be done via the
> > > > > <literal>systable_*</literal> scan APIs only.
> > > > > + Access via the <literal>heap_*</literal> scan APIs will error out.
> > > > > + Additionally, any actions leading to transaction ID assignment
> > > > > are prohibited. That, among others,
> > > > > ..
> > > > > @@ -1383,6 +1392,14 @@ heap_fetch(Relation relation,
> > > > > bool valid;
> > > > >
> > > > > /*
> > > > > + * We don't expect direct calls to heap_fetch with valid
> > > > > + * CheckXidAlive for regular tables. Track that below.
> > > > > + */
> > > > > + if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
> > > > > + !(IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation))))
> > > > > + elog(ERROR, "unexpected heap_fetch call during logical decoding");
> > > > > +
> > > > >
> > > > > I think comments and code don't match. In the comment, we are saying
> > > > > that via output plugins access to user catalog tables or regular
> > > > > system catalog tables won't be allowed via heap_* APIs but code
> > > > > doesn't seem to reflect it. I feel only
> > > > > TransactionIdIsValid(CheckXidAlive) is sufficient here. See, the
> > > > > original discussion about this point [1] (Refer "I think it'd also be
> > > > > good to add assertions to codepaths not going through systable_*
> > > > > asserting that ...").
> > > >
> > > > Right, So I think we can just add an assert in these function that
> > > > Assert(!TransactionIdIsValid(CheckXidAlive)) ?
> > > >
> > > > >
> > > > > Isn't it better to block the scan to user catalog tables or regular
> > > > > system catalog tables for tableam scan APIs rather than at the heap
> > > > > level? There might be some APIs like heap_getnext where such a check
> > > > > might still be required but I guess it is still better to block at
> > > > > tableam level.
> > > > >
> > > > > [1] - https://www.postgresql.org/message-id/20180726200241.aje4dv4jsv25v4k2%40alap3.anarazel.de
> > > >
> > > > Okay, let me analyze this part. Because someplace we have to keep at
> > > > heap level like heap_getnext and other places at tableam level so it
> > > > seems a bit inconsistent. Also, I think the number of checks might
> > > > going to increase because some of the heap functions like
> > > > heap_hot_search_buffer are being called from multiple tableam calls,
> > > > so we need to put check at every place.
> > > >
> > > > Another point is that I feel some of the checks what we have today
> > > > might not be required like heap_finish_speculative, is not fetching
> > > > any tuple for us so why do we need to care about this function?
> > >
> > > While testing these changes, I have noticed that the systable_* APIs
> > > internally, calls tableam apis and so if we just put assert
> > > Assert(!TransactionIdIsValid(CheckXidAlive)) then it will always hit
> > > that assert. Whether we put these assert in heap APIs or the tableam
> > > APIs because systable_ always access heap through tableam APIs.
> > >
> ..
> ..
> >
> > Putting some more thought upon this, I am just wondering what do we
> > really want any such check because, we are always getting relation
> > description from the reorder buffer code, not from the pgoutput
> > plugin.
> >
>
> But can't they access other catalogs like pg_publication*? I think
> the basic thing we want to ensure here is that all historic accesses
> always use systable* APIs to access catalogs. We can ensure that via
> having Asserts (or elog(ERROR, ..) in heap/tableam APIs.

Yeah, it can. So I have changed it now, actually along with
CheckXidLive, I have kept one more flag so whenever CheckXidLive is
set and we pass through systable_beginscan we will set that flag. So
while accessing the tableam API we will set if CheckXidLive is set
then another flag must also be set otherwise we through an error.

Apart from this, I have also fixed one defect raised by my colleague
Neha Sharma. That issue is the incomplete toast tuple flag was not
reset when the main table tuple was inserted through speculative
insert and due to that data was not streamed even if later we were
getting speculative confirm because incomplete toast flag was never
reset. This patch also includes the fix for the issue raised by Erik.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v17-0003-Extend-the-output-plugin-API-with-stream-methods.patch application/octet-stream 34.8 KB
v17-0001-Immediately-WAL-log-assignments.patch application/octet-stream 10.5 KB
v17-0004-Gracefully-handle-concurrent-aborts-of-uncommitt.patch application/octet-stream 12.4 KB
v17-0005-Implement-streaming-mode-in-ReorderBuffer.patch application/octet-stream 37.4 KB
v17-0002-Issue-individual-invalidations-with-wal_level-lo.patch application/octet-stream 16.7 KB
v17-0008-Enable-streaming-for-all-subscription-TAP-tests.patch application/octet-stream 14.7 KB
v17-0007-Track-statistics-for-streaming.patch application/octet-stream 11.7 KB
v17-0009-Add-TAP-test-for-streaming-vs.-DDL.patch application/octet-stream 4.4 KB
v17-0010-Bugfix-handling-of-incomplete-toast-tuple.patch application/octet-stream 15.6 KB
v17-0006-Add-support-for-streaming-to-built-in-replicatio.patch application/octet-stream 91.3 KB
v17-0012-Add-streaming-option-in-pg_dump.patch application/octet-stream 2.7 KB
v17-0011-Provide-new-api-to-get-the-streaming-changes.patch application/octet-stream 4.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-05-01 15:44:11 Re: Unify drop-by-OID functions
Previous Message Pavel Stehule 2020-05-01 14:50:18 Re: Unify drop-by-OID functions