Re: Minimal logical decoding on standbys

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2019-06-21 15:50:15
Message-ID: CAJ3gD9f_HjQ6qP=+1jwzwy77fwcbT4-M3UvVsqpAzsY-jqM8nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 20 Jun 2019 at 00:31, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2019-06-12 17:30:02 +0530, Amit Khandekar wrote:
> > In the attached v6 version of the patch, I did the above. That is, I
> > used XLogFindNextRecord() to bump up the restart_lsn of the slot to
> > the first valid record. But since XLogReaderState is not available in
> > ReplicationSlotReserveWal(), I did this in
> > DecodingContextFindStartpoint(). And then updated the slot restart_lsn
> > with this corrected position.
>
> > Since XLogFindNextRecord() is currently disabled using #if 0, removed
> > this directive.
>
> Well, ifdef FRONTEND. I don't think that's a problem. It's a bit
> overkill here, because I think we know the address has to be on a record
> boundary (rather than being in the middle of a page spanning WAL
> record). So we could just add add the size of the header manually
> - but I think that's not worth doing.
>
>
> > > Or else, do you think we can just increment the record pointer by
> > > doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
> > > SizeOfXLogShortPHD() ?
> >
> > I found out that we can't do this, because we don't know whether the
> > xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
> > our context, it is SizeOfXLogLongPHD. So we indeed need the
> > XLogReaderState handle.
>
> Well, we can determine whether a long or a short header is going to be
> used, as that's solely dependent on the LSN:
>
> /*
> * If first page of an XLOG segment file, make it a long header.
> */
> if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
> {
> XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;
>
> NewLongPage->xlp_sysid = ControlFile->system_identifier;
> NewLongPage->xlp_seg_size = wal_segment_size;
> NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
> NewPage->xlp_info |= XLP_LONG_HEADER;
> }
>
> but I don't think that's worth it.

Ok, so what you are saying is : In case of ReplayRecPtr, it is always
possible to know whether it is pointing at a long header or short
header, just by looking at its value. And then we just increment it by
the header size after knowing the header size. Why do you think it is
no worth it ? In fact, I thought we *have* to increment it to set it
to the next record. Didn't understand what other option we have.

>
>
> > > Do you think that we can solve this using some other approach ? I am
> > > not sure whether it's only the initial conditions that cause
> > > lastReplayedEndRecPtr value to *not* point to a valid record, or is it
> > > just a coincidence and that lastReplayedEndRecPtr can also have such a
> > > value any time afterwards.
>
> It's always possible. All that means is that the last record filled the
> entire last WAL page.

Ok that means we *have* to bump the pointer ahead.

>
>
> > > If it's only possible initially, we can
> > > just use GetRedoRecPtr() instead of lastReplayedEndRecPtr if
> > > lastReplayedEndRecPtr is invalid.
>
> I don't think so? The redo pointer will point to something *much*
> earlier, where we'll not yet have done all the necessary conflict
> handling during recovery? So we'd not necessarily notice that a slot
> is not actually usable for decoding.
>
> We could instead just handle that by starting decoding at the redo
> pointer, and just ignore all WAL records until they're after
> lastReplayedEndRecPtr, but that has no advantages, and will read a lot
> more WAL.

Yeah I agree : just doing this for initial case is a bad idea.

>
>
>
>
> > static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
> > @@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
> > */
> >
> > /* XLOG stuff */
> > + xlrec_reuse.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
> > xlrec_reuse.node = rel->rd_node;
> > xlrec_reuse.block = blkno;
> > xlrec_reuse.latestRemovedXid = latestRemovedXid;
> > @@ -1140,6 +1142,7 @@ _bt_delitems_delete(Relation rel, Buffer buf,
> > XLogRecPtr recptr;
> > xl_btree_delete xlrec_delete;
> >
> > + xlrec_delete.onCatalogTable = get_rel_logical_catalog(rel->rd_index->indrelid);
> > xlrec_delete.latestRemovedXid = latestRemovedXid;
> > xlrec_delete.nitems = nitems;
>
> Can we instead pass the heap rel down to here? I think there's only one
> caller, and it has the heap relation available these days (it didn't at
> the time of the prototype, possibly). There's a few other users of
> get_rel_logical_catalog() where that might be harder, but it's easy
> here.

For _bt_log_reuse_page(), it's only caller is _bt_getbuf() which does
not have heapRel parameter. Let me know which caller you were
referring to that has heapRel.

For _bt_delitems_delete(), it itself has heapRel param, so I will use
this for onCatalogTable.

>
>
> > @@ -27,6 +27,7 @@
> > #include "storage/indexfsm.h"
> > #include "storage/lmgr.h"
> > #include "utils/snapmgr.h"
> > +#include "utils/lsyscache.h"
> >
> >
> > /* Entry in pending-list of TIDs we need to revisit */
> > @@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer)
> > OffsetNumber itemnos[MaxIndexTuplesPerPage];
> > spgxlogVacuumRedirect xlrec;
> >
> > + xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid);
> > xlrec.nToPlaceholder = 0;
> > xlrec.newestRedirectXid = InvalidTransactionId;
>
> This one seems harder, but I'm not actually sure why we make it so
> hard. It seems like we just ought to add the table to IndexVacuumInfo.

This means we have to add heapRel assignment wherever we initialize
IndexVacuumInfo structure, namely in lazy_vacuum_index(),
lazy_cleanup_index(), validate_index(), analyze_rel(), and make sure
these functions have a heap rel handle. Do you think we should do this
as part of this patch ?

> > + if (TransactionIdIsValid(slot_catalog_xmin) && TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid))
> > + {
> > + found_conflict = true;
> > +
> > + ereport(LOG,
> > + (errmsg("slot %s w/ catalog xmin %u conflicts with removed xid %u",
> > + NameStr(slotname), slot_catalog_xmin, xid)));
> > + }
> > +
> > + }
> > + if (found_conflict)
> > + {

The above changes seem to be from the older version (v6) of the patch.
Just wanted to make sure you are using v8 patch.

>
>
> Hm, as far as I can tell you just ignore that the slot might currently
> be in use. You can't just drop a slot that somebody is using. I think
> you need to send a recovery conflict to that backend.

Yeah, I am currently working on this. As you suggested, I am going to
call CancelVirtualTransaction() and for its sigmode parameter, I will
pass a new ProcSignalReason value PROCSIG_RECOVERY_CONFLICT_SLOT.

>
>
>
> > + elog(LOG, "Dropping conflicting slot %s", s->data.name.data);
>
> This definitely needs to be expanded, and follow the message style
> guideline.

This message , with the v8 patch, looks like this :
ereport(LOG,
(errmsg("Dropping conflicting slot %s", NameStr(slotname)),
errdetail("%s", reason)));
where reason is a char string.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-06-21 16:12:00 Re: BUG #15865: ALTER TABLE statements causing "relation already exists" errors when some indexes exist
Previous Message Jim Finnerty 2019-06-21 15:41:11 Re: Implementing Incremental View Maintenance