Re: Logical decoding for operations on zheap tables

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical decoding for operations on zheap tables
Date: 2019-01-04 03:24:34
Message-ID: CAA4eK1JaY3c6WFJoWYj0Vawew3g2o=iqmWChSk54FvhKReo9Og@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 3, 2019 at 11:30 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2018-12-31 09:56:48 +0530, Amit Kapila wrote:
> > To support logical decoding for zheap operations, we need a way to
> > ensure zheap tuples can be registered as change streams. One idea
> > could be that we make ReorderBufferChange aware of another kind of
> > tuples as well, something like this:
> >
> > @@ -100,6 +123,20 @@ typedef struct ReorderBufferChange
> > ReorderBufferTupleBuf *newtuple;
> > } tp;
> > + struct
> > + {
> > + /* relation that has been changed */
> > + RelFileNode relnode;
> > +
> > + /* no previously reassembled toast chunks are necessary anymore */
> > + bool clear_toast_afterwards;
> > +
> > + /* valid for DELETE || UPDATE */
> > + ReorderBufferZTupleBuf *oldtuple;
> > + /* valid for INSERT || UPDATE */
> > + ReorderBufferZTupleBuf *newtuple;
> > + } ztp;
> > +
> >
> >
> > +/* an individual zheap tuple, stored in one chunk of memory */
> > +typedef struct ReorderBufferZTupleBuf
> > +{
> > ..
> > + /* tuple header, the interesting bit for users of logical decoding */
> > + ZHeapTupleData tuple;
> > ..
> > +} ReorderBufferZTupleBuf;
> >
> > Apart from this, we need to define different decode functions for
> > zheap operations as the WAL data is different for heap and zheap, so
> > the same functions can't be used to decode both.
>
> I'm very strongly opposed to that. We shouldn't have to expose every
> possible storage method to output plugins, that'll make extensibility
> a farce. I think we'll either have to re-form a HeapTuple or decide
> to bite the bullet and start exposing tuples via slots.
>

To be clear, you are against exposing different tuple formats to
plugins, not against having different decoding routines for other
storage engines, because the latter is unavoidable given the different
WAL formats. As for the tuple format, I guess it would be a lot better
to expose tuples via slots, but won't that force existing plugins to
change the way they decode tuples? Maybe that is okay. OTOH, re-forming
the heap tuple has a cost, which might be acceptable for the time being
or for a first version, but eventually we want to avoid it. The other
reason I refrained from tuple conversion was that I was not sure
whether we rely anywhere on the transaction information in the tuple
during the decode process, because that would be tricky to mimic, but
I guess we don't check that.
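
To make the re-forming option concrete, here is a minimal,
self-contained sketch. Every name in it (RfHeapTuple, RfZHeapTuple,
rf_reform_heap_tuple) is hypothetical and stands in for the real
structures. It also assumes that the transaction id can be taken from
the decoded record rather than from the tuple, which is exactly the
point I was unsure about above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RF_MAX_ATTS 4

/* Hypothetical heap-format tuple: carries transaction info in its header. */
typedef struct RfHeapTuple
{
    uint32_t xmin;
    uint32_t natts;
    int32_t  atts[RF_MAX_ATTS];
} RfHeapTuple;

/* Hypothetical zheap-format tuple: no per-tuple transaction id. */
typedef struct RfZHeapTuple
{
    uint32_t natts;
    int32_t  atts[RF_MAX_ATTS];
} RfZHeapTuple;

/*
 * Re-form a heap-format tuple from a zheap-format one. The xid is
 * supplied by the caller (assumed to come from the decoded record).
 * Returns NULL on allocation failure; the caller frees the result.
 * The extra allocation and copy is the conversion cost in question.
 */
RfHeapTuple *
rf_reform_heap_tuple(const RfZHeapTuple *src, uint32_t xid)
{
    RfHeapTuple *dst;

    assert(src != NULL && src->natts <= RF_MAX_ATTS);

    dst = calloc(1, sizeof(*dst));
    if (dst == NULL)
        return NULL;

    dst->xmin = xid;
    dst->natts = src->natts;
    memcpy(dst->atts, src->atts, src->natts * sizeof(src->atts[0]));
    return dst;
}
```

With something along these lines the plugin only ever sees the heap
format, at the price of one copy per decoded tuple.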

The only reason for exposing a different tuple format to plugins was
performance, which I think can be addressed if we expose tuples via
slots. I don't want to take up exposing slots instead of tuples to
plugins as part of this project; if we want to go that way, wouldn't
it be better done as part of the pluggable API?
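
As a rough illustration of what "exposing via slots" would mean for a
plugin, the sketch below hides the storage format behind an accessor.
It is a toy model only: SlSlot, SlHeapRow, SlZHeapRow and the
functions are hypothetical names, not the executor's slot API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot: the plugin reads attributes only through getattr. */
typedef struct SlSlot SlSlot;
struct SlSlot
{
    const void *tuple;                                 /* storage-specific */
    int32_t   (*getattr)(const SlSlot *slot, size_t i);
    size_t      natts;
};

/* Two toy storage formats with different layouts. */
typedef struct SlHeapRow  { uint32_t xmin; int32_t atts[2]; } SlHeapRow;
typedef struct SlZHeapRow { int32_t atts[2]; } SlZHeapRow;

int32_t
sl_heap_getattr(const SlSlot *slot, size_t i)
{
    return ((const SlHeapRow *) slot->tuple)->atts[i];
}

int32_t
sl_zheap_getattr(const SlSlot *slot, size_t i)
{
    return ((const SlZHeapRow *) slot->tuple)->atts[i];
}

/* "Plugin" code: works on any slot without knowing the storage format. */
int64_t
sl_plugin_sum(const SlSlot *slot)
{
    int64_t sum = 0;

    assert(slot != NULL && slot->getattr != NULL);
    for (size_t i = 0; i < slot->natts; i++)
        sum += slot->getattr(slot, i);
    return sum;
}
```

No tuple is copied here, which is where the performance benefit would
come from, but existing plugins that read the heap tuple directly
would have to be changed to go through the accessor.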

>
> > This email is primarily to discuss how the logical decoding for
> > basic DML operations (Insert/Update/Delete) will work in zheap. We
> > might need some special mechanism to deal with sub-transactions as
> > zheap doesn't generate a transaction id for sub-transactions, but we
> > can discuss that separately.
>
> Subtransactions seem to be the hardest part besides the tuple format
> issue, so I think we should discuss that very soon.
>

Agreed, I am going to look at that part next.

>
> > /*
> > * Write relation description to the output stream.
> > */
> > diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> > index 23466bade2..70fb5e2934 100644
> > --- a/src/backend/replication/logical/reorderbuffer.c
> > +++ b/src/backend/replication/logical/reorderbuffer.c
> > @@ -393,6 +393,19 @@ ReorderBufferReturnChange(ReorderBuffer *rb, ReorderBufferChange *change)
> > change->data.tp.oldtuple = NULL;
> > }
> > break;
> > + case REORDER_BUFFER_CHANGE_ZINSERT:
>
> This really needs to be indistinguishable from normal CHANGE_INSERT...
>

Sure, it will be if we decide either to re-form the heap tuple or to
expose tuples via slots.
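
In other words, the decode routines stay storage-specific because the
WAL records differ, but they queue the same change kind. A minimal
sketch of that shape follows; all names (DkChange,
dk_decode_heap_insert, and so on) are hypothetical and the record
layouts are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One set of change kinds, shared by every storage format. */
typedef enum DkChangeKind
{
    DK_CHANGE_INSERT,
    DK_CHANGE_UPDATE,
    DK_CHANGE_DELETE
} DkChangeKind;

/* What the plugin sees: no trace of the originating storage format. */
typedef struct DkChange
{
    DkChangeKind action;
    int32_t      value;
} DkChange;

/* Invented WAL record layouts; they differ, so decoding must differ. */
typedef struct DkHeapInsertRec  { uint32_t xid; int32_t value; } DkHeapInsertRec;
typedef struct DkZHeapInsertRec { uint16_t trans_slot; int32_t value; } DkZHeapInsertRec;

DkChange
dk_decode_heap_insert(const DkHeapInsertRec *rec)
{
    DkChange change = { DK_CHANGE_INSERT, rec->value };

    return change;
}

DkChange
dk_decode_zheap_insert(const DkZHeapInsertRec *rec)
{
    /* Same change kind as heap: there is no separate ZINSERT. */
    DkChange change = { DK_CHANGE_INSERT, rec->value };

    return change;
}

/* "Plugin" callback: switches on the action only. */
const char *
dk_plugin_describe(const DkChange *change)
{
    switch (change->action)
    {
        case DK_CHANGE_INSERT: return "INSERT";
        case DK_CHANGE_UPDATE: return "UPDATE";
        case DK_CHANGE_DELETE: return "DELETE";
    }
    return "UNKNOWN";
}
```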

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
