
Re: Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-01-19 13:05:21
Message-ID: CABOikdMSg-g_CiWga9c=zGrVC_86qjfYKvDrUeyPs-P=U62ffg@mail.gmail.com
Lists: pgsql-hackers
Hi Alvaro,

On Tue, Jan 17, 2017 at 8:41 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Reading through the track_root_lp patch now.
>
>
Thanks for the review.


> > +             /*
> > +              * For HOT (or WARM) updated tuples, we store the offset of the root
> > +              * line pointer of this chain in the ip_posid field of the new tuple.
> > +              * Usually this information will be available in the corresponding
> > +              * field of the old tuple. But for aborted updates or pg_upgraded
> > +              * databases, we might be seeing the old-style CTID chains and hence
> > +              * the information must be obtained by hard way
> > +              */
> > +             if (HeapTupleHeaderHasRootOffset(oldtup.t_data))
> > +                     root_offnum = HeapTupleHeaderGetRootOffset(oldtup.t_data);
> > +             else
> > +                     heap_get_root_tuple_one(page,
> > +                                     ItemPointerGetOffsetNumber(&(oldtup.t_self)),
> > +                                     &root_offnum);
>
> Hmm.  So the HasRootOffset tests the HEAP_LATEST_TUPLE bit, which is
> reset temporarily during an update.  So that case shouldn't occur often.
>

Right. The root offset is stored only in those tuples where
HEAP_LATEST_TUPLE is set. This flag should generally be set on the tuples
that are being updated, except when the last update failed and the flag was
cleared. The other common case is a pg_upgraded cluster, where none of the
existing tuples will have this flag set. In those cases, we must find the
root line pointer the hard way.


>
> Oh, I just noticed that HeapTupleHeaderSetNextCtid also clears the flag.
>

Yes, but this should happen only during updates, and unless the update
fails, the next-to-be-updated tuple should have the flag set.


>
> > @@ -4166,10 +4189,29 @@ l2:
> >               HeapTupleClearHotUpdated(&oldtup);
> >               HeapTupleClearHeapOnly(heaptup);
> >               HeapTupleClearHeapOnly(newtup);
> > +             root_offnum = InvalidOffsetNumber;
> >       }
> >
> > -     RelationPutHeapTuple(relation, newbuf, heaptup, false);        /* insert new tuple */
> > +     /* insert new tuple */
> > +     RelationPutHeapTuple(relation, newbuf, heaptup, false, root_offnum);
> > +     HeapTupleHeaderSetHeapLatest(heaptup->t_data);
> > +     HeapTupleHeaderSetHeapLatest(newtup->t_data);
> >
> > +     /*
> > +      * Also update the in-memory copy with the root line pointer information
> > +      */
> > +     if (OffsetNumberIsValid(root_offnum))
> > +     {
> > +             HeapTupleHeaderSetRootOffset(heaptup->t_data, root_offnum);
> > +             HeapTupleHeaderSetRootOffset(newtup->t_data, root_offnum);
> > +     }
> > +     else
> > +     {
> > +             HeapTupleHeaderSetRootOffset(heaptup->t_data,
> > +                             ItemPointerGetOffsetNumber(&heaptup->t_self));
> > +             HeapTupleHeaderSetRootOffset(newtup->t_data,
> > +                             ItemPointerGetOffsetNumber(&heaptup->t_self));
> > +     }
>
> This is repetitive.  I think after RelationPutHeapTuple it'd be better
> to assign root_offnum = &heaptup->t_self, so that we can just call
> SetRootOffset() on each tuple without the if().
>

Fixed. I actually ripped out HeapTupleHeaderSetRootOffset() completely and
pushed the setting of the root line pointer into
HeapTupleHeaderSetHeapLatest(). That seems much cleaner because the system
expects to find the root line pointer whenever the HEAP_LATEST_TUPLE flag is
set; hence it makes sense to set them together.
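To illustrate, the merged macro could look roughly like this (all the types
and the bit value below are simplified stand-ins for the real definitions,
not the actual code from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the real PostgreSQL structures -- illustration only */
#define HEAP_LATEST_TUPLE 0x4000        /* hypothetical bit value */

typedef uint16_t OffsetNumber;

typedef struct
{
    uint16_t     t_infomask2;
    OffsetNumber root_offset;   /* the patch stores this in t_ctid's ip_posid */
} FakeHeapTupleHeaderData;

/*
 * Sketch of the merged macro: marking a tuple as the latest in its chain
 * also records its root line pointer, so the two can never get out of sync.
 */
#define HeapTupleHeaderSetHeapLatest(tup, offnum) \
do { \
    (tup)->t_infomask2 |= HEAP_LATEST_TUPLE; \
    (tup)->root_offset = (offnum); \
} while (0)
```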


>
>
> > +             HeapTupleHeaderSetHeapLatest((HeapTupleHeader) item);
> > +             if (OffsetNumberIsValid(root_offnum))
> > +                     HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> > +                                     root_offnum);
> > +             else
> > +                     HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> > +                                     offnum);
>
> Just a matter of style, but this reads nicer IMO:
>
>         HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
>                 OffsetNumberIsValid(root_offnum) ? root_offnum : offnum);
>

Understood. This code no longer exists in the new patch since
HeapTupleHeaderSetRootOffset is merged with HeapTupleHeaderSetHeapLatest.


>
>
> > @@ -740,8 +742,9 @@ heap_page_prune_execute(Buffer buffer,
> >   * holds a pin on the buffer. Once pin is released, a tuple might be pruned
> >   * and reused by a completely unrelated tuple.
> >   */
> > -void
> > -heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
> > +static void
> > +heap_get_root_tuples_internal(Page page, OffsetNumber target_offnum,
> > +             OffsetNumber *root_offsets)
> >  {
> >       OffsetNumber offnum,
>
> I think this function deserves more/better/updated commentary.
>

Sure. I added more commentary. I also reworked the function so that the
caller can pass a one-item array when it's interested in finding the root
line pointer for just one item. Hopefully that will save a few bytes on the
stack.
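The split between the whole-page and single-item entry points can be
sketched like this (everything here is a stand-in: the real internal
function takes a Page and walks its HOT chains, which is elided below):

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define MaxOffsets 4            /* stand-in for MaxHeapTuplesPerPage */

/*
 * Stand-in for the real chain-walking code: when target_offnum is valid,
 * only root_offsets[0] is filled in; otherwise one slot per line pointer.
 */
static void
heap_get_root_tuples_internal(OffsetNumber target_offnum,
                              OffsetNumber *root_offsets)
{
    if (target_offnum != InvalidOffsetNumber)
        root_offsets[0] = target_offnum;    /* pretend it is its own root */
    else
        for (OffsetNumber off = 1; off <= MaxOffsets; off++)
            root_offsets[off - 1] = off;
}

/* Whole-page variant: caller passes an array sized for every line pointer. */
static void
heap_get_root_tuples(OffsetNumber *root_offsets)
{
    heap_get_root_tuples_internal(InvalidOffsetNumber, root_offsets);
}

/* Single-item variant: a one-element "array" on the caller's stack suffices. */
static void
heap_get_root_tuple(OffsetNumber target_offnum, OffsetNumber *root_offset)
{
    heap_get_root_tuples_internal(target_offnum, root_offset);
}
```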


>
> > @@ -439,7 +439,9 @@ rewrite_heap_tuple(RewriteState state,
> >                        * set the ctid of this tuple to point to the new location, and
> >                        * insert it right away.
> >                        */
> > -                     new_tuple->t_data->t_ctid = mapping->new_tid;
> > +                     HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> > +                                     ItemPointerGetBlockNumber(&mapping->new_tid),
> > +                                     ItemPointerGetOffsetNumber(&mapping->new_tid));
>
> I think this would be nicer:
>         HeapTupleHeaderSetNextTid(new_tuple->t_data, &mapping->new_tid);
> AFAICS all the callers are doing ItemPointerGetFoo for a TID, so this is
> overly verbose for no reason.  Also, the "c" in Ctid stands for
> "current"; I think we can omit that.
>

Yes, fixed. I realised that all callers were anyway calling the macro with
the block/offset of the same TID, so it makes sense to just pass the TID to
the macro.
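With that change the macro collapses to a simple copy, roughly like this
(the struct layouts are simplified stand-ins, not the real ItemPointerData):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for ItemPointerData and the heap tuple header */
typedef struct
{
    uint32_t ip_blkid;
    uint16_t ip_posid;
} ItemPointerData;

typedef struct
{
    ItemPointerData t_ctid;
} FakeHeapTupleHeaderData;

#define ItemPointerCopy(fromPointer, toPointer) \
    (*(toPointer) = *(fromPointer))

/*
 * Sketch of the simplified macro: callers hand over a whole TID instead of
 * a separate block number and offset.
 */
#define HeapTupleHeaderSetNextTid(tup, tid) \
    ItemPointerCopy((tid), &(tup)->t_ctid)
```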


>
> > @@ -525,7 +527,9 @@ rewrite_heap_tuple(RewriteState state,
> >                               new_tuple = unresolved->tuple;
> >                               free_new = true;
> >                               old_tid = unresolved->old_tid;
> > -                             new_tuple->t_data->t_ctid = new_tid;
> > +                             HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> > +                                             ItemPointerGetBlockNumber(&new_tid),
> > +                                             ItemPointerGetOffsetNumber(&new_tid));
>
> Did you forget to SetHeapLatest here, or ..?  (If not, a comment is
> warranted).
>

Umm, probably not. The way I see it, new_tuple is not actually the new
tuple when this is called; it's changed to the unresolved tuple (see the
start of the hunk). So what we're doing is setting the next CTID in the
previous tuple in the chain. SetHeapLatest is called on the new tuple
inside raw_heap_insert(). I did not add any more comments, but please let
me know if you think it's still confusing or if I'm missing something.


>
> > diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
> > index 32bb3f9..466609c 100644
> > --- a/src/backend/executor/execMain.c
> > +++ b/src/backend/executor/execMain.c
> > @@ -2443,7 +2443,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
> >                * As above, it should be safe to examine xmax and t_ctid without the
> >                * buffer content lock, because they can't be changing.
> >                */
> > -             if (ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
> > +             if (HeapTupleHeaderIsHeapLatest(tuple.t_data, tuple.t_self))
> >               {
> >                       /* deleted, so forget about it */
> >                       ReleaseBuffer(buffer);
>
> This is the place where this patch would have an effect.  To test this
> bit I think we're going to need an ad-hoc stress-test harness.
>

Sure. I did some pgbench tests and ran consistency checks during and at the
end of the tests. I chose a small scale factor and many clients so that the
same tuple is often updated concurrently. That should exercise the new
chain-following code rigorously. But I'll do more of those on a bigger box.
Do you have other suggestions for ad-hoc tests?
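For reference, the kind of run I mean looks roughly like this (the scale
factor, duration and database name are illustrative; only the -c40 -j10
figures are the ones I actually used):

```shell
# Small scale factor + many clients => frequent concurrent updates of the
# same rows, which stresses the new ctid chain-following code.
pgbench -i -s 10 bench
pgbench -c 40 -j 10 -T 600 bench

# Afterwards, a simple consistency check, e.g. the balance invariant:
psql bench -c "SELECT sum(abalance) FROM pgbench_accounts;"
```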


>
>
> > +/*
> > + * If HEAP_LATEST_TUPLE is set in the last tuple in the update chain. But for
> > + * clusters which are upgraded from pre-10.0 release, we still check if c_tid
> > + * is pointing to itself and declare such tuple as the latest tuple in the
> > + * chain
> > + */
> > +#define HeapTupleHeaderIsHeapLatest(tup, tid) \
> > +( \
> > +  ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) || \
> > +  ((ItemPointerGetBlockNumber(&(tup)->t_ctid) == ItemPointerGetBlockNumber(&tid)) && \
> > +   (ItemPointerGetOffsetNumber(&(tup)->t_ctid) == ItemPointerGetOffsetNumber(&tid))) \
> > +)
>
> Please add a "!= 0" to the first arm of the ||, so that we return a
> boolean.
>
>
Done. Also rebased over the new master, where similar changes have been made.
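With the "!= 0" added, the macro ends up looking something like this (the
struct layouts and bit value are simplified stand-ins; the real macro uses
the ItemPointerGet* accessors on the actual header types):

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_LATEST_TUPLE 0x4000        /* hypothetical bit value */

/* Simplified stand-ins -- illustration only */
typedef struct
{
    uint32_t ip_blkid;
    uint16_t ip_posid;
} ItemPointerData;

typedef struct
{
    uint16_t        t_infomask2;
    ItemPointerData t_ctid;
} FakeHeapTupleHeaderData;

/*
 * A tuple is the latest in its chain if the flag is set, or (for clusters
 * upgraded from releases without the flag) if t_ctid points to itself.
 * The "!= 0" ensures the first arm of the || yields a clean boolean.
 */
#define HeapTupleHeaderIsHeapLatest(tup, tid) \
( \
    (((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0) || \
    ((tup)->t_ctid.ip_blkid == (tid).ip_blkid && \
     (tup)->t_ctid.ip_posid == (tid).ip_posid) \
)
```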


>
> > +/*
> > + * Get TID of next tuple in the update chain. Traditionally, we have stored
> > + * self TID in the t_ctid field if the tuple is the last tuple in the chain. We
> > + * try to preserve that behaviour by returning self-TID if HEAP_LATEST_TUPLE
> > + * flag is set.
> > + */
> > +#define HeapTupleHeaderGetNextCtid(tup, next_ctid, offnum) \
> > +do { \
> > +     if ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) \
> > +     { \
> > +             ItemPointerSet((next_ctid), ItemPointerGetBlockNumber(&(tup)->t_ctid), \
> > +                             (offnum)); \
> > +     } \
> > +     else \
> > +     { \
> > +             ItemPointerSet((next_ctid), ItemPointerGetBlockNumber(&(tup)->t_ctid), \
> > +                             ItemPointerGetOffsetNumber(&(tup)->t_ctid)); \
> > +     } \
> > +} while (0)
>
> This is a really odd macro, I think.  Is any of the callers really
> depending on the traditional behavior?  If so, can we change them to
> avoid that?  (I think the "else" can be more easily written with
> ItemPointerCopy).  In any case, I think the documentation of the macro
> leaves a bit to be desired -- I don't think we really care all that much
> what we used to do, except perhaps as a secondary comment, but we do
> care very much about what it actually does, which the current comment
> doesn't really explain.
>
>
I reworked this quite a bit and I believe the new code does what you
suggested.  The HeapTupleHeaderGetNextTid macro is now much simpler (it
just copies the TID) and we leave it to the caller to ensure they don't
call it on a tuple which is already at the end of the chain (i.e. has
HEAP_LATEST_TUPLE set; we don't look for old-style end-of-the-chain
markers). The callers can choose to return the same TID back if their
callers rely on that behaviour. But inside this macro, we now assert that
HEAP_LATEST_TUPLE is not set.
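In other words, something like this sketch (again with simplified stand-in
types and a hypothetical bit value, not the actual patch code):

```c
#include <assert.h>
#include <stdint.h>

#define HEAP_LATEST_TUPLE 0x4000        /* hypothetical bit value */

/* Simplified stand-ins -- illustration only */
typedef struct
{
    uint32_t ip_blkid;
    uint16_t ip_posid;
} ItemPointerData;

typedef struct
{
    uint16_t        t_infomask2;
    ItemPointerData t_ctid;
} FakeHeapTupleHeaderData;

#define ItemPointerCopy(fromPointer, toPointer) \
    (*(toPointer) = *(fromPointer))

/*
 * Sketch of the reworked macro: it simply copies t_ctid, and asserts that
 * the caller has already checked this is not the end of the chain.
 */
#define HeapTupleHeaderGetNextTid(tup, next_ctid) \
do { \
    assert(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) == 0); \
    ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
} while (0)
```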

One thing that worried me is whether there exists a path that sets
t_infomask (and hence HEAP_LATEST_TUPLE) during redo recovery where we
would fail to set the root line pointer correctly along with it. But AFAICS
the interesting cases of insert, multi-insert and update are handled ok.
The only other places where I saw t_infomask being copied as-is from the
WAL record are DecodeXLogTuple() and DecodeMultiInsert(), but those should
not cause any problem AFAICS.

Revised patch is attached. All regression tests, isolation tests and
pgbench test with -c40 -j10 pass on my laptop.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment: 0001_track_root_lp_v9.patch
Description: application/octet-stream (37.3 KB)
