Re: Patch: Write Amplification Reduction Method (WARM)

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-01-17 15:11:27
Message-ID: 20170117151127.inek227mgahtgomb@alvherre.pgsql
Lists: pgsql-hackers

Reading through the track_root_lp patch now.

> + /*
> + * For HOT (or WARM) updated tuples, we store the offset of the root
> + * line pointer of this chain in the ip_posid field of the new tuple.
> + * Usually this information will be available in the corresponding
> + * field of the old tuple. But for aborted updates or pg_upgraded
> + * databases, we might be seeing the old-style CTID chains and hence
> + * the information must be obtained by hard way
> + */
> + if (HeapTupleHeaderHasRootOffset(oldtup.t_data))
> + root_offnum = HeapTupleHeaderGetRootOffset(oldtup.t_data);
> + else
> + heap_get_root_tuple_one(page,
> + ItemPointerGetOffsetNumber(&(oldtup.t_self)),
> + &root_offnum);

Hmm. So HasRootOffset tests the HEAP_LATEST_TUPLE bit, which is reset
temporarily during an update, so that case shouldn't occur often.

Oh, I just noticed that HeapTupleHeaderSetNextCtid also clears the flag.

> @@ -4166,10 +4189,29 @@ l2:
> HeapTupleClearHotUpdated(&oldtup);
> HeapTupleClearHeapOnly(heaptup);
> HeapTupleClearHeapOnly(newtup);
> + root_offnum = InvalidOffsetNumber;
> }
>
> - RelationPutHeapTuple(relation, newbuf, heaptup, false); /* insert new tuple */
> + /* insert new tuple */
> + RelationPutHeapTuple(relation, newbuf, heaptup, false, root_offnum);
> + HeapTupleHeaderSetHeapLatest(heaptup->t_data);
> + HeapTupleHeaderSetHeapLatest(newtup->t_data);
>
> + /*
> + * Also update the in-memory copy with the root line pointer information
> + */
> + if (OffsetNumberIsValid(root_offnum))
> + {
> + HeapTupleHeaderSetRootOffset(heaptup->t_data, root_offnum);
> + HeapTupleHeaderSetRootOffset(newtup->t_data, root_offnum);
> + }
> + else
> + {
> + HeapTupleHeaderSetRootOffset(heaptup->t_data,
> + ItemPointerGetOffsetNumber(&heaptup->t_self));
> + HeapTupleHeaderSetRootOffset(newtup->t_data,
> + ItemPointerGetOffsetNumber(&heaptup->t_self));
> + }

This is repetitive. I think after RelationPutHeapTuple it'd be better
to assign root_offnum = ItemPointerGetOffsetNumber(&heaptup->t_self)
when it's still invalid, so that we can just call SetRootOffset() on
each tuple without the if().
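
Something like this, perhaps (just a sketch, reusing the variables from
the quoted hunk):

	/* insert new tuple */
	RelationPutHeapTuple(relation, newbuf, heaptup, false, root_offnum);
	HeapTupleHeaderSetHeapLatest(heaptup->t_data);
	HeapTupleHeaderSetHeapLatest(newtup->t_data);

	/* if there is no prior root, this tuple's own offset becomes the root */
	if (!OffsetNumberIsValid(root_offnum))
		root_offnum = ItemPointerGetOffsetNumber(&heaptup->t_self);

	HeapTupleHeaderSetRootOffset(heaptup->t_data, root_offnum);
	HeapTupleHeaderSetRootOffset(newtup->t_data, root_offnum);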

> + HeapTupleHeaderSetHeapLatest((HeapTupleHeader) item);
> + if (OffsetNumberIsValid(root_offnum))
> + HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> + root_offnum);
> + else
> + HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
> + offnum);

Just a matter of style, but this reads nicer IMO:

	HeapTupleHeaderSetRootOffset((HeapTupleHeader) item,
								 OffsetNumberIsValid(root_offnum) ? root_offnum : offnum);

> @@ -740,8 +742,9 @@ heap_page_prune_execute(Buffer buffer,
> * holds a pin on the buffer. Once pin is released, a tuple might be pruned
> * and reused by a completely unrelated tuple.
> */
> -void
> -heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
> +static void
> +heap_get_root_tuples_internal(Page page, OffsetNumber target_offnum,
> + OffsetNumber *root_offsets)
> {
> OffsetNumber offnum,

I think this function deserves more/better/updated commentary.
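
For instance, something along these lines at the top (my guess at the
semantics from the signature -- adjust as appropriate):

	/*
	 * heap_get_root_tuples_internal
	 *		Determine the root line pointer of the HOT/WARM chain for items
	 *		on this page.  If target_offnum is a valid offset number, only
	 *		the root of that one item is computed and returned via
	 *		root_offsets; otherwise root_offsets[] is filled for every item
	 *		on the page.
	 */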

> @@ -439,7 +439,9 @@ rewrite_heap_tuple(RewriteState state,
> * set the ctid of this tuple to point to the new location, and
> * insert it right away.
> */
> - new_tuple->t_data->t_ctid = mapping->new_tid;
> + HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> + ItemPointerGetBlockNumber(&mapping->new_tid),
> + ItemPointerGetOffsetNumber(&mapping->new_tid));

I think this would be nicer:

	HeapTupleHeaderSetNextTid(new_tuple->t_data, &mapping->new_tid);

AFAICS all the callers are doing ItemPointerGetFoo for a TID, so this is
overly verbose for no reason. Also, the "c" in Ctid stands for
"current"; I think we can omit that.

> @@ -525,7 +527,9 @@ rewrite_heap_tuple(RewriteState state,
> new_tuple = unresolved->tuple;
> free_new = true;
> old_tid = unresolved->old_tid;
> - new_tuple->t_data->t_ctid = new_tid;
> + HeapTupleHeaderSetNextCtid(new_tuple->t_data,
> + ItemPointerGetBlockNumber(&new_tid),
> + ItemPointerGetOffsetNumber(&new_tid));

Did you forget to SetHeapLatest here, or ...? (If not, a comment is
warranted.)

> diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
> index 32bb3f9..466609c 100644
> --- a/src/backend/executor/execMain.c
> +++ b/src/backend/executor/execMain.c
> @@ -2443,7 +2443,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
> * As above, it should be safe to examine xmax and t_ctid without the
> * buffer content lock, because they can't be changing.
> */
> - if (ItemPointerEquals(&tuple.t_self, &tuple.t_data->t_ctid))
> + if (HeapTupleHeaderIsHeapLatest(tuple.t_data, tuple.t_self))
> {
> /* deleted, so forget about it */
> ReleaseBuffer(buffer);

This is the place where this patch would have an effect. To test this
bit I think we're going to need an ad-hoc stress-test harness.

> +/*
> + * If HEAP_LATEST_TUPLE is set in the last tuple in the update chain. But for
> + * clusters which are upgraded from pre-10.0 release, we still check if c_tid
> + * is pointing to itself and declare such tuple as the latest tuple in the
> + * chain
> + */
> +#define HeapTupleHeaderIsHeapLatest(tup, tid) \
> +( \
> + ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) || \
> + ((ItemPointerGetBlockNumber(&(tup)->t_ctid) == ItemPointerGetBlockNumber(&tid)) && \
> + (ItemPointerGetOffsetNumber(&(tup)->t_ctid) == ItemPointerGetOffsetNumber(&tid))) \
> +)

Please add a "!= 0" to the first arm of the ||, so that we return a boolean.
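
I.e. something like:

	#define HeapTupleHeaderIsHeapLatest(tup, tid) \
	( \
		(((tup)->t_infomask2 & HEAP_LATEST_TUPLE) != 0) || \
		((ItemPointerGetBlockNumber(&(tup)->t_ctid) == ItemPointerGetBlockNumber(&(tid))) && \
		 (ItemPointerGetOffsetNumber(&(tup)->t_ctid) == ItemPointerGetOffsetNumber(&(tid)))) \
	)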

> +/*
> + * Get TID of next tuple in the update chain. Traditionally, we have stored
> + * self TID in the t_ctid field if the tuple is the last tuple in the chain. We
> + * try to preserve that behaviour by returning self-TID if HEAP_LATEST_TUPLE
> + * flag is set.
> + */
> +#define HeapTupleHeaderGetNextCtid(tup, next_ctid, offnum) \
> +do { \
> + if ((tup)->t_infomask2 & HEAP_LATEST_TUPLE) \
> + { \
> + ItemPointerSet((next_ctid), ItemPointerGetBlockNumber(&(tup)->t_ctid), \
> + (offnum)); \
> + } \
> + else \
> + { \
> + ItemPointerSet((next_ctid), ItemPointerGetBlockNumber(&(tup)->t_ctid), \
> + ItemPointerGetOffsetNumber(&(tup)->t_ctid)); \
> + } \
> +} while (0)

This is a really odd macro, I think. Is any of the callers really
depending on the traditional behavior? If so, can we change them to
avoid that? (I think the "else" can be more easily written with
ItemPointerCopy). In any case, I think the documentation of the macro
leaves a bit to be desired -- I don't think we really care all that much
what we used to do, except perhaps as a secondary comment, but we do
care very much about what it actually does, which the current comment
doesn't really explain.
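
For the record, the "else" arm could be just this (same behavior,
assuming next_ctid is an ItemPointer):

	else \
	{ \
		ItemPointerCopy(&(tup)->t_ctid, (next_ctid)); \
	} \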

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
