Re: Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-03-23 10:14:22
Message-ID: CABOikdPEhYdsDqa07QR+x-TAeBsaNdSUNoosSzo8xTQ79W-UUw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 22, 2017 at 4:06 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> >
> >>
> >
> > Please find attached rebased patches.
> >
>
> Few comments on 0005_warm_updates_v18.patch:
>
>
Thanks a lot Amit for review comments.

> 1.
> @@ -806,20 +835,35 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
> {
> ..
> -			if (callback && callback(htup, callback_state))
> +			if (callback)
> 			{
> -				kill_tuple = true;
> -
> 				if (tuples_removed)
> -					*tuples_removed += 1;
> +				result = callback(htup, is_warm, callback_state);
> +				if (result == IBDCR_DELETE)
> +				{
> +					kill_tuple = true;
> +					if (tuples_removed)
> +						*tuples_removed += 1;
> +				}
> +				else if (result == IBDCR_CLEAR_WARM)
> +				{
> +					clear_tuple = true;
> +				}
> 			}
> 			else if (split_cleanup)
> ..
> }
>
> I think this will break the existing mechanism of split cleanup. We
> need to check for split cleanup if the tuple is not deletable
> by the callback. This is not merely an optimization but a must
> condition because we will clear the split cleanup flag after this
> bucket is scanned completely.
>
>
Ok, I see. Fixed, but please check if this looks good.
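To make the intended ordering concrete, here is a minimal standalone sketch of the rearranged decision; the IBDCR_* enum and all function/parameter names are illustrative stand-ins, not the patch's actual definitions:

```c
#include <stdbool.h>

/* Illustrative stand-in for the callback's result codes. */
typedef enum
{
    IBDCR_KEEP,
    IBDCR_DELETE,
    IBDCR_CLEAR_WARM
} IndexBulkDeleteCallbackResult;

/*
 * Sketch of the decision logic in hashbucketcleanup(): the callback is
 * consulted first, but a tuple the callback keeps must still be checked
 * for split cleanup, because the split-cleanup flag is cleared once the
 * bucket has been scanned completely.
 */
static void
decide_tuple_fate(bool have_callback,
                  IndexBulkDeleteCallbackResult result,
                  bool split_cleanup,
                  bool tuple_belongs_to_new_bucket,
                  bool *kill_tuple, bool *clear_tuple)
{
    *kill_tuple = false;
    *clear_tuple = false;

    if (have_callback && result == IBDCR_DELETE)
        *kill_tuple = true;
    else if (have_callback && result == IBDCR_CLEAR_WARM)
        *clear_tuple = true;
    else if (split_cleanup && tuple_belongs_to_new_bucket)
        *kill_tuple = true;     /* tuple was moved by the bucket split */
}
```

The point is the final `else if`: a tuple kept by the callback still falls through to the split-cleanup check.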

> 2.
> -		PageIndexMultiDelete(page, deletable, ndeletable);
> +		/*
> +		 * Clear the WARM pointers.
> +		 *
> +		 * We must do this before dealing with the dead items because
> +		 * PageIndexMultiDelete may move items around to compactify the
> +		 * array and hence offnums recorded earlier won't make any sense
> +		 * after PageIndexMultiDelete is called.
> +		 */
> +		if (nclearwarm > 0)
> +			_hash_clear_items(page, clearwarm, nclearwarm);
> +
> +		/*
> +		 * And delete the deletable items
> +		 */
> +		if (ndeletable > 0)
> +			PageIndexMultiDelete(page, deletable, ndeletable);
>
> I think this assumes that the items where we need to clear warm flag
> are not deletable, otherwise what is the need to clear the flag if we
> are going to delete the tuple. The deletable tuple can have a warm
> flag if it is deletable due to split cleanup.
>
>
Yes. Since the callback will either say IBDCR_DELETE or IBDCR_CLEAR_WARM, I
don't think we will ever have a situation where a tuple is both deleted and
cleared. I also checked that the bucket split code carries the WARM flag
correctly to the new bucket.

Based on your first comment, I believe the rearranged code will take care
of deleting a tuple even if the WARM flag is set, when the deletion is because
of a bucket split.

> 3.
> +	/*
> +	 * HASH indexes compute a hash value of the key and store that in the
> +	 * index. So we must first obtain the hash of the value obtained from the
> +	 * heap and then do a comparison
> +	 */
> +	_hash_convert_tuple(indexRel, values, isnull, values2, isnull2);
>
> I think here, you need to handle the case where heap has a NULL value
> as the hash index doesn't contain NULL values, otherwise, the code in
> below function can return true which is not right.
>
>
I think we can simply conclude that hashrecheck has failed the equality check
if the heap has a NULL value, because such a tuple should not have been
reached via the hash index unless a non-NULL hash key was later updated to a
NULL key, right?
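The NULL handling described above can be sketched for a single column as follows; this is only an illustration, and hash_any_simple is a made-up stand-in for the hash function the real code reaches via _hash_convert_tuple():

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative invertible 32-bit mixer standing in for the index's
 * real hash function. */
static uint32_t
hash_any_simple(uint32_t key)
{
    key ^= key >> 16;
    key *= 0x85ebca6bu;
    key ^= key >> 13;
    return key;
}

/*
 * If the heap value is NULL, the recheck must fail outright: a hash
 * index never stores NULL keys, so a NULL heap value can only mean the
 * heap tuple no longer matches the index entry being rechecked.
 */
static bool
hashrecheck_one_column(bool heap_isnull, uint32_t heap_value,
                       uint32_t index_hash)
{
    if (heap_isnull)
        return false;
    return hash_any_simple(heap_value) == index_hash;
}
```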

> 4.
> +bool
> +hashrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple
> indexTuple,
> + Relation heapRel, HeapTuple heapTuple)
> {
> ..
> + att = indexRel->rd_att->attrs[i - 1];
> + if (!datumIsEqual(values2[i - 1], indxvalue, att->attbyval,
> + att->attlen))
> + {
> + equal = false;
> + break;
> + }
> ..
> }
>
> Hash values are always uint32 and attlen can be different for
> different datatypes, so I think above doesn't seem to be the right way
> to do the comparison.
>
>
Since we're referring to the attr from the index relation, wouldn't it tell
us the attribute specs of what gets stored in the index, and not what's
in the heap? I could be wrong, but some quick tests show me that
pg_attribute.attlen for the index relation always returns 4, irrespective
of the underlying data type in the heap.
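Put differently: because a hash index stores only the 32-bit hash code, the index relation's attribute descriptor describes an int4 column (attlen 4, attbyval true) regardless of the heap column's type, so for this case datumIsEqual() degenerates to a plain 4-byte equality test. A minimal stand-in restricted to the by-value case (Datum here is a simplified typedef, not the real definition):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;        /* simplified stand-in for PostgreSQL's Datum */

/*
 * Minimal illustration of datumIsEqual() semantics for by-value
 * attributes: two pass-by-value datums of the same declared length
 * compare as plain integers, which is all that is needed when the
 * stored key is always the uint32 hash code.
 */
static bool
datum_is_equal_byval(Datum a, Datum b, int attlen)
{
    (void) attlen;              /* always 4 for the stored hash code */
    return a == b;
}
```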

> 5.
> @@ -274,6 +301,8 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
> 	OffsetNumber offnum;
> 	ItemPointer current;
> 	bool		res;
> +	IndexTuple	itup;
> +
>
> 	/* Hash indexes are always lossy since we store only the hash code */
> 	scan->xs_recheck = true;
> @@ -316,8 +345,6 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
> 		 offnum <= maxoffnum;
> 		 offnum = OffsetNumberNext(offnum))
> 	{
> -		IndexTuple	itup;
> -
>
> Why above change?
>
>
Seems spurious. Fixed.

>
> 6.
> +	*stats = index_bulk_delete(&ivinfo, *stats,
> +			lazy_indexvac_phase1, (void *) vacrelstats);
> +	ereport(elevel,
> +			(errmsg("scanned index \"%s\" to remove %d row version, found "
> +					"%0.f warm pointers, %0.f clear pointers, removed "
> +					"%0.f warm pointers, removed %0.f clear pointers",
> +					RelationGetRelationName(indrel),
> +					vacrelstats->num_dead_tuples,
> +					(*stats)->num_warm_pointers,
> +					(*stats)->num_clear_pointers,
> +					(*stats)->warm_pointers_removed,
> +					(*stats)->clear_pointers_removed)));
> +
> +	(*stats)->num_warm_pointers = 0;
> +	(*stats)->num_clear_pointers = 0;
> +	(*stats)->warm_pointers_removed = 0;
> +	(*stats)->clear_pointers_removed = 0;
> +	(*stats)->pointers_cleared = 0;
> +
> +	*stats = index_bulk_delete(&ivinfo, *stats,
> +			lazy_indexvac_phase2, (void *) vacrelstats);
>
> To convert WARM chains, we need to do two index passes for all the
> indexes. I think it can substantially increase the random I/O. I
> think this can help us in doing more WARM updates, but I don't see how
> the downside of that (increased random I/O) will be acceptable for all
> kind of cases.
>
>
Yes, this is a very fair point. The way I proposed to address this upthread
is by introducing a set of threshold/scale GUCs specific to WARM, so users
can control when to invoke WARM cleanup. Only when WARM cleanup is required
do we do two index scans. Otherwise vacuum will work the way it works
today, without any additional overhead.

We already have some intelligence to skip the second index scan if we did
not find any WARM candidate chains during the first heap scan. This should
take care of the majority of users, who never update their indexed columns.
For the others, we need either a knob or some built-in way to deduce whether
to do WARM cleanup or not.

Does that seem worthwhile?
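The shape of the proposed knob could mirror the existing autovacuum_vacuum_threshold / autovacuum_vacuum_scale_factor pair. A minimal sketch, where the GUC names and the decision function are purely hypothetical (the patch does not necessarily define them this way):

```c
#include <stdbool.h>

/* Hypothetical GUCs, modeled on the autovacuum threshold/scale pair. */
static int    warm_cleanup_threshold = 50;
static double warm_cleanup_scale_factor = 0.1;

/*
 * Sketch of the decision: only pay for the second index pass when
 * enough WARM candidate chains accumulated during the first heap scan.
 * The num_warm_chains == 0 case is the skip logic that already exists.
 */
static bool
need_warm_cleanup(long num_warm_chains, long reltuples)
{
    if (num_warm_chains == 0)
        return false;
    return num_warm_chains >
           warm_cleanup_threshold +
           (long) (warm_cleanup_scale_factor * reltuples);
}
```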

>
> +exists. Since index vacuum may visit these pointers in any order, we will need
> +another index pass to remove dead index pointers. So in the first index pass we
> +check which WARM candidates have 2 index pointers. In the second pass, we
> +remove the dead pointer and clear the INDEX_WARM_POINTER flag if that's the
> +surviving index pointer.
>
> I think there is some mismatch between README and code. In README, it
> is mentioned that dead pointers will be removed in the second phase,
> but I think the first phase code lazy_indexvac_phase1() will also
> allow to delete the dead pointers (it can return IBDCR_DELETE which
> will allow index am to remove dead items.). Am I missing something
> here?
>
>
Hmm, fixed the README. Clearly we do allow removal of pointers which are
already known to be certainly dead in the first index pass itself. The
remaining pointers can be removed during the second scan, once we know about
the existence or non-existence of WARM index pointers for each chain.
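A rough illustration of the second-pass decision for a chain being converted back to CLEAR; this is only a sketch of the rule described above, not the patch's actual code, and the enum/function names are made up:

```c
#include <stdbool.h>

/* Illustrative result codes, mirroring the patch's IBDCR_* idea. */
typedef enum { IBDCR_KEEP, IBDCR_DELETE, IBDCR_CLEAR_WARM } Action;

/*
 * Second-pass rule for a WARM candidate chain: only when the first pass
 * found both a CLEAR and a WARM index pointer for the chain is there
 * anything to do. The dead WARM pointer is removed, and the surviving
 * CLEAR pointer has its WARM flag cleared.
 */
static Action
phase2_action(bool is_warm_pointer, bool chain_has_both_pointers)
{
    if (!chain_has_both_pointers)
        return IBDCR_KEEP;          /* nothing to convert */
    if (is_warm_pointer)
        return IBDCR_DELETE;        /* remove the WARM member */
    return IBDCR_CLEAR_WARM;        /* clear flag on the surviving member */
}
```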

>
> 7.
> + * For CLEAR chains, we just kill the WARM pointer, if it exist,s and keep
> + * the CLEAR pointer.
>
> typo (exist,s)
>
>
Fixed.

> 8.
> +/*
> + * lazy_indexvac_phase2() -- run first pass of index vacuum
>
> Shouldn't this be -- run the second pass
>
>
Yes, fixed.

> 9.
> - indexInfo); /* index AM may need this */
> +indexInfo, /* index AM may need this */
> +(modified_attrs != NULL)); /* type of uniqueness check to do */
>
> comment for the last parameter seems to be wrong.
>
>
Fixed.

> 10.
> +follow the update chain everytime to the end to see check if this is a
> WARM
> +chain.
>
> "see check" - seems one of those words is sufficient to explain the
> meaning.
>
>
Fixed.

> 11.
> +chain. This simplifies the design and addresses certain issues around
> +duplicate scans.
>
> "duplicate scans" - shouldn't be duplicate key scans.
>
>
Ok, seems better. Fixed.

> 12.
> +index on the table, irrespective of whether the key pertaining to the
> +index changed or not.
>
> typo.
> /index changed/index is changed
>
>
Fixed.

> 13.
> +For example, if we have a table with two columns and two indexes on each
> +of the column. When a tuple is first inserted the table, we have exactly
>
> typo.
> /inserted the table/inserted in the table
>
>
Fixed.

> 14.
> + lp [1] [2]
> + [1111, aaaa]->[111, bbbb]
>
> Here, after the update, the first column should be 1111.
>
>
Fixed.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0005_warm_updates_v19.patch application/octet-stream 216.5 KB
