Re: Patch: Write Amplification Reduction Method (WARM)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-03-22 10:36:28
Message-ID: CAA4eK1+2vvhzVveagojZDgwy96FBoTOCAaWBF-O=WUX_qZJzCg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 21, 2017 at 6:47 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>>
>
> Please find attached rebased patches.
>

A few comments on 0005_warm_updates_v18.patch:

1.
@@ -806,20 +835,35 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
{
..
-            if (callback && callback(htup, callback_state))
+            if (callback)
             {
-                kill_tuple = true;
-                if (tuples_removed)
-                    *tuples_removed += 1;
+                result = callback(htup, is_warm, callback_state);
+                if (result == IBDCR_DELETE)
+                {
+                    kill_tuple = true;
+                    if (tuples_removed)
+                        *tuples_removed += 1;
+                }
+                else if (result == IBDCR_CLEAR_WARM)
+                {
+                    clear_tuple = true;
+                }
             }
             else if (split_cleanup)
..
}

I think this will break the existing mechanism of split cleanup. We
need to check for split cleanup when the tuple is not deletable by the
callback. This is not merely an optimization but a must, because we
will clear the split-cleanup flag after this bucket is scanned
completely.
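
Something along the following lines is what I have in mind (an untested
sketch; the variable and flag names are taken from the quoted hunk):

    if (callback)
    {
        result = callback(htup, is_warm, callback_state);
        if (result == IBDCR_DELETE)
        {
            kill_tuple = true;
            if (tuples_removed)
                *tuples_removed += 1;
        }
        else if (result == IBDCR_CLEAR_WARM)
            clear_tuple = true;
    }

    /*
     * Check for split cleanup even when a callback is present but did not
     * ask for deletion; otherwise tuples that must be cleaned up as part
     * of the split are missed, and the split-cleanup flag gets cleared
     * prematurely once this bucket is scanned completely.
     */
    if (!kill_tuple && split_cleanup)
    {
        /* existing split-cleanup handling */
    }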

2.
-        PageIndexMultiDelete(page, deletable, ndeletable);
+        /*
+         * Clear the WARM pointers.
+         *
+         * We must do this before dealing with the dead items because
+         * PageIndexMultiDelete may move items around to compactify the
+         * array and hence offnums recorded earlier won't make any sense
+         * after PageIndexMultiDelete is called.
+         */
+        if (nclearwarm > 0)
+            _hash_clear_items(page, clearwarm, nclearwarm);
+
+        /*
+         * And delete the deletable items
+         */
+        if (ndeletable > 0)
+            PageIndexMultiDelete(page, deletable, ndeletable);

I think this assumes that the items where we need to clear the WARM flag
are not deletable; otherwise, what is the need to clear the flag if we
are going to delete the tuple anyway? But a deletable tuple can have the
WARM flag set if it is deletable due to split cleanup.
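
If the intention is that an item whose WARM flag needs clearing is never
also in the deletable array, perhaps an assertion could document that
(illustrative only; array names as in the quoted hunk):

    #ifdef USE_ASSERT_CHECKING
        {
            int     i,
                    j;

            /* a WARM-clear offnum should never also be slated for deletion */
            for (i = 0; i < nclearwarm; i++)
                for (j = 0; j < ndeletable; j++)
                    Assert(clearwarm[i] != deletable[j]);
        }
    #endif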

3.
+    /*
+     * HASH indexes compute a hash value of the key and store that in the
+     * index. So we must first obtain the hash of the value obtained from
+     * the heap and then do a comparison
+     */
+    _hash_convert_tuple(indexRel, values, isnull, values2, isnull2);

I think here you need to handle the case where the heap has a NULL value,
as the hash index doesn't contain NULL values; otherwise, the comparison
code below in this function can return true, which is not right.
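
For example, something like this before the conversion might be needed
(just a sketch; I am assuming a single hash key column here, so isnull[0]
is the slot to test):

    /*
     * A NULL heap value can never match an entry in a hash index, since
     * hash indexes do not store NULLs, so report a mismatch right away
     * instead of hashing the value.
     */
    if (isnull[0])
        return false;

    _hash_convert_tuple(indexRel, values, isnull, values2, isnull2);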

4.
+bool
+hashrecheck(Relation indexRel, IndexInfo *indexInfo, IndexTuple indexTuple,
+            Relation heapRel, HeapTuple heapTuple)
{
..
+        att = indexRel->rd_att->attrs[i - 1];
+        if (!datumIsEqual(values2[i - 1], indxvalue, att->attbyval,
+                          att->attlen))
+        {
+            equal = false;
+            break;
+        }
..
}

Hash values are always uint32 and attlen can be different for
different datatypes, so I don't think the above is the right way to do
the comparison.
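
Since both sides are hash codes, I think a plain uint32 comparison should
be enough, something like (sketch, using the variables from the quoted
code):

    /* both values are 32-bit hash codes, so compare them as uint32 */
    if (DatumGetUInt32(values2[i - 1]) != DatumGetUInt32(indxvalue))
    {
        equal = false;
        break;
    }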

5.
@@ -274,6 +301,8 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
     OffsetNumber offnum;
     ItemPointer current;
     bool        res;
+    IndexTuple  itup;
+

     /* Hash indexes are always lossy since we store only the hash code */
     scan->xs_recheck = true;

@@ -316,8 +345,6 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
          offnum <= maxoffnum;
          offnum = OffsetNumberNext(offnum))
     {
-        IndexTuple  itup;
-
Why is the above change needed?

6.
+    *stats = index_bulk_delete(&ivinfo, *stats,
+                               lazy_indexvac_phase1, (void *) vacrelstats);
+    ereport(elevel,
+            (errmsg("scanned index \"%s\" to remove %d row version, found "
+                    "%0.f warm pointers, %0.f clear pointers, removed "
+                    "%0.f warm pointers, removed %0.f clear pointers",
+                    RelationGetRelationName(indrel),
+                    vacrelstats->num_dead_tuples,
+                    (*stats)->num_warm_pointers,
+                    (*stats)->num_clear_pointers,
+                    (*stats)->warm_pointers_removed,
+                    (*stats)->clear_pointers_removed)));
+
+    (*stats)->num_warm_pointers = 0;
+    (*stats)->num_clear_pointers = 0;
+    (*stats)->warm_pointers_removed = 0;
+    (*stats)->clear_pointers_removed = 0;
+    (*stats)->pointers_cleared = 0;
+
+    *stats = index_bulk_delete(&ivinfo, *stats,
+                               lazy_indexvac_phase2, (void *) vacrelstats);

To convert WARM chains, we need to do two index passes over all the
indexes. I think that can substantially increase the random I/O. It can
help us do more WARM updates, but I don't see how the downside of that
(increased random I/O) will be acceptable for all kinds of cases.

+exists. Since index vacuum may visit these pointers in any order, we will need
+another index pass to remove dead index pointers. So in the first index pass we
+check which WARM candidates have 2 index pointers. In the second pass, we
+remove the dead pointer and clear the INDEX_WARM_POINTER flag if that's the
+surviving index pointer.

I think there is some mismatch between the README and the code. The
README says that dead pointers will be removed in the second phase, but I
think the first-phase code lazy_indexvac_phase1() also allows deleting
dead pointers (it can return IBDCR_DELETE, which lets the index AM remove
dead items). Am I missing something here?

7.
+ * For CLEAR chains, we just kill the WARM pointer, if it exist,s and keep
+ * the CLEAR pointer.

typo (exist,s)

8.
+/*
+ * lazy_indexvac_phase2() -- run first pass of index vacuum

Shouldn't this be -- run the second pass

9.
-                         indexInfo); /* index AM may need this */
+                         indexInfo, /* index AM may need this */
+                         (modified_attrs != NULL)); /* type of uniqueness check to do */

comment for the last parameter seems to be wrong.

10.
+follow the update chain everytime to the end to see check if this is a WARM
+chain.

"see check" - seems one of those words is sufficient to explain the meaning.

11.
+chain. This simplifies the design and addresses certain issues around
+duplicate scans.

"duplicate scans" - shouldn't be duplicate key scans.

12.
+index on the table, irrespective of whether the key pertaining to the
+index changed or not.

typo.
/index changed/index is changed

13.
+For example, if we have a table with two columns and two indexes on each
+of the column. When a tuple is first inserted the table, we have exactly

typo.
/inserted the table/inserted in the table

14.
+ lp [1] [2]
+ [1111, aaaa]->[111, bbbb]

Here, after the update, the first column should be 1111.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
