Re: Patch: Write Amplification Reduction Method (WARM)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2016-08-31 20:03:29
Message-ID: 20160831200329.GC31352@momjian.us
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Wed, Aug 31, 2016 at 10:15:33PM +0530, Pavan Deolasee wrote:
> Instead, what I would like to propose and the patch currently implements is to
> restrict WARM update to once per chain. So the first non-HOT update to a tuple
> or a HOT chain can be a WARM update. The chain can further be HOT updated any
> number of times. But it can no further be WARM updated. This might look too
> restrictive, but it can still bring down the number of regular updates by
> almost 50%. Further, if we devise a strategy to convert a WARM chain back to
> HOT chain, it can again be WARM updated. (This part is currently not
> implemented). A good side effect of this simple strategy is that we know there
> can maximum two index entries pointing to any given WARM chain.

I like the simplified approach, as long as it doesn't block further
improvements.

> Headline TPS numbers:
>
> Master:
>
> transaction type: update.sql
> scaling factor: 700
> query mode: simple
> number of clients: 16
> number of threads: 8
> duration: 57600 s
> number of transactions actually processed: 65552986
> latency average: 14.059 ms
> tps = 1138.072117 (including connections establishing)
> tps = 1138.072156 (excluding connections establishing)
>
>
> WARM:
>
> transaction type: update.sql
> scaling factor: 700
> query mode: simple
> number of clients: 16
> number of threads: 8
> duration: 57600 s
> number of transactions actually processed: 116168454
> latency average: 7.933 ms
> tps = 2016.812924 (including connections establishing)
> tps = 2016.812997 (excluding connections establishing)

These are very impressive results.

> Converting WARM chains back to HOT chains (VACUUM ?)
> ---------------------------------------------------------------------------------
>
> The current implementation of WARM allows only one WARM update per chain. This
> simplifies the design and addresses certain issues around duplicate scans. But
> this also implies that the benefit of WARM will be no more than 50%, which is
> still significant, but if we could return WARM chains back to normal status, we
> could do far more WARM updates.
>
> A distinct property of a WARM chain is that at least one index has more than
> one live index entries pointing to the root of the chain. In other words, if we
> can remove duplicate entry from every index or conclusively prove that there
> are no duplicate index entries for the root line pointer, the chain can again
> be marked as HOT.

I had not thought of how to convert from WARM to HOT yet.

> Here is one idea, but more thoughts/suggestions are most welcome. 
>
> A WARM chain has two parts, separated by the tuple that caused WARM update. All
> tuples in each part has matching index keys, but certain index keys may not
> match between these two parts. Lets say we mark heap tuples in each part with a
> special Red-Blue flag. The same flag is replicated in the index tuples. For
> example, when new rows are inserted in a table, they are marked with Blue flag
> and the index entries associated with those rows are also marked with Blue
> flag. When a row is WARM updated, the new version is marked with Red flag and
> the new index entry created by the update is also marked with Red flag.
>
>
> Heap chain: lp  [1] [2] [3] [4]
>   [aaaa, 1111]B -> [aaaa, 1111]B -> [bbbb, 1111]R -> [bbbb, 1111]R
>
> Index1: (aaaa)B points to 1 (satisfies only tuples marked with B)
> (bbbb)R points to 1 (satisfies only tuples marked with R)
>
> Index2: (1111)B points to 1 (satisfies both B and R tuples)
>
>
> It's clear that for indexes with Red and Blue pointers, a heap tuple with Blue
> flag will be reachable from Blue pointer and that with Red flag will be
> reachable from Red pointer. But for indexes which did not create a new entry,
> both Blue and Red tuples will be reachable from Blue pointer (there is no Red
> pointer in such indexes). So, as a side note, matching Red and Blue flags is
> not enough from index scan perspective.
>
> During first heap scan of VACUUM, we look for tuples with HEAP_WARM_TUPLE set.
> If all live tuples in the chain are either marked with Blue flag or Red flag
> (but no mix of Red and Blue), then the chain is a candidate for HOT conversion.

Uh, if the chain is all blue, then there is are WARM entries so it
already a HOT chain, so there is nothing to do, right?

> We remember the root line pointer and Red-Blue flag of the WARM chain in a
> separate array.
>
> If we have a Red WARM chain, then our goal is to remove Blue pointers and vice
> versa. But there is a catch. For Index2 above, there is only Blue pointer
> and that must not be removed. IOW we should remove Blue pointer iff a Red
> pointer exists. Since index vacuum may visit Red and Blue pointers in any
> order, I think we will need another index pass to remove dead
> index pointers. So in the first index pass we check which WARM candidates have
> 2 index pointers. In the second pass, we remove the dead pointer and reset Red
> flag is the surviving index pointer is Red.

Why not just remember the tid of chains converted from WARM to HOT, then
use "amrecheck" on an index entry matching that tid to see if the index
matches one of the entries in the chain. (It will match all of them or
none of them, because they are all red.) I don't see a point in
coloring the index entries as reds as later you would have to convert to
blue in the WARM-to-HOT conversion, and a vacuum crash could lead to
inconsistencies. Consider that you can just call "amrecheck" on the few
chains that have converted from WARM to HOT. I believe this is more
crash-safe too. However, if you have converted WARM to HOT in the heap,
but crash durin the index entry removal, you could potentially have
duplicates in the index later, which is bad.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2016-08-31 20:11:38 Re: Patch: Write Amplification Reduction Method (WARM)
Previous Message Vik Fearing 2016-08-31 19:20:43 Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)