Re: Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2016-09-01 09:07:40
Message-ID: CABOikdN7yozYYPoYaJcz8=p50qHeOJZdJkhOR9KvQkh7vBW4VA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 1, 2016 at 1:33 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Wed, Aug 31, 2016 at 10:15:33PM +0530, Pavan Deolasee wrote:
> > Instead, what I would like to propose, and what the patch currently
> > implements, is to restrict WARM update to once per chain. So the first
> > non-HOT update to a tuple or a HOT chain can be a WARM update. The chain
> > can further be HOT updated any number of times, but it cannot be WARM
> > updated again. This might look too restrictive, but it can still bring
> > down the number of regular updates by almost 50%. Further, if we devise
> > a strategy to convert a WARM chain back to a HOT chain, it can again be
> > WARM updated. (This part is currently not implemented.) A good side
> > effect of this simple strategy is that we know there can be at most two
> > index entries pointing to any given WARM chain.
>
> I like the simplified approach, as long as it doesn't block further
> improvements.
>
>
Yes, the proposed approach is simple, yet it does not stop us from improving
things further. Moreover, it has shown good performance characteristics and
I believe it's a good first step.
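
As a rough illustration of the once-per-chain rule quoted above, here is a toy model in Python. It is not code from the patch. It assumes that every update in the sequence is a non-HOT update that would otherwise qualify for WARM, that a regular update starts a fresh chain, and that no WARM-to-HOT conversion takes place. The function name is hypothetical.

```python
def classify_non_hot_updates(n):
    """Classify n consecutive non-HOT updates to one logical row.

    Toy model: a chain may take one WARM update; the next non-HOT
    update must be a regular update, which starts a fresh chain.
    """
    kinds = []
    chain_is_warm = False
    for _ in range(n):
        if not chain_is_warm:
            kinds.append("warm")      # first non-HOT update on this chain
            chain_is_warm = True
        else:
            kinds.append("regular")   # new chain, new entries in every index
            chain_is_warm = False
    return kinds


kinds = classify_non_hot_updates(10)
regular_share = kinds.count("regular") / len(kinds)
```

Under these assumptions every second non-HOT update is still a regular update, which is consistent with the "almost 50%" reduction mentioned in the quoted text.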

>
> > Master:
> > tps = 1138.072117 (including connections establishing)
> >
> > WARM:
> > tps = 2016.812924 (including connections establishing)
>
> These are very impressive results.
>
>
Thanks. What's also interesting, and something the headline numbers don't
show, is that WARM TPS is as much as 3 times master TPS when the
percentage of WARM updates is very high. Notice the spike in TPS in the
comparison graph.

Results with non-default heap fill factor are even better. In both cases,
the improvement in TPS stays constant over long periods.
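
For reference, the ratio between the two headline figures quoted above can be worked out directly. The snippet below only restates those two numbers; the variable names are illustrative.

```python
# Headline pgbench figures quoted earlier in this message.
master_tps = 1138.072117
warm_tps = 2016.812924

# Ratio of WARM throughput to master throughput.
speedup = warm_tps / master_tps

# Relative improvement over master, as a percentage.
improvement_pct = (speedup - 1.0) * 100.0

print(f"speedup: {speedup:.2f}x, improvement: {improvement_pct:.0f}%")
```

The headline ratio is therefore a little under 1.8x, well below the roughly 3x peak seen when the share of WARM updates is very high.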

>
> >
> > During first heap scan of VACUUM, we look for tuples with
> > HEAP_WARM_TUPLE set. If all live tuples in the chain are either marked
> > with Blue flag or Red flag (but no mix of Red and Blue), then the chain
> > is a candidate for HOT conversion.
>
> Uh, if the chain is all blue, then there are no WARM entries, so it is
> already a HOT chain, so there is nothing to do, right?
>

For aborted WARM updates, the heap chain may be all blue, but there may
still be a red index pointer which must be cleared before we allow further
WARM updates to the chain.
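
The candidate test described in the quoted text can be sketched as below. This is a simplified model, not the patch's code: the chain is represented only by the colors of its live tuples, and the function name and the "blue"/"red" strings are hypothetical.

```python
BLUE, RED = "blue", "red"


def hot_conversion_candidate(live_tuple_colors):
    """Return the chain's color if all live tuples share one, else None.

    A chain whose live tuples are all Blue or all Red (no mix) is a
    candidate for WARM-to-HOT conversion in this model.
    """
    colors = set(live_tuple_colors)
    if len(colors) == 1:
        return colors.pop()
    return None  # mixed colors, or no live tuples at all


all_red = hot_conversion_candidate([RED, RED])
all_blue = hot_conversion_candidate([BLUE, BLUE, BLUE])
mixed = hot_conversion_candidate([BLUE, RED])
```

An all-Blue chain still counts as a candidate here because, after an aborted WARM update, a Red index pointer may be left behind even though no Red heap tuple is live.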

>
> > We remember the root line pointer and Red-Blue flag of the WARM chain
> > in a separate array.
> >
> > If we have a Red WARM chain, then our goal is to remove Blue pointers,
> > and vice versa. But there is a catch. For Index2 above, there is only a
> > Blue pointer and that must not be removed. IOW we should remove a Blue
> > pointer iff a Red pointer exists. Since index vacuum may visit Red and
> > Blue pointers in any order, I think we will need another index pass to
> > remove dead index pointers. So in the first index pass we check which
> > WARM candidates have 2 index pointers. In the second pass, we remove the
> > dead pointer and reset the Red flag if the surviving index pointer is
> > Red.
>
> Why not just remember the tid of chains converted from WARM to HOT, then
> use "amrecheck" on an index entry matching that tid to see if the index
> matches one of the entries in the chain?

That will require random access to the heap during the index vacuum phase,
something I would like to avoid. But we can have that as a fallback
solution for handling aborted vacuums.
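
The two-pass index cleanup from the quoted proposal can be sketched as follows. This is a toy model under stated assumptions, not the patch's implementation: an index is a plain list of `(root_tid, color)` pointers, `candidates` maps each candidate chain's root to the color shared by its live tuples, and all names are hypothetical. No heap access is needed in either pass.

```python
from collections import Counter

BLUE, RED = "blue", "red"


def vacuum_index(entries, candidates):
    """Return the index entries left after a two-pass cleanup.

    entries:    list of (root_tid, color) index pointers
    candidates: dict root_tid -> color of the chain's live tuples
    """
    # Pass 1: count how many pointers each candidate chain has here.
    pointer_count = Counter(tid for tid, _ in entries if tid in candidates)

    # Pass 2: where a chain has two pointers, drop the one whose color
    # differs from the chain's and reset the survivor to Blue.
    kept = []
    for tid, color in entries:
        if tid not in candidates or pointer_count[tid] < 2:
            kept.append((tid, color))   # sole pointer must not be removed
        elif color == candidates[tid]:
            kept.append((tid, BLUE))    # survivor; Red flag is reset
        # otherwise: dead pointer, removed
    return kept


root = (0, 1)
index1 = [(root, BLUE), (root, RED), ((0, 2), BLUE)]  # key changed by WARM
index2 = [(root, BLUE)]                               # key unchanged

index1_after = vacuum_index(index1, {root: RED})
index2_after = vacuum_index(index2, {root: RED})
aborted_after = vacuum_index(index1, {root: BLUE})    # aborted WARM update
```

In this model the lone Blue pointer in `index2` survives even though the chain is Red, and an all-Blue chain left by an aborted WARM update loses its stray Red pointer.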

> (It will match all of them or
> none of them, because they are all red.) I don't see a point in
> coloring the index entries as reds as later you would have to convert to
> blue in the WARM-to-HOT conversion, and a vacuum crash could lead to
> inconsistencies.

Yes, that's a concern, since the conversion of red to blue will also need to
be WAL-logged to ensure that a crash doesn't leave us in an inconsistent
state. I still think that this will be an overall improvement as compared to
allowing one WARM update per chain.

> Consider that you can just call "amrecheck" on the few
> chains that have converted from WARM to HOT. I believe this is more
> crash-safe too. However, if you have converted WARM to HOT in the heap,
> but crash during the index entry removal, you could potentially have
> duplicates in the index later, which is bad.
>
>
As you probably already noted, we clear heap flags only after all indexes
are cleared of duplicate entries and hence a crash in between should not
cause any correctness issue. As long as heap tuples are marked as warm,
amrecheck will ensure that only valid tuples are returned to the caller.
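
The ordering argument above can be illustrated with a small simulation. It is only a sketch under simplifying assumptions: each index is a dict from chain root to its pointer count, the heap keeps one WARM flag per chain, and a crash is modeled as stopping before a given step. All names are hypothetical.

```python
def convert_chain(indexes, heap_warm, root, crash_after=None):
    """Clean every index first, then clear the heap flag.

    crash_after=k simulates a crash after k completed index cleanups;
    None means the conversion runs to completion.
    Returns True if the heap flag was cleared.
    """
    done = 0
    for index in indexes:
        if done == crash_after:
            return False          # crashed before cleaning this index
        index[root] = 1           # duplicate pointer removed
        done += 1
    if done == crash_after:
        return False              # crashed before touching the heap
    heap_warm[root] = False       # safe: no index has duplicates now
    return True


def is_consistent(indexes, heap_warm, root):
    """A non-WARM chain must have exactly one pointer in every index."""
    return heap_warm[root] or all(index[root] == 1 for index in indexes)


root = (0, 1)
outcomes = []
for crash_after in (None, 0, 1, 2):
    indexes = [{root: 2}, {root: 1}]
    heap_warm = {root: True}
    cleared = convert_chain(indexes, heap_warm, root, crash_after)
    outcomes.append((cleared, is_consistent(indexes, heap_warm, root)))
```

At every simulated crash point the chain is either still marked WARM, so lookups keep going through recheck, or every index already holds a single pointer.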

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
