Re: Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2016-10-06 05:36:14
Message-ID: CABOikdM1euaBr8_Lt2F+=WdeK1GTFKQ9EDgqEE1KUMmce_Co1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 5, 2016 at 1:43 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

>
>
> I've been looking at the patch over the past few days, running a bunch of
> benchmarks etc.

Thanks for doing that.

> I can confirm the significant speedup, often by more than 75% (depending
> on number of indexes, whether the data set fits into RAM, etc.). Similarly
> for the amount of WAL generated, although that's a bit more difficult to
> evaluate due to full_page_writes.
>
> I'm not going to send detailed results, as that probably does not make
> much sense at this stage of the development - I can repeat the tests once
> the open questions get resolved.
>

Sure. Anything that stands out? Any regression that you see? I'm not sure
if your benchmarks exercise the paths which might show overheads without
any tangible benefits. For example, I wonder if a test with many indexes
where most of them get updated and then querying the table via those
updated indexes could be one such test case.

>
> There's a lot of useful and important feedback in the thread(s) so far,
> particularly the descriptions of various failure cases. I think it'd be
> very useful to collect those examples and turn them into regression tests -
> that's something the patch should include anyway.
>

Sure. I added only a handful test cases which I knew regression isn't
covering. But I'll write more of them. One good thing is that the code gets
heavily exercised even during regression. I caught and fixed multiple bugs
running regression. I'm not saying that's enough, but it certainly gives
some confidence.

>
> and update:
>
> update t set a = a+1, b=b+1;
>
> which has to update all indexes on the table, but:
>
> select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables
>
> n_tup_upd | n_tup_hot_upd
> -----------+---------------
> 1000 | 1000
>
> So it's still counted as "WARM" - does it make sense?

No, it does not. The code currently just marks any update as a WARM update
if the table supports it and there is enough free space in the page. And
yes, you're right. It's worth fixing that because of one-WARM update per
chain limitation. Will fix.

>
> The way this is piggy-backed on the current HOT statistics seems a bit
> strange for another reason,

Agree. We could add a similar n_tup_warm_upd counter.

> But WARM changes that - it allows adding index entries only to a subset of
> indexes, which means the "per row" n_tup_hot_upd counter is not sufficient.
> When you have a table with 10 indexes, and the counter increases by 1, does
> that mean the update added index tuple to 1 index or 9 of them?
>

How about having counters similar to n_tup_ins/n_tup_del for indexes as
well? Today it does not make sense because every index gets the same number
of inserts, but WARM will change that.

For example, we could have idx_tup_insert and idx_tup_delete that shows up
in pg_stat_user_indexes. I don't know if idx_tup_delete adds any value, but
one can then look at idx_tup_insert for various indexes to get a sense
which indexes receives more inserts than others. The indexes which receive
more inserts are the ones being frequently updated as compared to other
indexes.

This also relates to vacuuming strategies. Today HOT updates do not count
for triggering vacuum (or to be more precise, HOT pruned tuples are
discounted while counting dead tuples). WARM tuples get the same treatment
as far as pruning is concerned, but since they cause fresh index inserts, I
wonder if we need some mechanism to cleanup the dead line pointers and dead
index entries. This will become more important if we do something to
convert WARM chains into HOT chains, something that only VACUUM can do in
the design I've proposed so far.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Franck Verrot 2016-10-06 05:58:15 Re: Mention column name in error messages
Previous Message Tomas Vondra 2016-10-06 04:50:41 memory leak in e94568ecc10 (pre-reading in external sort)