Re: Patch: Write Amplification Reduction Method (WARM)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2016-10-07 09:15:37
Message-ID: 6595e618-cbe4-2854-af0a-4a9992293bb7@2ndquadrant.com
Lists: pgsql-hackers


On 10/06/2016 07:36 AM, Pavan Deolasee wrote:
>
>
> On Wed, Oct 5, 2016 at 1:43 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
...
> I can confirm the significant speedup, often by more than 75%
> (depending on number of indexes, whether the data set fits into RAM,
> etc.). Similarly for the amount of WAL generated, although that's a
> bit more difficult to evaluate due to full_page_writes.
>
> I'm not going to send detailed results, as that probably does not
> make much sense at this stage of the development - I can repeat the
> tests once the open questions get resolved.
>
>
> Sure. Anything that stands out? Any regression that you see? I'm not
> sure if your benchmarks exercise the paths which might show overheads
> without any tangible benefits. For example, I wonder if a test with many
> indexes where most of them get updated and then querying the table via
> those updated indexes could be one such test case.
>

No, nothing that would stand out. Let me explain what benchmarks I've
done. I made some minor mistakes when running them, so I plan to rerun
them and post the results after that - until then, take the data with a
grain of salt.

My goal was to compare the current non-HOT behavior (updating all
indexes) with WARM (updating only the indexes on modified columns), and
I've taken two approaches:

1) fixed number of indexes, update variable number of columns

Create a table with 8 secondary indexes and then run a bunch of
benchmarks, updating an increasing number of columns. So the first run did

UPDATE t SET c1 = c1+1 WHERE id = :id;

while the second did

UPDATE t SET c1 = c1+1, c2 = c2+1 WHERE id = :id;

and so on, up to updating all the columns in the last run. I've used
multiple scripts to update all the columns / indexes uniformly
(essentially using multiple "-f" flags with pgbench). The runs were
fairly long (2h, enough to get stable behavior).
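
For reference, the setup looked roughly like this (just a sketch, the
exact names in my scripts may differ):

CREATE TABLE t (id int PRIMARY KEY,
                c1 int, c2 int, c3 int, c4 int,
                c5 int, c6 int, c7 int, c8 int);

CREATE INDEX t_c1_idx ON t (c1); CREATE INDEX t_c2_idx ON t (c2);
CREATE INDEX t_c3_idx ON t (c3); CREATE INDEX t_c4_idx ON t (c4);
CREATE INDEX t_c5_idx ON t (c5); CREATE INDEX t_c6_idx ON t (c6);
CREATE INDEX t_c7_idx ON t (c7); CREATE INDEX t_c8_idx ON t (c8);

with one pgbench script per column, e.g. for the first run update-c1.sql:

-- pick a random row (range matches the number of rows loaded)
\set id random(1, 1000000)
UPDATE t SET c1 = c1+1 WHERE id = :id;

and the runs invoked as

pgbench -n -T 7200 \
  -f update-c1.sql -f update-c2.sql -f update-c3.sql -f update-c4.sql \
  -f update-c5.sql -f update-c6.sql -f update-c7.sql -f update-c8.sql test

so that each column (and thus each index) gets a uniform share of the
updates.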

For a small data set (fits into RAM), the throughput (tps) looks like this:

 # cols   master   patched    diff
      1     5994      8490    +42%
      2     4347      7903    +81%
      3     4340      7400    +70%
      4     4324      6929    +60%
      5     4256      6495    +52%
      6     4253      5059    +19%
      7     4235      4534     +7%
      8     4194      4237     +1%

and the amount of WAL generated (after correction for tps difference)
looks like this (numbers are MBs)

 # cols   master   patched    diff
      1    27257     18508    -32%
      2    21753     14599    -33%
      3    21912     15864    -28%
      4    22021     17135    -22%
      5    21819     18258    -16%
      6    21929     20659     -6%
      7    21994     22234     +1%
      8    21851     23267     +6%

So this is quite a significant difference. I'm pretty sure the minor WAL
increase in the last two runs is due to full-page writes (which also
affect the preceding runs, making the WAL reduction smaller than the
tps increase).
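
FWIW the WAL volume per run is essentially just the difference in XLOG
position before and after the run - roughly like this (using the 9.6
function names), then normalized by the tps ratio:

-- in psql, before the pgbench run
SELECT pg_current_xlog_location() AS start_lsn \gset

-- after the run
SELECT pg_xlog_location_diff(pg_current_xlog_location(), :'start_lsn')
       / (1024 * 1024) AS wal_mb;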

I do have results for larger data sets (>RAM); they are very similar,
although the speedup seems a bit smaller. But I need to rerun those.

2) single-column update, adding indexes between runs

This is kinda the opposite of the previous approach, i.e. transactions
always update a single column (again using multiple scripts to update
the columns uniformly), but new indexes are added between runs. The
results (tps, for a large data set exceeding RAM) look like this:

 # indexes   master   patched    diff
         0      954      1404    +47%
         1      701      1045    +49%
         2      484       816    +70%
         3      346       683    +97%
         4      248       608   +145%
         5      190       525   +176%
         6      152       397   +161%
         7      123       315   +156%
         8      123       270   +119%

So this looks really interesting.
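
(To be clear, run 0 starts with only the primary key, and each following
run adds a secondary index on one more column, i.e. between runs I do
something like

CREATE INDEX t_c3_idx ON t (c3);

reusing the schema sketched above.)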

>
> There's a lot of useful and important feedback in the thread(s) so
> far, particularly the descriptions of various failure cases. I think
> it'd be very useful to collect those examples and turn them into
> regression tests - that's something the patch should include anyway.
>
>
> Sure. I added only a handful test cases which I knew regression isn't
> covering. But I'll write more of them. One good thing is that the code
> gets heavily exercised even during regression. I caught and fixed
> multiple bugs running regression. I'm not saying that's enough, but it
> certainly gives some confidence.
>

I don't see any changes to src/test in the patch, so I'm not sure what
you mean when you say you added a handful of test cases.

>
>
> and update:
>
> update t set a = a+1, b=b+1;
>
> which has to update all indexes on the table, but:
>
> select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables
>
>  n_tup_upd | n_tup_hot_upd
> -----------+---------------
>       1000 |          1000
>
> So it's still counted as "WARM" - does it make sense?
>
>
> No, it does not. The code currently just marks any update as a WARM
> update if the table supports it and there is enough free space in the
> page. And yes, you're right. It's worth fixing that because of one-WARM
> update per chain limitation. Will fix.
>

Hmmm, so this makes monitoring of %WARM during benchmarks less reliable
than I hoped for :-(
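
For reference, the %WARM I've been watching during the benchmarks is
roughly this (with the current patch, n_tup_hot_upd counts both HOT and
WARM updates):

SELECT n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS pct_warm
  FROM pg_stat_user_tables
 WHERE relname = 't';

so until the counting is fixed, this ratio overstates the actual benefit.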

>
> The way this is piggy-backed on the current HOT statistics seems a
> bit strange for another reason,
>
>
> Agree. We could add a similar n_tup_warm_upd counter.
>

Yes, although HOT is a special case of WARM. But it probably makes sense
to differentiate them, I guess.

>
> But WARM changes that - it allows adding index entries only to a
> subset of indexes, which means the "per row" n_tup_hot_upd counter
> is not sufficient. When you have a table with 10 indexes, and the
> counter increases by 1, does that mean the update added index tuple
> to 1 index or 9 of them?
>
>
> How about having counters similar to n_tup_ins/n_tup_del for indexes
> as well? Today it does not make sense because every index gets the
> same number of inserts, but WARM will change that.
>
> For example, we could have idx_tup_insert and idx_tup_delete that shows
> up in pg_stat_user_indexes. I don't know if idx_tup_delete adds any
> value, but one can then look at idx_tup_insert for various indexes to
> get a sense which indexes receives more inserts than others. The indexes
> which receive more inserts are the ones being frequently updated as
> compared to other indexes.
>

Hmmm, I'm not sure that'll work. I mean, those metrics would be useful
(although I can't think of a use case for idx_tup_delete), but I'm not
sure they're enough to measure WARM. We need to compute

index_tuples_inserted / index_tuples_total

where (index_tuples_total - index_tuples_inserted) is the number of
index tuples we've been able to skip thanks to WARM. So we'd also need
to track the number of index tuples that we skipped for the index, and
I'm not sure that's a good idea.
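
Just to illustrate - with hypothetical counters idx_tup_insert (index
tuples actually inserted) and idx_tup_skipped (insertions avoided thanks
to WARM; entirely made up, that's exactly the counter we'd have to add),
the metric would look something like:

SELECT indexrelname,
       round(100.0 * idx_tup_insert
             / nullif(idx_tup_insert + idx_tup_skipped, 0), 1)
         AS pct_inserted
  FROM pg_stat_user_indexes;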

Also, we really don't care about inserted tuples - what matters for WARM
are updates, so idx_tup_insert is either useless (because it also
includes non-UPDATE entries) or the naming is misleading.

> This also relates to vacuuming strategies. Today HOT updates do not
> count for triggering vacuum (or to be more precise, HOT pruned tuples
> are discounted while counting dead tuples). WARM tuples get the same
> treatment as far as pruning is concerned, but since they cause fresh
> index inserts, I wonder if we need some mechanism to cleanup the dead
> line pointers and dead index entries. This will become more important if
> we do something to convert WARM chains into HOT chains, something that
> only VACUUM can do in the design I've proposed so far.
>

True.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
