Re: Patch: Write Amplification Reduction Method (WARM)

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-03-30 10:37:26
Message-ID: CABOikdMduu9wOhfvNzqVuNW4YdBgbgwv-A=HNFCL7R5Tmbx7JA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> >>
> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >> wrote:
> >
> > Then during recheck, we pass already compressed values to
> > index_form_tuple(). But my point is, the following code will ensure that
> we
> > don't compress it again. My reading is that the first check for
> > !VARATT_IS_EXTENDED will return false if the value is already compressed.
> >
>
> You are right. I was confused by the previous check of VARATT_IS_EXTERNAL.
>
>
Ok, thanks.
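
For reference, the check in question in index_form_tuple() reads roughly as
below (paraphrased from src/backend/access/common/indextuple.c; a sketch,
not the exact source). Since VARATT_IS_EXTENDED() is true for an
already-compressed datum, the first condition fails and the value is left
alone:

/*
 * Paraphrased: if the value is above the size target and its storage
 * type is compressible, try to compress it inline. An already-compressed
 * datum counts as "extended", so the first test fails and we skip it.
 */
if (!VARATT_IS_EXTENDED(DatumGetPointer(untoasted_values[i])) &&
    VARSIZE(DatumGetPointer(untoasted_values[i])) > TOAST_INDEX_TARGET &&
    (att->attstorage == 'x' || att->attstorage == 'm'))
{
    Datum       cvalue = toast_compress_datum(untoasted_values[i]);

    if (DatumGetPointer(cvalue) != NULL)
    {
        /* successful compression */
        untoasted_values[i] = cvalue;
        untoasted_free[i] = true;
    }
}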

> >
> > TBH I couldn't find why the original index insertion code will always
> supply
> > uncompressed values.
> >
>
> Just try inserting a large text-column value ('aaaaaa.....bbb') up
> to 2.5K. Then set a breakpoint in heap_prepare_insert and
> index_form_tuple, and debug both functions; you can find that even
> though we compress during insertion into the heap, the index will
> compress the original value again.
>
>
Ok, tried that. AFAICS index_form_tuple gets compressed values.

>
> Yeah, probably you are right, but I am not sure it is a good idea to
> compare compressed values.
>
>
Again, I don't see a problem there.

> I think with these new changes in btrecheck, it would appear to be much
> costlier compared to what you had a few versions back. I am afraid
> that it can impact performance for cases where there are few WARM
> updates in the chain and many HOT updates, as it will run the recheck
> for all such updates.

My feeling is that the recheck could be costly for very fat indexes, but
not doing WARM could be costly for such indexes too. We can construct a
worst case as follows:
1. Set up a table with a fat index.
2. Do a WARM update to a tuple.
3. Do several HOT updates to the same tuple.
4. Query the row via the fat index.

Initialisation:

-- Adjust parameters to force index scans
SET enable_seqscan = false;
SET seq_page_cost = 10000;

DROP TABLE IF EXISTS pgbench_accounts;

CREATE TABLE pgbench_accounts (
aid text,
bid bigint,
abalance bigint,
filler1 text DEFAULT md5(random()::text),
filler2 text DEFAULT md5(random()::text),
filler3 text DEFAULT md5(random()::text),
filler4 text DEFAULT md5(random()::text),
filler5 text DEFAULT md5(random()::text),
filler6 text DEFAULT md5(random()::text),
filler7 text DEFAULT md5(random()::text),
filler8 text DEFAULT md5(random()::text),
filler9 text DEFAULT md5(random()::text),
filler10 text DEFAULT md5(random()::text),
filler11 text DEFAULT md5(random()::text),
filler12 text DEFAULT md5(random()::text)
) WITH (fillfactor=90);
\set end 0
\set start (:end + 1)
\set end (:start + (:scale * 100))

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
<2300 chars string>, (random()::bigint) % :scale, 0;

CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);

-- Force a WARM update on one row
UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' ||
<2300 chars string>;

Test:
-- Fetch the row using the fat index. Since the row contains a WARM tuple,
-- every fetch goes through the index recheck logic.
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
<2300 chars string> ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
<2300 chars string>;
END;

I did four 5-minute runs each with master and WARM, and there is probably a
2-3% regression.

(Results of 5-minute runs; txns is the total for 5 minutes, idx_scan is the
number of scans on the fat index)
master:
txns idx_scan
414117 828233
411109 822217
411848 823695
408424 816847

WARM:
txns idx_scan
404139 808277
398880 797759
399949 799897
397927 795853

==========

I then repeated the tests, this time using compressible values. The
regression in this case is much higher, maybe 15% or more.

INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
repeat('abcdefghij', 20000), (random()::bigint) % :scale, 0;

-- Fetch the row using the fat index. Since the row contains a WARM tuple,
-- every fetch goes through the index recheck logic.
BEGIN;
SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
repeat('abcdefghij', 20000) ORDER BY aid;
UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
repeat('abcdefghij', 20000);
END;

(Results of 5-minute runs; txns is the total for 5 minutes, idx_scan is the
number of scans on the fat index)
master:
txns idx_scan
56976 113953
56822 113645
56915 113831
56865 113731

WARM:
txns idx_scan
49044 98087
49020 98039
49007 98013
49006 98011

But TBH I believe this regression is coming from the changes to
heap_tuple_attr_equals, where we decompress both the old and new values and
then compare them. For values 200K bytes long, that cost must be
significant. Another reason I think so is that I accidentally did one run
which did not use index scans and did not perform any WARM updates, yet the
regression was about the same. That makes me think the regression is coming
from somewhere other than the recheck path, and the change in
heap_tuple_attr_equals seems like a good candidate.
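
To illustrate the suspicion (a simplified, hypothetical sketch, not the
patch's actual code): detoasting both sides inflates each datum to its full
uncompressed size before a single byte is compared, so for ~200K values the
comparison alone can dominate an otherwise small UPDATE:

/*
 * Hypothetical sketch of the costly path: PG_DETOAST_DATUM() decompresses
 * each value in full, even when the compressed images are just a few KB.
 */
static bool
attr_equals_decompressing(Datum value1, Datum value2, bool byval, int typlen)
{
    struct varlena *v1 = PG_DETOAST_DATUM(value1);  /* may decompress 200K */
    struct varlena *v2 = PG_DETOAST_DATUM(value2);  /* may decompress 200K */

    return datumIsEqual(PointerGetDatum(v1), PointerGetDatum(v2),
                        byval, typlen);
}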

I think we can fix that by comparing the compressed values directly. I know
you had raised concerns, but Robert confirmed that (IIUC) it's not a problem
today. We can figure out how to deal with it if we ever add support for
different compression algorithms or compression levels. Also, this is a
rather synthetic use case, and the fact that there is not much regression
even with index keys as large as 2K bytes is quite comforting.
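
Something along these lines is what I have in mind (a hypothetical helper,
not code from the patch): as long as both sides are known to carry the same
representation, compressed by the same deterministic algorithm (pglz is the
only one today), byte-wise equality of the raw varlenas is a safe test and
skips detoasting entirely:

/*
 * Hypothetical helper: compare two varlena datums by their raw bytes,
 * without decompressing. Only valid while a single deterministic
 * compression method exists; with multiple algorithms or levels, equal
 * originals could have different compressed images.
 */
static bool
varlena_raw_equal(struct varlena *a, struct varlena *b)
{
    Size        len = VARSIZE_ANY(a);

    if (len != VARSIZE_ANY(b))
        return false;
    return memcmp(a, b, len) == 0;
}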

===========

Apart from this, I also ran some benchmarks after removing the index on the
abalance column from my test suite, so that all updates are HOT updates. I
did not find any regression in that scenario; WARM was a percent or so
better, but I assume that's just noise. These benchmarks were done at scale
factor 100, running for 1 hr each. The headline numbers are:

WARM: 5802 txns/sec
master: 5719 txns/sec

===========

Another workload where WARM could cause a regression is one where a table
has many indexes and each UPDATE touches all but one of them. We will do a
WARM update in this case, but since N-1 indexes get new index entries
anyway, the benefit of WARM is marginal, while autovacuum cost increases
because we must scan those N-1 indexes for cleanup.

While this may be an atypical workload, it's probably worth guarding
against. I propose that we stop WARM at the source if we detect that more
than a certain percentage of the indexes will be updated by an UPDATE
statement. Of course, we could be fancier, look at each index's structure,
and arrive at a cost model, but a simple 50% rule seems a good starting
point: if an UPDATE is going to modify more than 50% of the indexes, do a
non-WARM update. The attached patch adds that support.
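
To make that concrete, here is a minimal sketch of the heuristic
(hypothetical names; the attached patch may structure this differently):

/*
 * Hypothetical sketch of the 50% rule: give up on WARM when the UPDATE
 * modifies columns covered by more than half of the table's indexes,
 * since most indexes get new entries anyway and WARM would only add
 * recheck and cleanup overhead.
 */
#define WARM_UPDATED_INDEX_RATIO    0.5

static bool
warm_update_worthwhile(int nindexes, int nindexes_updated)
{
    if (nindexes == 0)
        return true;    /* no index entries to avoid; nothing to lose */
    return nindexes_updated <= nindexes * WARM_UPDATED_INDEX_RATIO;
}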

I ran tests by modifying the earlier benchmark, adding the abalance column
to every index except the one on aid. With the patch applied, there are
zero WARM updates on the table (as expected). The headline numbers are:

master: 4101 txns/sec
WARM: 4033 txns/sec

So probably within an acceptable range.

============

Finally, I tested another workload with 6 indexes in total, 3 of which are
modified by each UPDATE and 3 of which are not. Ran it with scale factor
100 for 1 hr each. The headline numbers:

master: 3679 txns/sec (I don't see a reason why master should fare worse
than the 5-index-update case, so this probably needs more runs to check for
aberration)
WARM: 4050 txns/sec (not much different from the no-WARM-update case, but
since master degenerated, it's probably worth doing another round... I am
using an AWS instance and it's not the first time I am seeing aberrations)

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0008_disable_warm_on_manyindex_update.patch application/octet-stream 6.5 KB
