Re: Patch: Write Amplification Reduction Method (WARM)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-03-30 11:57:00
Message-ID: CAA4eK1+15WBix_TmXX4FNBSpvWz2A_gp=Kn=P_4w79mOR65CGQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 30, 2017 at 4:07 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> On Wed, Mar 29, 2017 at 4:42 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>> On Wed, Mar 29, 2017 at 1:10 PM, Pavan Deolasee
>> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> >
>> > On Wed, Mar 29, 2017 at 12:02 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> On Wed, Mar 29, 2017 at 11:52 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> >> wrote:
>> >
>> > Then during recheck, we pass already compressed values to
>> > index_form_tuple(). But my point is, the following code will ensure that
>> > we
>> > don't compress it again. My reading is that the first check for
>> > !VARATT_IS_EXTENDED will return false if the value is already
>> > compressed.
>> >
>>
>> You are right. I was confused by the previous check of VARATT_IS_EXTERNAL.
>>
>
> Ok, thanks.
>
>>
>> >
>> > TBH I couldn't find why the original index insertion code will always
>> > supply
>> > uncompressed values.
>> >
>>
>> Just try inserting a large value into the text column ('aaaaaa.....bbb'),
>> up to 2.5K. Then set a breakpoint in heap_prepare_insert and
>> index_form_tuple, and debug both functions; you will find that
>> even though we compress during insertion into the heap, the index will
>> compress the original value again.
>>
>
> Ok, tried that. AFAICS index_form_tuple gets compressed values.
>

How have you verified that? Have you checked that in
heap_prepare_insert it has called toast_insert_or_update() and then
returned a tuple different from the input tup? Basically, I am easily
able to see it, and even the reason why the heap and index tuples will
be different. Let me try to explain: toast_insert_or_update returns a
new tuple which contains compressed data, and this tuple is inserted
into the heap, whereas the slot still refers to the original
(uncompressed) tuple which was passed to heap_insert. Now,
ExecInsertIndexTuples and the calls under it, like FormIndexDatum,
will refer to the tuple in the slot, which is uncompressed, and will
form values[] using the uncompressed value. Try a simple case as below:

Create table t_comp(c1 int, c2 text);
Create index idx_t_comp_c2 on t_comp(c2);
Create index idx_t_comp_c1 on t_comp(c1);

Insert into t_comp values (1, 'aaaa ...aaa');

Repeat 'a' in the above line 2700 times or so. You should see what I
am describing above.
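
For convenience, the long value can also be built with repeat(), and
(just a suggestion) comparing pg_column_size() with octet_length() is
one way to confirm that the heap attribute is stored compressed:

Insert into t_comp values (1, repeat('a', 2700));

-- pg_column_size() reports the stored (compressed) size, while
-- octet_length() reports the uncompressed length
Select pg_column_size(c2), octet_length(c2) from t_comp;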

>>
>>
>> Yeah probably you are right, but I am not sure if it is good idea to
>> compare compressed values.
>>
>
> Again, I don't see a problem there.
>
>>
>> I think with these new changes in btrecheck, it would appear to be much
>> costlier as compared to what you had a few versions back. I am afraid
>> that it can impact performance for cases where there are few WARM
>> updates in the chain and many HOT updates, as it will run the recheck
>> for all such updates.
>
>
>
> INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
> <2300 chars string>, (random()::bigint) % :scale, 0;
>
> CREATE UNIQUE INDEX pgb_a_aid ON pgbench_accounts(aid);
> CREATE INDEX pgb_a_filler1 ON pgbench_accounts(filler1);
> CREATE INDEX pgb_a_filler2 ON pgbench_accounts(filler2);
> CREATE INDEX pgb_a_filler3 ON pgbench_accounts(filler3);
> CREATE INDEX pgb_a_filler4 ON pgbench_accounts(filler4);
>
> -- Force a WARM update on one row
> UPDATE pgbench_accounts SET filler1 = 'X' WHERE aid = '100' ||
> repeat('abcdefghij', 20000);
>
> Test:
> -- Fetch the row using the fat index. Since the row contains a
> BEGIN;
> SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
> <2300 chars string> ORDER BY aid;
> UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
> <2300 chars string>;
> END;
>
> I did four 5-minute runs with master and WARM, and there is probably a 2-3%
> regression.
>

So IIUC, in the above test you do one WARM update during initialization
and then all updates during the actual test are HOT updates. In such a
case, won't the WARM chain be converted to a HOT chain by vacuum, after
which all further updates will be HOT and there will probably be no
rechecks?
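
If it helps, one simple way to confirm that (just a suggestion) is to
watch the HOT update counters for the table during the run:

-- n_tup_hot_upd vs. n_tup_upd shows what fraction of updates were HOT;
-- once vacuum has cleaned up the WARM chain, nearly all updates in this
-- test should be counted as HOT
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'pgbench_accounts';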

> (Results with 5 mins tests, txns is total for 5 mins, idx_scan is number of
> scans on the fat index)
> master:
> txns idx_scan
> 414117 828233
> 411109 822217
> 411848 823695
> 408424 816847
>
> WARM:
> txns idx_scan
> 404139 808277
> 398880 797759
> 399949 799897
> 397927 795853
>
> ==========
>
> I then also repeated the tests, but this time using compressible values. The
> regression in this case is much higher, maybe 15% or more.
>

That sounds on the higher side.

> INSERT INTO pgbench_accounts SELECT generate_series(:start, :end )::text ||
> repeat('abcdefghij', 20000), (random()::bigint) % :scale, 0;
>
> -- Fetch the row using the fat index. Since the row contains a
> BEGIN;
> SELECT substring(aid, 1, 10) FROM pgbench_accounts WHERE aid = '100' ||
> repeat('abcdefghij', 20000) ORDER BY aid;
> UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = '100' ||
> repeat('abcdefghij', 20000);
> END;
>
> (Results with 5 mins tests, txns is total for 5 mins, idx_scan is number of
> scans on the fat index)
> master:
> txns idx_scan
> 56976 113953
> 56822 113645
> 56915 113831
> 56865 113731
>
> WARM:
> txns idx_scan
> 49044 98087
> 49020 98039
> 49007 98013
> 49006 98011
>
> But TBH I believe this regression is coming from the changes to
> heap_tuple_attr_equals, where we are decompressing both old and new values
> and then comparing them. For 200K-byte values, that must cost something.
> Another reason I think so is that I accidentally did one run which did not
> use index scans and did not perform any WARM updates, yet the regression
> was similar. That makes me think the regression is coming from somewhere
> other than the recheck, and the change in heap_tuple_attr_equals seems like
> a good candidate.
>
> I think we can fix that by comparing compressed values.
>

IIUC, by the time you are comparing tuple attrs to check for modified
columns, you don't have the compressed values for the new tuple.

> I know you had
> raised concerns, but Robert confirmed that (IIUC) it's not a problem today.
>

Yeah, but I am not sure if we can take Robert's statement as some sort
of endorsement for what the patch does.

> We will figure out how to deal with it if we ever add support for different
> compression algorithms or compression levels. And I also think this is a
> somewhat synthetic use case, and the fact that there is not much regression
> with indexes as large as 2K bytes seems quite comforting to me.
>

I am not sure we can consider it completely synthetic, because we
might see similar cases for json datatypes. Can we try to see the
impact when the same test runs from multiple clients (something along
the lines of the pgbench invocation sketched below)? For your
information, I am also trying to set up some tests along with one of
my colleagues, and we will report the results once the tests are
complete.
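
As a rough sketch of what I mean by a multi-client run (the script and
database names here are just placeholders for the custom transaction
you posted above):

pgbench -c 8 -j 8 -T 300 -f warm_update.sql postgres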

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
