Re: Patch: Write Amplification Reduction Method (WARM)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Date: 2017-03-21 13:25:49
Message-ID: CA+TgmobKTAN1m8qiZJ+w6L6Kw9CfHrSevhwnrfVPN3KxA6=4KQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 21, 2017 at 8:41 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> Yeah. So what's the deal with this? Is somebody working on figuring
>> out a different approach that would reduce this overhead? Are we
>> going to defer WARM to v11? Or is the intent to just ignore the 5-10%
>> slowdown on a single column update and commit everything anyway?
>
> I think I should clarify something. The test case does a single column
> update, but it also has columns which are very wide, has an index on many
> columns (and it updates a column early in the list). In addition, in the
> test Mithun updated all 10 million rows of the table in a single transaction,
> used an UNLOGGED table, and had fsync turned off.
>
> TBH I see many artificial scenarios here. It will be very useful if he can
> rerun the query with some of these restrictions lifted. I'm all for
> addressing whatever we can, but I am not sure if this test demonstrates
> real-world usage.

That's a very fair point, but if these patches, or some of them, are
going to get committed, then these things need to be discussed first.
Let's not go from nothing, nothing, nothing straight to a giant code
drop that nobody has agreed on.

I think that very wide columns and highly indexed tables are not
particularly unrealistic, nor do I think updating all the rows is
particularly unrealistic. Sure, it's not everything, but it's
something. Now, I would agree that all of that PLUS unlogged tables
with fsync=off is not too realistic. What kind of regression would we
observe if we eliminated those last two variables?

> Having said that, maybe we can do a few things to reduce the overhead.
>
> - Check if the page has enough free space to perform a HOT/WARM update. If
> not, don't look for all index keys.
> - Pass bitmaps separately for each index and bail out early if we conclude
> neither HOT nor WARM is possible. In this case since there is just one index
> and as soon as we check the second column we know neither HOT nor WARM is
> possible, we will return early. It might complicate the API a lot, but I can
> give it a shot if that's what is needed to make progress.

I think that whether the code ends up getting contorted is an
important consideration here. For example, if the first of the things
you mention can be done without making the code ugly, then I think
that would be worth doing; it's likely to help fairly often in
real-world cases. The problem with making the code contorted and
ugly, as you say that the second idea would require, is that it can
easily mask bugs.
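
To make the first idea concrete, here is a rough sketch in standalone
C, not code from the patch. Everything in it (SketchPage,
classify_update, modified_index_columns, the comparisons_done counter)
is a made-up stand-in rather than a PostgreSQL structure or function,
and it collapses the real HOT/WARM rules down to "does the new tuple
fit on the page, and did any indexed column change". The only point is
the ordering: the cheap free-space test runs before the per-column
comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap page; NOT PostgreSQL's page layout. */
typedef struct SketchPage {
    size_t free_bytes;          /* free space left on the page */
} SketchPage;

/* Simplified outcome of an update; the real rules are more involved. */
typedef enum UpdateKind {
    UPDATE_REGULAR,             /* tuple does not fit; goes to another page */
    UPDATE_WARM,                /* fits, but some indexed column changed */
    UPDATE_HOT                  /* fits, and no indexed column changed */
} UpdateKind;

/* Illustration only: counts how many column comparisons were performed. */
static int comparisons_done = 0;

/*
 * Return a bitmask of indexed columns whose value differs between the old
 * and new tuple. Columns are plain ints here to keep the sketch short.
 */
static uint64_t
modified_index_columns(const int *old_vals, const int *new_vals,
                       uint64_t indexed_cols, int ncols)
{
    uint64_t changed = 0;

    for (int col = 0; col < ncols && col < 64; col++) {
        if ((indexed_cols >> col) & 1) {
            comparisons_done++;
            if (old_vals[col] != new_vals[col])
                changed |= (uint64_t) 1 << col;
        }
    }
    return changed;
}

/*
 * Classify an update. The free-space check comes first, so a full page
 * never pays for the index-column comparison.
 */
static UpdateKind
classify_update(const SketchPage *page, size_t new_tuple_bytes,
                const int *old_vals, const int *new_vals,
                uint64_t indexed_cols, int ncols)
{
    assert(page != NULL);

    /* Cheap test: without room on the page, neither HOT nor WARM applies. */
    if (page->free_bytes < new_tuple_bytes)
        return UPDATE_REGULAR;

    /* Expensive test: only reached when the tuple could stay on the page. */
    uint64_t changed = modified_index_columns(old_vals, new_vals,
                                              indexed_cols, ncols);

    return (changed == 0) ? UPDATE_HOT : UPDATE_WARM;
}
```

With a page that has less free space than the new tuple needs,
classify_update() returns UPDATE_REGULAR and comparisons_done stays at
zero, which is the saving being discussed. Whether the real code can be
arranged this cleanly is exactly the open question.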

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
