Re: making update/delete of inheritance trees scale better

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Amit Langote <amitlangote09(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making update/delete of inheritance trees scale better
Date: 2020-10-30 20:35:48
Message-ID: 2e50d782-36f9-e723-0c4b-d133e63c6127@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29/10/2020 15:03, Amit Langote wrote:
> On Sun, Oct 4, 2020 at 11:44 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>> On Fri, Sep 11, 2020 at 7:20 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>>> Here are the commit messages of the attached patches:
>>>
>>> [PATCH v3 1/3] Overhaul how updates compute a new tuple
>>
>> I tried to assess the performance impact of this rejiggering of how
>> updates are performed. As to why one may think there may be a
>> negative impact, consider that ExecModifyTable() now has to perform an
>> extra fetch of the tuple being updated for filling in the unchanged
>> values of the update's NEW tuple, because the plan itself will only
>> produce the values of changed columns.
>>
> ...
>> It seems clear that the saving on the target list computation overhead
>> that we get from the patch is hard to ignore in this case.
>>
>> I've attached updated patches, because as Michael pointed out, the
>> previous version no longer applies.
>
> Rebased over the recent executor result relation related commits.

I also did some quick performance testing with a simple update designed
as a worst-case scenario:

create unlogged table tab (a int4, b int4);
insert into tab select g, g from generate_series(1, 10000000) g;

\timing on
vacuum tab; update tab set b = b, a = a;

Without the patch, the update takes about 7.3 s on my laptop, and about
8.3 s with the patch.

In this case, the patch fetches the old tuple, but it wouldn't really
need to, because all the columns are updated. Could we optimize that
special case?

In principle, it would sometimes also make sense to add the old columns
to the targetlist like we used to, to avoid the fetch. But estimating
when that's cheaper would be complicated.

Despite that, I like this new approach a lot. It's certainly much nicer
than inheritance_planner().

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-10-30 20:37:30 Re: Parallel copy
Previous Message Justin Pryzby 2020-10-30 20:01:00 Re: Assertion failure when ATTACH partition followed by CREATE PARTITION.