Re: Performance degradation of REFRESH MATERIALIZED VIEW

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject: Re: Performance degradation of REFRESH MATERIALIZED VIEW
Date: 2021-05-18 02:20:07
Message-ID: CAD21AoDKTF9BBAXpVYA=mQ=axHGS__SRAejYrX=4fxGiy6Pm-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 12, 2021 at 2:32 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
>
>
> On 5/11/21 5:56 PM, Masahiko Sawada wrote:
> > On Tue, May 11, 2021 at 11:07 PM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >> On 5/11/21 11:04 AM, Masahiko Sawada wrote:
> >>> On Tue, May 11, 2021 at 4:37 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >>>>
> >>>> On Wed, May 05, 2021 at 03:04:53PM +0200, Tomas Vondra wrote:
> >>>>> Thanks, that looks promising. I repeated the tests I did on 26/4, and the
> >>>>> results look like this:
> >>>>>
> >>>>> old (0c7d3bb99): 497ms
> >>>>> master: 621ms
> >>>>> patched: 531ms
> >>>>>
> >>>>> So yeah, that's a bit improvement - it does not remove the regression
> >>>>> entirely, but +5% is much better than +25%.
> >>>>
> >>>> Hmm. Is that really something we should do after feature freeze? A
> >>>> 25% degradation for matview refresh may be a problem for a lot of
> >>>> users and could be an upgrade stopper. Another thing we could do is
> >>>> also to revert 7db0cd2 and 39b66a9 from the v14 tree, and work on a
> >>>> proper solution for this performance problem for matviews for 15~.
> >>>
> >>> I think the approach proposed by Andres eliminates the extra vmbuffer
> >>> reads as much as possible. But even with the patch, there still is 5%
> >>> degradation (and there is no way to disable inserting frozen tuples at
> >>> matview refresh). Which could be a problem for some users. I think
> >>> it’s hard to completely eliminate the overhead so we might need to
> >>> consider another approach like having matview refresh use
> >>> heap_multi_insert() instead of heap_insert().
> >>>
> >>
> >> I think it's way too late to make such significant change (switching to
> >> heap_multi_insert) for v14 :-(
> >
> > Right.
> >
> >> Moreover, I doubt it affects just matview
> >> refresh - why wouldn't it affect other similar use cases? More likely
> >> it's just the case that was discovered.
> >
> > I've not tested yet but I guess COPY FROM … FREEZE using heap_insert
> > would similarly be affected since it also uses heap_insert() with
> > TABLE_INSERT_FROZEN.
> >
>
> I'd say that's somewhat acceptable, as it's a trade-off between paying a
> bit of time during COPY vs. paying much more later (when freezing the
> rows eventually).
>
> From my POV the problem here is we've not asked to freeze the rows
> (unless I'm missing something and REFRESH freezes them?), but it's still
> a bit slower. However, 5% might also be just noise due to changes in
> layout of the binary.
>
> >>
> >>> I think the changes for heap_multi_insert() are fine so we can revert
> >>> only heap_insert() part if we revert something from the v14 tree,
> >>> although we will end up not inserting frozen tuples into toast tables.
> >>>
> >>
> >> I'd be somewhat unhappy about reverting just this bit, because it'd mean
> >> that we freeze rows in the main table but not rows in the TOAST tables
> >> (that was kinda why we concluded we need the heap_insert part too).
> >>
> >> I'm still a bit puzzled where does the extra overhead (in cases when
> >> freeze is not requested) come from, TBH.
> >
> > Which cases do you mean? Doesn't matview refresh always request to
> > freeze tuples even after applying the patch proposed on this thread?
> >
>
> Oh, I didn't realize that! That'd make this much less of an issue, I'd
> say, because if we're intentionally freezing the rows it's reasonable to
> pay a bit of time (in exchange for not having to do it later). The
> original +25% was a bit too much, of course, but +5% seems reasonable.

Yes. It depends on how much the matview refresh gets slower but I
think the problem here is that users always are forced to pay the cost
for freezing tuple during refreshing the matview. There is no way to
disable it unlike FREEZE option of COPY command.

I’ve done benchmarks for matview refresh on my machine (FreeBSD 12.1,
AMD Ryzen 5 PRO 3400GE, 24GB RAM) with four codes: HEAD, HEAD +
Andres’s patch, one before 39b66a91b, and HEAD without
TABLE_INSERT_FROZEN.

The workload is to refresh the matview that simply selects 50M tuples
(about 1.7 GB). Here are the average execution times of three trials
for each code:

1) head: 42.263 sec
2) head w/ Andres’s patch: 40.194 sec
3) before 39b66a91b commit: 38.143 sec
4) head w/o freezing tuples: 32.413 sec

I also observed 5% degradation by comparing 1 and 2 but am not sure
where the overhead came from. I agree with Andres’s proposal. It’s a
straightforward approach. I think it’s a reasonable degradation
comparing to the cost of freezing tuples later. But I’m concerned a
bit that it’s reasonable that we force all users to pay the cost
during matview refresh without any choice. So we need to find the
remaining differences after applying a polished version of the patch.

FYI I’ve attached flame graphs for each evaluation. Looking at
1_head.svg, we can see CPU spent much time on visibilittmap_pin() and
it disappeared in 2_head_w_Andreas_patch.svg. There is no big
difference at a glance between 2_head_w_Andreas_patch.svg and
3_before_39b66a91b.svg.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
flame_graphs.zip application/zip 36.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-05-18 02:20:49 Re: Addition of authenticated ID to pg_stat_activity
Previous Message houzj.fnst@fujitsu.com 2021-05-18 02:11:00 RE: Skip partition tuple routing with constant partition key