Performance degradation of REFRESH MATERIALIZED VIEW

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject: Performance degradation of REFRESH MATERIALIZED VIEW
Date: 2021-03-11 08:44:37
Message-ID: CAD21AoA==f2VSw3c-Cp_y=WLKHMKc1D6s7g3YWsCOvgaYPpJcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While discussing freezing tuples during CTAS[1], we found that
heap_insert() with HEAP_INSERT_FROZEN brings performance degradation.
For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS,
it took 12 sec whereas the code without the patch took 10 sec with the
following query:

create table t1 (a, b, c, d) as select i,i,i,i from
generate_series(1,20000000) i;

I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the
following queries:

create table source as select generate_series(1, 50000000);
create materialized view mv as select * from source;
refresh materialized view mv;

The execution time of REFRESH MATERIALIZED VIEW are:

w/ HEAP_INSERT_FROZEN flag : 42 sec
w/o HEAP_INSERT_FROZEN flag : 33 sec

After investigation, I found that such performance degradation happens
on only HEAD code. It seems to me that commit 39b66a91b (and
7db0cd2145) is relevant that has heap_insert() set VM bits and
PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas
Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the
page when inserting a tuple for the first time on the page (around
L2133 in heapam.c), every subsequent heap_insert() on the page reads
and pins a VM buffer (see RelationGetBufferForTuple()). Reading and
pinning a VM buffer for every insertion is a very high cost. This
doesn't happen in heap_multi_insert() since it sets VM buffer after
filling the heap page with tuples. Therefore, there is no such
performance degradation between COPY and COPY FREEZE if they use
heap_multi_insert() (i.g., CIM_MULTI). Paul also reported it in that
thread.

As far as I read the thread and commit messages related to those
commits, they are intended to COPY FREEZE and I could not find any
discussion and mention about REFRESH MATERIALIZED VIEW. So I'm
concerned we didn't expect such performance degradation.

Setting VM bits and PD_ALL_VISIBLE at REFRESH MATERIALIZED VIEW would
be a good choice in some cases. Since materialized views are read-only
VM bits never be cleared after creation. So it might make sense for
users to pay a cost to set them at refresh (note that CREATE
MATERIALIZED VIEW doesn’t set VM bits since it’s internally treated as
CTAS). On the other hand, given this big performance degradation
(about 20%) users might want to rely on autovacuum so that VM bits are
set in the background. However, unlike COPY, there is no way to
disable freezing tuples for REFRESH MATERIALIZED VIEW. So every user
would be imposed on those costs and affected by that performance
degradation. I’m concerned that it could be a problem.

What do you think?

Regards,

[1] https://www.postgresql.org/message-id/flat/FB1F5E2D-CBD1-4645-B74C-E0A1BFAE4AC8%40vmware.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Markus Wanner 2021-03-11 09:14:48 Re: [PATCH] Provide more information to filter_prepare
Previous Message Amit Langote 2021-03-11 08:42:43 Re: Allow batched insert during cross-partition updates