Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW
Date: 2021-03-17 02:08:01
Message-ID: CA+hUKG+A_A6=yFv-QJw0w-7jk6iNT=Vks5+_a-xUeE9ud8Wr8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 16, 2021 at 2:41 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Mon, Mar 15, 2021 at 8:25 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > > > The problem with a case like REFRESH MATERIALIZED VIEW is that there's
> > > > nothing to prevent something that gets run in the course of the query
> > > > from trying to access the view (and the heavyweight lock won't prevent
> > > > that, due to group locking). That's probably a stupid thing to do,
> > > > but it can't be allowed to break the world. The other cases are safe
> > > > from that particular problem because the table doesn't exist yet.
> >
> > Please correct me if my understanding of the above comment (from the
> > commit e9baa5e9) is wrong - even if the leader opens the matview
> > relation in exclusive mode, because of group locking(in case we allow
> > parallel workers to feed in the data to the new heap that gets created
> > for RMV, see ExecRefreshMatView->make_new_heap), can other sessions
> > still access the matview relation with older data?
> >
> > I performed below testing to prove myself wrong for the above understanding:
> > session 1:
> > 1) added few rows to the table t1 on which the mv1 is defined;
> > 2) refresh materialized view mv1;
> >
> > session 2:
> > 1) select count(*) from mv1; ---> this query is blocked until
> > session 1's step (2) is completed and gives the latest result even if
> > the underlying data-generating query runs select part in parallel.
> >
> > Is there anything I'm missing here?
>
> I think he was talking about things like functions that try to access
> the mv from inside the same query, in a worker. I haven't figured out
> exactly which hazards he meant. I thought about wrong-relfilenode
> hazards and combocid hazards, but considering the way this thing
> always inserts into a fresh table before performing merge or swap
> steps later, I don't yet see why this is different from any other
> insert-from-select-with-gather.

I asked Robert if he had some hazard in mind that we haven't already
discussed here when he wrote that, and didn't recall any. I think
we're OK here.

I added the "concurrently" variant to the regression test, just to get
it exercised too.

The documentation needed a small tweak where we have a list of
data-writing commands that are allowed to use parallelism. That run
of sentences was getting a bit tortured so I converted it into a
bullet list; I hope I didn't upset the documentation style police.

Pushed. Thanks for working on this! This is really going to fly with
INSERT pushdown. The 3 merge queries used by CONCURRENTLY will take
some more work.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2021-03-17 02:09:32 RE: libpq debug log
Previous Message Amit Langote 2021-03-17 02:01:58 Re: crash during cascaded foreign key update