Re: neqjoinsel versus "refresh materialized view concurrently"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: neqjoinsel versus "refresh materialized view concurrently"
Date: 2018-03-19 21:33:35
Message-ID: 334.1521495215@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> On Wed, Mar 14, 2018 at 2:56 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> Is there any good way to make the regression tests fail if the plan reverts
>> to the bad one? The only thing I can think of would be to make the table
>> bigger so the regression tests becomes "noticeably slower", but that is
>> pretty vague and not user friendly to formally pass and just hope it is slow
>> enough for someone to investigate.

> I can't think of a good way. I guess it can still pick a nested loop
> if it thinks there'll only be a couple of loops. This patch tells it
> to pay attention to the total cost, not the startup cost, so as soon
> as it thinks there is more than a hand full of rows the quadratic cost
> will exceed the sort/merge's logarithmic cost.

Right. After further thought, the key point here is that in non-error
cases the query will produce no rows, meaning that it must be executed
to completion before we can be sure of that. But applying a LIMIT
encourages the planner to pick a fast-start (slow-completion) plan,
which is not going to be what we want. If in fact the query were going
to produce a lot of rows, and the planner could accurately predict that,
then maybe a LIMIT would be useful --- but there's no hope of estimates
on wholerowvar *= wholerowvar being accurate any time soon, let alone
correctly handling the correlation with ctid <> ctid. So the LIMIT
is just an invitation to trouble and we may as well remove it.

Committed that way. I also changed EXISTS(SELECT * ...) to
EXISTS(SELECT 1 ...), in hopes of saving a few microseconds of
parsing effort.

> Since I've had hash joins on the mind recently I couldn't help
> noticing that you can't get a hash join out of this query's "record
> image" based join qual (or even a regular row-based =).

Yeah, because there's no hash support for recordimage. So there's even
less reason to be worried about how smart the planner is for this query:
basically, it might do a nestloop for a very small number of rows, but
otherwise it's gonna have to go for a merge join.

My earlier thought that we might be able to skip the ANALYZE step
seems wrong, though. It's true that it does little for this query,
but the follow-on query to build a diff table can probably make
good use of the stats.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2018-03-19 21:39:03 Re: Problems with Error Messages wrt Domains, Checks
Previous Message Jeremy Finzel 2018-03-19 21:19:18 Re: found xmin from before relfrozenxid on pg_catalog.pg_authid