Re: Assertion failure with LEFT JOINs among >500 relations

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Onder Kalaci <onderk(at)microsoft(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assertion failure with LEFT JOINs among >500 relations
Date: 2020-10-14 02:53:55
Message-ID: CAApHDvrPc-qZP6fOF_Rtd2hZ+SNHq-x3Zfx4vbGN3Jr0sZ4g_g@mail.gmail.com
Lists: pgsql-hackers

Thanks for having a look at this.

On Wed, 14 Oct 2020 at 04:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm on board with trying to get rid of NaN rowcount estimates more
> centrally. I do not think it is a good idea to try to wire in a
> prohibition against zero rowcounts. That is actually the correct
> thing in assorted scenarios --- one example recently under discussion
> was ModifyTable without RETURNING, and another is where we can prove
> that a restriction clause is constant-false. At some point I think
> we are going to want to deal honestly with those cases instead of
> sweeping them under the rug. So I'm disinclined to remove zero
> defenses that we'll just have to put back someday.

OK, that certainly limits the scope here. It just means we can't get
rid of the <= 0 checks in the join costing functions. The problem case
those checks were added for was a dummy Append, and we still have
valid cases where a dummy Append on one side doesn't cause the join
rel itself to become a dummy rel.
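
(To spell out why that guard matters: some of the costing arithmetic
divides by, or takes the log of, the input row counts, so a literal
zero would immediately produce Inf/NaN. A contrived standalone sketch,
not the actual costing code:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    double inner_path_rows = 0.0;   /* e.g. a dummy Append on the inner side */

    /* without clamping to 1, formulas like these blow up straight away */
    double per_probe = 1.0 / inner_path_rows;   /* +Inf */
    double sort_factor = log(inner_path_rows);  /* -Inf */

    printf("%g %g\n", per_probe, sort_factor);  /* prints: inf -inf */
    return 0;
}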

> I think converting Inf to DBL_MAX, in hopes of avoiding creation of
> NaNs later, is fine. (Note that applying rint() to that is quite
> useless --- in every floating-point system, values bigger than
> 2^number-of-mantissa-bits are certainly integral.)

Good point.
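
(Easy enough to convince yourself of with a quick standalone check:
above 2^53 neighbouring doubles are at least 1 apart, so every
representable value is already integral and rint() is a no-op, and
DBL_MAX is far beyond that threshold.)

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* 1e20 is well above 2^53 (~9.007e15), so it's already integral */
    double big = 1e20;

    printf("%d\n", rint(big) == big);   /* prints 1 */
    return 0;
}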

> I'm not sure why you propose to map NaN to one. Wouldn't mapping it
> to Inf (and thence to DBL_MAX) make at least as much sense? Probably
> more in fact. We know that unwarranted one-row estimates are absolute
> death to our chances of picking a well-chosen plan.

That came about due to what the join costing functions were already doing, i.e.:

    /* Protect some assumptions below that rowcounts aren't zero or NaN */
    if (inner_path_rows <= 0 || isnan(inner_path_rows))
        inner_path_rows = 1;

[1] didn't have an example case of how the NaNs were introduced, so I
was mostly just copying the logic that was added to fix that back in
72826fb3.
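
For what it's worth, one plausible way to manufacture a NaN, though I
can't say it's what happened in [1], is an infinite row count being
multiplied by a zero selectivity: Inf * 0 is NaN under IEEE 754, and
the NaN then leaks into every later cost formula. A standalone sketch:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double rows = INFINITY;     /* overflowed row estimate */
    double selectivity = 0.0;   /* e.g. a provably-false qual */

    /* Inf * 0 == NaN, which then propagates through all later arithmetic */
    printf("isnan: %d\n", isnan(rows * selectivity));   /* prints isnan: 1 */
    return 0;
}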

> > I toyed around with the attached patch, but I'm still not that excited
> > about the clamping of infinite values to DBL_MAX. The test case I
> > showed above with generate_series(1,379) still ends up with NaN cost
> > estimates due to costing a sort with DBL_MAX rows. When I was writing
> > the patch, I had it in my head that the costs per row will always be
> > lower than 1.
>
> Yeah, that is a good point. Maybe instead of clamping to DBL_MAX,
> we should clamp rowcounts to something that provides some headroom
> for multiplication by per-row costs. A max rowcount of say 1e100
> should serve fine, while still being comfortably more than any
> non-insane estimate.
>
> So now I'm imagining something like
>
> #define MAXIMUM_ROWCOUNT 1e100

That seems more reasonable. We likely could push it a bit higher, but
I'm not all that motivated to: if an estimate that large were ever
accurate, you could expect the heat death of the universe to arrive
before your query results, in which case the user would likely
struggle to find electrons to power their computer.
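
It also still leaves plenty of headroom for the per-row cost
multiplications you mention: DBL_MAX is around 1.8e308, so 1e100 rows
times any sane per-tuple cost stays comfortably finite. A quick
illustration:

#include <stdio.h>

int main(void)
{
    double max_rows = 1e100;    /* the proposed MAXIMUM_ROWCOUNT */
    double per_row_cost = 1e6;  /* an absurdly expensive per-row cost */

    /* ~1e106, still a couple of hundred orders of magnitude below DBL_MAX */
    printf("%g\n", max_rows * per_row_cost);
    return 0;
}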

> clamp_row_est(double nrows)
> {
>     /* Get rid of NaN, Inf, and impossibly large row counts */
>     if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
>         nrows = MAXIMUM_ROWCOUNT;
>     else
>         ... existing logic ...

I've got something along those lines in the attached.
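
Roughly the shape I have in mind is below (the attachment is
authoritative; the lower bound and rint() behaviour are unchanged from
the existing function):

#define MAXIMUM_ROWCOUNT 1e100

double
clamp_row_est(double nrows)
{
    /* Get rid of NaN, Inf and impossibly large row counts */
    if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
        nrows = MAXIMUM_ROWCOUNT;
    else if (nrows <= 1.0)
        nrows = 1.0;            /* force at least one row, as before */
    else
        nrows = rint(nrows);    /* round to an integral value */

    return nrows;
}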

> Perhaps we should also have some sort of clamp for path cost
> estimates, at least to prevent them from being NaNs which
> is going to confuse add_path terribly.

Hmm, I'm not quite sure where to start with that one. Many of the
path row estimates will already have gone through clamp_row_est(), but
there are various special cases, e.g. Appends with no subpaths, so
when to apply a clamp would depend on the path type. I'd say it would
need lots of careful analysis and a scattering of new calls in pathnode.c.

I've ended up leaving the NaN checks in the join costing functions.
[1] doesn't show how the NaN in the reported test case was introduced,
so I'm not confident that removing those checks wouldn't just
reintroduce the same problem. The path row estimate that had the NaN
might not have been through clamp_row_est(); many don't.

David

[1] https://www.postgresql.org/message-id/7270.1302902842%40sss.pgh.pa.us

Attachment Content-Type Size
fix_infinite_row_estimates_v2.patch text/plain 2.9 KB
