Re: [sqlsmith] Crash in mcv_get_match_bitmap

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
Subject: Re: [sqlsmith] Crash in mcv_get_match_bitmap
Date: 2019-07-13 15:39:55
Message-ID: 19208.1563032395@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> On Thu, Jul 11, 2019 at 05:08:22PM +0200, Tomas Vondra wrote:
>> On Wed, Jul 10, 2019 at 06:48:16PM -0400, Tom Lane wrote:
>>> The right way to determine operator semantics is to look to see
>>> whether they are in a btree opclass. That's what the rest of the
>>> planner does, and there is no good reason for the mcv code to
>>> do it some other way.

>> Hmmm, but that will mean we're unable to estimate operators that are not
>> part of a btree opclass. Which is a bit annoying, because that would also
>> affect equalities (and thus functional dependencies), in which case the
>> type may easily have just hash opclass or something.

After thinking about this more, I may have been analogizing to the wrong
code. It's necessary to use opclass properties when we're reasoning about
operators in a way that *must* be correct, for instance to conclude that
a partition can be pruned from a query. But this code is just doing
selectivity estimation, so the correctness standards are a lot lower.
In particular I see that the long-established range-query-detection
code in clauselist_selectivity is looking for operators that have
F_SCALARLTSEL etc. as restriction estimators (in fact I'm guessing you
lifted parts of the mcv code from that, cause it looks pretty similar).
So if we've gotten away with that so far over there, there's probably
no reason not to do likewise here.

I am a little troubled by the point I made about operators possibly
wanting to have a more-specific estimator than scalarltsel, but that
seems like an issue to solve some other time; and if we do change that
logic then clauselist_selectivity needs to change too.

> Here are WIP patches addressing two of the issues:

> 1) determining operator semantics by matching them to btree opclasses

Per above, I'm sort of inclined to drop this, unless you feel better
about doing it like this than the existing way.

> 2) deconstructing OpExpr into Var/Const nodes

deconstruct_opexpr is still failing to verify that the Var is a Var.
I'd try something like

leftop = linitial(expr->args);
while (IsA(leftop, RelabelType))
leftop = ((RelabelType *) leftop)->arg;
// and similarly for rightop
if (IsA(leftop, Var) && IsA(rightop, Const))
// return appropriate results
else if (IsA(leftop, Const) && IsA(rightop, Var))
// return appropriate results
else
// fail

Also, I think deconstruct_opexpr is far too generic a name for what
this is actually doing. It'd be okay as a static function name
perhaps, but not if you're going to expose it globally.

> a) I don't think unary-argument OpExpr are an issue, because this is
> checked when determining which clauses are compatible (and the code only
> allows the case with 2 arguments).

OK.

> b) Const with constisnull=true - I'm not yet sure what to do about this.
> The easiest fix would be to deem those clauses incompatible, but that
> seems a bit too harsh. The right thing is probably passing the NULL to
> the operator proc (but that means we can't use FunctionCall).

No, because most of the functions in question are strict and will just
crash on null inputs. Perhaps you could just deem that cases involving
a null Const don't match what you're looking for.

> Now, looking at this code, I wonder if using DEFAULT_COLLATION_OID when
> calling the operator is the right thing. We're using type->typcollation
> when building the stats, so maybe we should do the same thing here.

Yeah, I was wondering that too. But really you should be using the
column's collation not the type's default collation. See commit
5e0928005.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-07-13 16:32:47 Re: POC: converting Lists into arrays
Previous Message Fabien COELHO 2019-07-13 15:22:05 Re: refactoring - share str2*int64 functions