Re: [sqlsmith] Crash in mcv_get_match_bitmap

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
Subject: Re: [sqlsmith] Crash in mcv_get_match_bitmap
Date: 2019-07-15 01:34:25
Message-ID: 20190715013425.v2hqfq5ubcbslw67@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

OK, attached is a sequence of WIP fixes for the issues discussed here.

1) using column-specific collations (instead of type/default ones)

The collations patch is pretty simple, but I'm not sure it actually does
the right thing, particularly during estimation where it uses collation
from the Var node (varcollid). But looking at 5e0928005, this should use
the same collation as when building the extended statistics (which we
get from the per-column stats, as stored in pg_statistic.stacoll#).

But we don't actually store collations for extended statistics, so we
can either modify pg_statistic_ext_data and store it there, or lookup
the per-column statistic info during estimation, and use that. I kinda
think the first option is the right one, but that'd mean yet another
catversion bump.

OTOH 5e0928005 actually did modify the extended statistics (mvdistinct
and dependencies) to use type->typcollation during building, so maybe we
want to use the default type collation for some reason?

2) proper extraction of Var/Const from opclauses

This is the primary issue discussed in this thread - I've renamed the
function to examine_opclause_expression() so that it kinda resembles
examine_variable() and I've moved it to the "internal" header file. We
still need it from two places so it can't be static, but hopefully this
naming is acceptable.

3) handling of NULL values (Const and MCV items)

Aside from the issue that Const may represent NULL, I've realized the
code might do the wrong thing for NULL in the MCV item itself. It did
treat it as mismatch and update the bitmap, but it might have invoke the
operator procedure anyway (depending on whether it's AND/OR clause,
what's the current value in the bitmap, etc.). This patch should fix
both issues by treating them as mismatch, and skipping the proc call.

4) refactoring of the bitmap updates

This is not a bug per se, but while working on (3) I've realized the
code updating the bitmap is quite repetitive and it does stuff like

if (is_or)
bitmap[i] = Max(bitmap[i], match)
else
bitmap[i] = Min(bitmap[i], match)

over and over on various places. This moves this into a separate static
function, which I think makes it way more readable. Also, it replaces
the Min/Max with a plain boolean operators (the patch originally used
three states, not true/false, hence the Min/Max).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Use-proper-collations-in-extended-statistics.patch text/plain 3.7 KB
0002-Fix-handling-of-opclauses-in-extended-statistics.patch text/plain 6.0 KB
0003-Fix-handling-of-NULLs-in-MCV-items-and-constants.patch text/plain 2.0 KB
0004-Refactor-simplify-evaluation-of-MCV-bitmap.patch text/plain 7.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-07-15 01:43:50 Re: refactoring - share str2*int64 functions
Previous Message Thomas Munro 2019-07-15 00:58:42 Using unique btree indexes for pathkeys with one extra column