Re: [sqlsmith] Crash in mcv_get_match_bitmap

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)postgresql(dot)org>
Subject: Re: [sqlsmith] Crash in mcv_get_match_bitmap
Date: 2019-07-18 11:20:01
Message-ID: 20190718112001.gtnquw24f4qzxia5@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've pushed the fixes listed in the previous message, with the exception
of the collation part, because I had some doubts about that.

1) handling of NULL values in Cons / MCV items

The handling of NULL elements was actually a bit more broken, because it
also was not quite correct for NULL values in the MCV items. The code
treated this as a mismatch, but then skipped the rest of the evaluation
only for AND-clauses (because then 'false' is final). But for OR-clauses
it happily proceeded to call the proc, etc. It was not hard to cause a
crash with statistics on varlena columns.

I've fixed this and added a simple regression test to check this. It
however shows the stats_ext suite needs some improvements - until now it
only had AND-clauses. Now it has one simple OR-clause test, but it needs
more of that - and perhaps some combinations mixing AND/OR. I've tried
adding an copy of each existing query, with AND replaced by OR, and that
works fine (no crashes, estimates seem OK). But it's quite heavy-handed
way to create regression tests, so I'll look into this in PG13 cycle.

2) collations

Now, for the collation part - after some more thought and looking at code
I think the fix I shared before is OK. It uses per-column collations
consistently both when building the stats and estimating clauses, which
makes it consistent. I was not quite sure if using Var->varcollid is the
same thing as pg_attribute.attcollation, but it seems it is - at least for
Vars pointing to regular columns (which for extended stats should always
be the case).

And we reset stats whenever the attribute type changes (which includes
change of collation for the column), so I think it's OK. To be precise, we
only reset MCV list in that case - we keep mvdistinct and dependencies,
but that's probably OK because those don't store values and we won't
run any functions on them.

So I think the attached patch is OK, but I'd welcome some feedback.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Use-column-collation-for-extended-statistics.patch text/plain 4.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-07-18 11:28:19 Re: POC: Cleaning up orphaned files using undo logs
Previous Message Thomas Munro 2019-07-18 11:11:08 Re: POC: Cleaning up orphaned files using undo logs