Re: sqlsmith crash incremental sort

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sqlsmith crash incremental sort
Date: 2020-04-13 00:09:43
Message-ID: 20200413000943.bejni6axj7v4q5rn@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 12, 2020 at 12:44:45AM +0200, Tomas Vondra wrote:
>Hi,
>
>I've looked into this a bit, and at first I thought that maybe the
>issue is in how cost_incremental_sort picks the EC members. It simply
>does this:
>
> EquivalenceMember *member = (EquivalenceMember *)
> linitial(key->pk_eclass->ec_members);
>
>so I was speculating that maybe there are multiple EC members and the
>one we need is not the first one. That would have been easy to fix.
>
>But that doesn't seem to be the case - in this example the EC ony has a
>single EC member anyway.
>
> (gdb) p key->pk_eclass->ec_members
> $14 = (List *) 0x12eb958
> (gdb) p *key->pk_eclass->ec_members
> $15 = {type = T_List, length = 1, max_length = 5, elements = 0x12eb970, initial_elements = 0x12eb970}
>
>and the member is a Var with varno=0 (with a RelabelType on top, but
>that's irrelevant).
>
> (gdb) p *(Var*)((RelabelType*)member->em_expr)->arg
> $12 = {xpr = {type = T_Var}, varno = 0, varattno = 1, vartype = 12445, vartypmod = -1, varcollid = 950, varlevelsup = 0, varnosyn = 0, varattnosyn = 1, location = -1}
>
>which then triggers the assert in find_base_rel. When looking for other
>places calling estimate_num_groups I found this in prepunion.c:
>
> * XXX you don't really want to know about this: we do the estimation
> * using the subquery's original targetlist expressions, not the
> * subroot->processed_tlist which might seem more appropriate. The
> * reason is that if the subquery is itself a setop, it may return a
> * processed_tlist containing "varno 0" Vars generated by
> * generate_append_tlist, and those would confuse estimate_num_groups
> * mightily. We ought to get rid of the "varno 0" hack, but that
> * requires a redesign of the parsetree representation of setops, so
> * that there can be an RTE corresponding to each setop's output.
>
>which seems pretty similar to the issue at hand, because the subpath is
>T_UpperUniquePath (not sure if that passes as setop, but the symptoms
>match nicely).
>
>Not sure what to do about it in cost_incremental_sort, though :-(
>

I've been messing with this the whole day, without much progress :-(

I'm 99.9999% sure it's the same issue described by the quoted comment,
because the plan looks like this:

Nested Loop Left Join
-> Sample Scan on pg_namespace
Sampling: system ('7.2'::real)
-> Incremental Sort
Sort Key: ...
Presorted Key: ...
-> Unique
-> Sort
Sort Key: ...
-> Append
-> Nested Loop
...
-> Nested Loop
...

so yeah, the plan does have set operations, and generate_append_tlist
does generate Vars with varno == 0, causing this issue.

But I'm not entirely sure what to do about it in cost_incremental_sort.
The comment (introduced by 89deca582a in 2017) suggests a proper fix
would require redesigning the parsetree representation of setops, and
it's a bit too late for that.

So I wonder what a possible solution might look like. I was hoping we
might grab the original target list and use that, similarly to
recurse_set_operations, but I'm not sure how/where to get it.

Another option is to use something as simple as checking for Vars with
varno==0 in cost_incremental_sort() and ignoring them somehow. We could
simply use some arbitrary estimate - by assuming the rows are unique or
something like that. Yes, I agree it's pretty ugly and I'd much rather
find a way to generate something sensible, but I'm not even sure we can
generate good estimate when doing UNION of data from different relations
and so on. The attached (ugly) patch does this.

Justin, can you try if this resolves the crashes or if there's something
else going on?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
incremental-sort-groups-fix.patch text/plain 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-04-13 00:27:43 Re: where should I stick that backup?
Previous Message Robert Haas 2020-04-13 00:02:50 Re: where should I stick that backup?