Re: sqlsmith crash incremental sort

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Richard Guo <guofenglinux(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sqlsmith crash incremental sort
Date: 2020-04-16 23:13:00
Message-ID: 20200416231300.5mkhjzs6ki4yntto@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 16, 2020 at 08:44:16PM +0200, Tomas Vondra wrote:
>On Thu, Apr 16, 2020 at 12:04:03PM -0400, James Coleman wrote:
>>On Thu, Apr 16, 2020 at 8:22 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>>>
>>>
>>>On Thu, Apr 16, 2020 at 6:35 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
>>>>
>>>>
>>>>On Wed, Apr 15, 2020 at 10:47 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>>>
>>>>>
>>>>>Well, yeah. The problem is the Unique simply compares the columns in the
>>>>>order it sees them, and it does not match the column order desired by
>>>>>incremental sort. But we don't push down this information at all :-(
>>>>
>>>>
>>>>This is a nice optimization better to have. Since the 'Sort and Unique'
>>>>would unique-ify the result of a UNION by sorting on all columns, why
>>>>not we adjust the sort order trying to match parse->sortClause so that
>>>>we can avoid the final sort node?
>>>>
>>>>Doing that we can transform plan from:
>>>>
>>>># explain (costs off) select * from foo union select * from foo order by 1,3;
>>>> QUERY PLAN
>>>>-----------------------------------------------
>>>> Incremental Sort
>>>> Sort Key: foo.a, foo.c
>>>> Presorted Key: foo.a
>>>> -> Unique
>>>> -> Sort
>>>> Sort Key: foo.a, foo.b, foo.c
>>>> -> Append
>>>> -> Seq Scan on foo
>>>> -> Seq Scan on foo foo_1
>>>>(9 rows)
>>>>
>>>>To:
>>>>
>>>># explain (costs off) select * from foo union select * from foo order by 1,3;
>>>> QUERY PLAN
>>>>-----------------------------------------
>>>> Unique
>>>> -> Sort
>>>> Sort Key: foo.a, foo.c, foo.b
>>>> -> Append
>>>> -> Seq Scan on foo
>>>> -> Seq Scan on foo foo_1
>>>>(6 rows)
>>>>
>>>
>>>Attached is what I'm thinking about this optimization. Does it make any
>>>sense?
>>
>>Shouldn't this go one either a new thread or on the thread for the
>>patch Tomas was referencing (by Teodor I believe)?
>>
>
>FWIW the optimization I had in mind is this:
>
> https://commitfest.postgresql.org/21/1651/
>
>I now realize that was about GROUP BY, but it's not all that different
>and the concerns will / should be fairly similar, I think.
>
>IMO simply tweaking the sort keys to match the upper parts of the plan
>is probably way too simplistic, I'm afraid. For example, if the Unique
>significantly reduces cardinality, then the cost of the additional sort
>is much less important. It may be much better to optimize the "large"
>sort of the whole data set, either by reordering the columns as proposed
>by Teodor in his patch (by number of distinct values and/or cost of the
>comparison function function).
>
>Furthermore, this is one of the places that is not using incremental
>sort yet - I can easily imagine doing something like this:
>
>
> Sort
> -> Unique
> -> Incremenal Sort
> -> ...
>
>could be a massive win. So I think we can't just rejigger the sort keys
>abitrarily, we should / need to consider those alternatives.
>
>>Or are you saying you believe this patch guarantees we never see this
>>problem in incremental sort costing?
>>
>
>Yeah, that's not entirely close to me. But maybe it shows us where we to
>get the unprocessed target list?
>

I think at the very least this needs to apply the same change also to
generate_nonunion_paths, because otherwise this fails because of the
same issue:

set enable_hashagg = off;
explain select * from foo except select * from foo order by 1, 3;

I'm still of the opinion that this is really an optimization and
behavior change, and I feel rather uneasy about pushing it post feature
freeze without appropriate review. I also think we really ought to
consider how would this work with the other optimizations I outlined
elsewhere in this thread (comparison costs, ...).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2020-04-16 23:54:32 [PATCH] Fix possible overflow on tuplesort.c
Previous Message Alvaro Herrera 2020-04-16 23:03:04 Re: Poll: are people okay with function/operator table redesign?