Re: enable_incremental_sort changes query behavior

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: enable_incremental_sort changes query behavior
Date: 2020-10-02 13:19:44
Message-ID: CAAaqYe83Ec57fqJiA0fkEGQ-PUk_VwiR2nHbg=LHPwiG5jtXJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 1, 2020 at 9:10 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>
> On Thu, Oct 1, 2020 at 6:08 PM Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > On Thu, Oct 01, 2020 at 09:02:57AM -0400, James Coleman wrote:
> > >On Thu, Oct 1, 2020 at 3:09 AM Jaime Casanova
> > ><jaime(dot)casanova(at)2ndquadrant(dot)com> wrote:
> > >>
> > >> On Wed, 30 Sep 2020 at 21:21, James Coleman <jtc331(at)gmail(dot)com> wrote:
> > >> >
> > >> > On Sat, Sep 26, 2020 at 2:49 PM Jaime Casanova
> > >> > <jaime(dot)casanova(at)2ndquadrant(dot)com> wrote:
> > >> > >
> > >> > > Hi,
> > >> > >
> > >> > > With sqlsmith I found a query that gives this error:
> > >> > > ERROR: ORDER/GROUP BY expression not found in targetlist
> > >> > >
> > >> [...]
> > >> > >
> > >> > > But if I set enable_incremental_sort to off the query gets executed
> > >> > > without problems (attached the explain produced for that case)
> > >> >
> > >> > Thanks for the report.
> > >> >
> > >>
> > >> Hi,
> > >>
> > >> by experiment I reduced the query to this
> > >>
> > >> --- 0 ---
> > >> select distinct
> > >> subq_0.c1 as c0,
> > >> case when (true = pg_catalog.pg_rotate_logfile_old()) then
> > >> ref_0.t else ref_0.t
> > >> end
> > >> as c4
> > >> from
> > >> public.ref_0,
> > >> lateral (select
> > >>
> > >> ref_0.i as c1
> > >> from
> > >> generate_series(1, 100) as ref_1) as subq_0
> > >> --- 0 ---
> > >>
> > >> the only custom table already needed can be created with this commands:
> > >>
> > >> --- 0 ---
> > >> create table ref_0 as select repeat('abcde', (random() * 10)::integer)
> > >> t, random() * 1000 i from generate_series(1, 500000);
> > >> create index on ref_0 (i);
> > >> analyze ref_0 ;
> > >> --- 0 ---
> > >>
> > >>
> > >> > Is there by an chance an index on ref_0.radi_text_temp?
> > >> >
> > >>
> > >> there is an index involved but not on that field, commands above
> > >> create the index in the right column... after that, ANALYZE the table
> > >>
> > >> > And if you set enable_hashagg = off what plan do you get (or error)?
> > >> >
> > >>
> > >> same error
> > >
> > >I was able to reproduce the error without incremental sort enabled
> > >(i.e., it happens with a full sort also). The function call in the
> > >SELECT doesn't have to be in a case expression; for example I was able
> > >to reproduce changing that to `random()::text || ref_0.t`.
> > >
> > >It looks like the issue happens when:
> > >1. The sort happens within a parallel node.
> > >2. One of the sort keys is an expression containing a volatile
> > >function call and a column from the lateral join.
> > >
> > >Here are the settings I used with your above repro case to show it
> > >with regular sort:
> > >
> > > enable_hashagg=off
> > > enable_incremental_sort=off
> > > enable_seqscan=off
> > > parallel_setup_cost=10
> > > parallel_tuple_cost=0
> > >
> > >The plan (obtained by replacing the volatile function with a stable one):
> > >
> > > Unique
> > > -> Nested Loop
> > > -> Gather Merge
> > > Workers Planned: 2
> > > -> Sort
> > > Sort Key: ref_0.i, (md5(ref_0.t))
> > > -> Parallel Index Scan using ref_0_i_idx on ref_0
> > > -> Function Scan on generate_series ref_1
> > >
> > >Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error.
> > >
> > >I haven't been able to dig further than that yet, but my intuition is
> > >to poke around in the parallel query machinery?
> > >
> >
> > Nope. Bisect says this was introduced by this commit:
> >
> > ba3e76cc57 Consider Incremental Sort paths at additional places
> >
> > Looking at the diff, I suspect generate_useful_gather_paths (or maybe
> > get_useful_pathkeys_for_relation) fails to do something with the
> > pathkeys.
> >
> > Of course, that'd explain why it only happens for parallel plans, as
> > this builds gather nodes.
>
> I should have known I'd regret making a wild guess. I was in the
> middle of bisecting too, but had to set it aside for a bit and hadn't
> finished.
>
> I'll try to look into that commit (and I agree those functions are the
> likely places to look) as I get a chance here.

I've been able to confirm that the problem goes away if we stop adding
the gather merge paths in generate_useful_gather_paths().

I'm not sure yet what conclusion that leads us to. It seems to be that
the biggest clue remains that all of this works correctly unless one
of the selected columns (which happens to be a pathkey at this point
because it's a DISTINCT query) contains a volatile expression.

James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-10-02 14:32:07 Re: enable_incremental_sort changes query behavior
Previous Message Konstantin Knizhnik 2020-10-02 12:42:46 Re: Should walsernder check correctness of WAL records?