Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-07 01:01:37
Message-ID: 20200407010137.2bk2ljqk7gmlcfsi@development
Lists: pgsql-hackers

On Mon, Apr 06, 2020 at 07:27:19PM -0400, James Coleman wrote:
>On Mon, Apr 6, 2020 at 7:09 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
>>
>> On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >
>> > On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
>> > >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
>> > ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> > >>
>> > >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
>> > >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
>> > >> >>On 2020-Apr-06, Tom Lane wrote:
>> > >> >>
>> > >> >>>Locally, things pass without force_parallel_mode, but turning it on
>> > >> >>>produces failures that look similar to rhinoceros's (didn't examine
>> > >> >>>other BF members).
>> > >> >>
>> > >> >>FWIW I looked at the eight failures that were there about fifteen minutes
>> > >> >>ago and they were all identical. I can confirm that, on my laptop, the
>> > >> >>tests work without that GUC, and fail in exactly that way with it.
>> > >> >>
>> > >> >
>> > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
>> > >> >soon. I'll push a fix in a minute.
>> > >> >
>> > >>
>> > >> OK, I've pushed a fix - this should make the buildfarm happy again.
>> > >>
>> > >> It seems to me, however, that a bit more needs to be done. The fix makes
>> > >> show_incremental_sort_info closer to show_sort_info, but not entirely,
>> > >> because IncrementalSortState does not have a sort_Done flag, so it still
>> > >> depends on (fullsortGroupInfo->groupCount > 0). I hadn't noticed that
>> > >> before, but not having that flag seems a bit weird to me.
>> > >>
>> > >> It also seems possibly incorrect - we may end up with
>> > >>
>> > >> fullsortGroupInfo->groupCount == 0
>> > >> prefixsortGroupInfo->groupCount > 0
>> > >>
>> > >> but we won't print anything.
>> > >
>> > >This shouldn't ever be possible, because the only way we get any
>> > >prefix groups at all is if we've already sorted a full sort group
>> > >during the mode transition.
>> > >
>> > >> James, any opinion on this? I'd say we should restore the sort_Done flag
>> > >> and make it work as in plain Sort. Or some comment explaining why
>> > >> depending on the counts is OK (assuming it is).
>> > >
>> > >There's previous email traffic on this thread about that (I can look
>> > >it up later this evening), but the short of it is that I believe that
>> > >relying on the group count is actually more correct than a sort_Done
>> > >flag in the case of incremental sort (in contrast to regular sort).
>> > >
>> >
>> > OK. Maybe we should add a comment to explain.c saying it's OK.
>> >
>> > I've pushed a fix for failures due to different planned workers (in the
>> > test I added to show changes due to add_partial_path tweaks).
>> >
>> > It seems we're not out of the woods yet, though. rhinoceros and
>> > sidewinder failed with something like this:
>> >
>> > Sort Method: quicksort Memory: NNkB
>> > + Sort Method: unknown Disk: NNkB
>> >
>> > Would you mind looking into it?
>>
>> I assume that means those buildfarm members run with a very low
>> work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
>> these tests? Or is that bad practice, because those settings are meant
>> to expose issues with switching to a disk sort?
>
>On rhinoceros I see:
>
>================== pgsql.build/src/test/regress/regression.diffs
>===================
>diff -U3 /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
>/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
>--- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out
>2020-03-14 10:37:49.156761104 -0700
>+++ /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out
>2020-04-06 16:01:13.766798059 -0700
>@@ -1328,8 +1328,9 @@
> -> Sort (actual rows=3 loops=1)
> Sort Key: sq_limit.c1, sq_limit.pk
> Sort Method: top-N heapsort Memory: xxx
>+ Sort Method: unknown Disk: 0kB
> -> Seq Scan on sq_limit (actual rows=8 loops=1)
>-(6 rows)
>+(7 rows)
>
>Same on sidewinder.
>
>Given the 0kB I'm not sure this is *just* a work_mem thing, though
>that's still something I'm curious to know about, and it's still part
>of the "problem" here.
>
>I'm investigating further.
>
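
Regarding the explain.c question above - to be explicit about the check
we're now relying on, here's roughly the shape of it (a simplified sketch,
not the exact committed code; field names from memory, so double-check
against nodeIncrementalSort.h):

static void
show_incremental_sort_info(IncrementalSortState *incrsortstate,
                           ExplainState *es)
{
    /* group counters collected while executing the node */
    IncrementalSortGroupInfo *fullsortGroupInfo =
        &incrsortstate->incsort_info.fullsortGroupInfo;

    if (!es->analyze)
        return;

    /*
     * No sort_Done flag here (unlike plain Sort); we rely on the group
     * counter instead.  Per James, prefix-sort groups can only exist once
     * at least one full-sort group has been sorted, so checking the
     * fullsort counter alone should be enough - that's the argument we
     * ought to capture in a comment.
     */
    if (fullsortGroupInfo->groupCount == 0)
        return;

    /* ... print fullsort group info, then prefixsort group info ... */
}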

I don't see how this could be caused by a low work_mem value, really. What
I think is happening is that when executed with

force_parallel_mode = regress

we run this as if in a parallel worker, but then it somehow gets confused
and reports either (leader + worker) stats or two workers' stats, or
something like that. I don't know why that's happening, though, or why it
would be triggered by the incremental sort patch ...

Actually, I just managed to trigger exactly this - the trick is that we
plan for a certain number of workers, but then fail to start some of them.
For example like this:

create table t (a int, b int, c int);

insert into t select mod(i,10), mod(i,10), i
from generate_series(1,1000000) s(i);

set force_parallel_mode = regress;

set max_parallel_workers = 0;

explain (costs off, analyze) select * from t order by a,b;

                           QUERY PLAN
---------------------------------------------------------------
 Sort (actual time=0.010..0.010 rows=0 loops=1)
   Sort Key: a, b
   Sort Method: quicksort  Memory: 25kB
   Sort Method: unknown  Disk: 0kB
   ->  Seq Scan on t (actual time=0.003..0.004 rows=0 loops=1)
 Planning Time: 0.040 ms
 Execution Time: 0.229 ms
(7 rows)

So we actually do have to print two lines, because without any workers
the leader ends up doing the work itself. But we can't tell that this is
happening, because the number of workers started is only shown in the
Gather node, and force_parallel_mode=regress hides that.
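
Just to illustrate (a rough sketch from memory, the exact output will
differ a bit): with force_parallel_mode = on instead of regress the
Gather node is not hidden, so the failed worker start is visible
directly:

set force_parallel_mode = on;
set max_parallel_workers = 0;

explain (costs off, analyze) select * from t order by a,b;

 Gather (actual time=... rows=... loops=1)
   Workers Planned: 1
   Workers Launched: 0       <-- the leader had to do the sort itself
   ->  Sort (actual time=... rows=... loops=1)
         Sort Key: a, b
         ...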

The question is why these failures are correlated with incremental sort.
I don't think we've tweaked Sort at all, no?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
