Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: James Coleman <jtc331(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-07 00:47:54
Message-ID: CAAaqYe9qzKbxCvSp3dfLkuS1v8KKnB7kW3z-hZ2jnAQaveSm8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 6, 2020 at 7:31 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote:
> >On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >>
> >> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
> >> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
> >> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >> >>
> >> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
> >> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
> >> >> >>On 2020-Apr-06, Tom Lane wrote:
> >> >> >>
> >> >> >>>Locally, things pass without force_parallel_mode, but turning it on
> >> >> >>>produces failures that look similar to rhinoceros's (didn't examine
> >> >> >>>other BF members).
> >> >> >>
> >> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago
> >> >> >>and they were all identical. I can confirm that, in my laptop, the
> >> >> >>tests work without that GUC, and fail in exactly that way with it.
> >> >> >>
> >> >> >
> >> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
> >> >> >soon. I'll push a fix in a minute.
> >> >> >
> >> >>
> >> >> OK, I've pushed a fix - this should make the buildfarm happy again.
> >> >>
> >> >> It however seems to me a bit more needs to be done. The fix makes
> >> >> show_incremental_sort_info closer to show_sort_info, but not entirely
> >> >> because IncrementalSortState does not have sort_Done flag so it still
> >> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
> >> >> before, but not having that flag seems a bit weird to me.
> >> >>
> >> >> It also seems possibly incorrect - we may end up with
> >> >>
> >> >> fullsortGroupInfo->groupCount == 0
> >> >> prefixsortGroupInfo->groupCount > 0
> >> >>
> >> >> but we won't print anything.
> >> >
> >> >This shouldn't ever be possible, because the only way we get any
> >> >prefix groups at all is if we've already sorted a full sort group
> >> >during the mode transition.
> >> >
> >> >> James, any opinion on this? I'd say we should restore the sort_Done flag
> >> >> and make it work as in plain Sort. Or some comment explaining why
> >> >> depending on the counts is OK (assuming it is).
> >> >
> >> >There's previous email traffic on this thread about that (I can look
> >> >it up later this evening), but the short of it is that I believe that
> >> >relying on the group count is actually more correct than a sort_Done
> >> >flag in the case of incremental sort (in contrast to regular sort).
> >> >
> >>
> >> OK. Maybe we should add a comment to explain.c saying it's OK.
> >>
> >> I've pushed a fix for failures due to different planned workers (in the
> >> test I added to show changes due to add_partial_path tweaks).
> >>
> >> It seems we're not out of the woods yet, though. rhinoceros and
> >> sidewinder failed with something like this:
> >>
> >> Sort Method: quicksort Memory: NNkB
> >> + Sort Method: unknown Disk: NNkB
> >>
> >> Would you mind investigating at it?
> >
> >I assume that means those build farm members run with very low
> >work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
> >these tests? Or is that bad practice and these are to expose issues
> >with changing into disk sort mode?
> >
>
> I don't think so - I don't see any work_mem changes in the config - see
> the extra_config at the beginning of the page with details:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2020-04-06%2023%3A00%3A16
>
> Moreover, this seems to be in regular Sort, not Incremental Sort and it
> very much seems like it gets confused to print a worker info because the
> only way for Sort to print two "Sort Method" lines seems to be to enter
> either both
>
> if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
> {
> ... print leader info ...
> }
>
> and
>
> if (sortstate->shared_info != NULL)
> {
> for (n = 0; n < sortstate->shared_info->num_workers; n++)
> {
> ... print worker info ...
> }
> }
>
> or maybe there are two workers? It's strange ...
>
>
> It doesn't seem to be particularly platform-specific, but I've been
> unable to reproduce it so far. It seems on older gcc versions, though.

I haven't been able to reproduce it, but I'm 99% confident this will fix it:

- if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS)
+ if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS
+ || sinstrument->sortMethod == NULL)
continue; /* ignore any unfilled slots */

Earlier we'd had this discussion about why SORT_TYPE_STILL_IN_PROGRESS
was explicitly set to 0 in the enum declaration. Since there was no
comment, we changed that, but here I believe that show_sort_info was
relying on that as an indicator that a worker didn't actually do any
work (since the DSM for the sort node gets set to all zeros, this
would work).

I'm not sure if the SORT_TYPE_STILL_IN_PROGRESS case is actually still
needed, though.

I've attached both a fix for this issue and a comment for the
full/prefix sort group if blocks.

James

Attachment Content-Type Size
v1-0002-Comment-show_incremental_sort_info-assumtions.patch text/x-patch 1.9 KB
v1-0001-Don-t-show-worker-info-for-sort-node-if-no-work-d.patch text/x-patch 1.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-04-07 00:49:25 Re: FETCH FIRST clause WITH TIES option
Previous Message David Zhang 2020-04-07 00:44:05 Re: ERROR: invalid input syntax for type circle