Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-06 23:31:43
Message-ID: 20200406233143.kmvjdrzne3oqigdu@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote:
>On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote:
>> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra
>> ><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >>
>> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote:
>> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote:
>> >> >>On 2020-Apr-06, Tom Lane wrote:
>> >> >>
>> >> >>>Locally, things pass without force_parallel_mode, but turning it on
>> >> >>>produces failures that look similar to rhinoceros's (didn't examine
>> >> >>>other BF members).
>> >> >>
>> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago
>> >> >>and they were all identical. I can confirm that, in my laptop, the
>> >> >>tests work without that GUC, and fail in exactly that way with it.
>> >> >>
>> >> >
>> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too
>> >> >soon. I'll push a fix in a minute.
>> >> >
>> >>
>> >> OK, I've pushed a fix - this should make the buildfarm happy again.
>> >>
>> >> It however seems to me a bit more needs to be done. The fix makes
>> >> show_incremental_sort_info closer to show_sort_info, but not entirely
>> >> because IncrementalSortState does not have sort_Done flag so it still
>> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that
>> >> before, but not having that flag seems a bit weird to me.
>> >>
>> >> It also seems possibly incorrect - we may end up with
>> >>
>> >> fullsortGroupInfo->groupCount == 0
>> >> prefixsortGroupInfo->groupCount > 0
>> >>
>> >> but we won't print anything.
>> >
>> >This shouldn't ever be possible, because the only way we get any
>> >prefix groups at all is if we've already sorted a full sort group
>> >during the mode transition.
>> >
>> >> James, any opinion on this? I'd say we should restore the sort_Done flag
>> >> and make it work as in plain Sort. Or some comment explaining why
>> >> depending on the counts is OK (assuming it is).
>> >
>> >There's previous email traffic on this thread about that (I can look
>> >it up later this evening), but the short of it is that I believe that
>> >relying on the group count is actually more correct than a sort_Done
>> >flag in the case of incremental sort (in contrast to regular sort).
>> >
>>
>> OK. Maybe we should add a comment to explain.c saying it's OK.
>>
>> I've pushed a fix for failures due to different planned workers (in the
>> test I added to show changes due to add_partial_path tweaks).
>>
>> It seems we're not out of the woods yet, though. rhinoceros and
>> sidewinder failed with something like this:
>>
>> Sort Method: quicksort Memory: NNkB
>> + Sort Method: unknown Disk: NNkB
>>
>> Would you mind investigating at it?
>
>I assume that means those build farm members run with very low
>work_mem? Is it an acceptable fix to adjust work_mem up a bit just for
>these tests? Or is that bad practice and these are to expose issues
>with changing into disk sort mode?
>

I don't think so - I don't see any work_mem changes in the config - see
the extra_config at the beginning of the page with details:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2020-04-06%2023%3A00%3A16

Moreover, this seems to be in regular Sort, not Incremental Sort and it
very much seems like it gets confused to print a worker info because the
only way for Sort to print two "Sort Method" lines seems to be to enter
either both

if (sortstate->sort_Done && sortstate->tuplesortstate != NULL)
{
... print leader info ...
}

and

if (sortstate->shared_info != NULL)
{
for (n = 0; n < sortstate->shared_info->num_workers; n++)
{
... print worker info ...
}
}

or maybe there are two workers? It's strange ...

It doesn't seem to be particularly platform-specific, but I've been
unable to reproduce it so far. It seems on older gcc versions, though.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-04-06 23:51:48 Re: [HACKERS] Restricting maximum keep segments by repslots
Previous Message James Coleman 2020-04-06 23:27:19 Re: [PATCH] Incremental sort (was: PoC: Partial sort)