Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: James Coleman <jtc331(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-04-07 01:13:53
Message-ID: 20200407011353.taitie33j5vj6xnz@development
Lists: pgsql-hackers

On Mon, Apr 06, 2020 at 08:42:13PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> It doesn't seem to be particularly platform-specific, but I've been
>> unable to reproduce it so far. It seems on older gcc versions, though.
>
>It's looking kind of like an uninitialized-memory problem. Note
>the latest from spurfowl,
>
>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spurfowl&dt=2020-04-07%2000%3A15%3A05
>
>which got through "make check" and then failed during pg_upgrade's
>repetition of the test. Similarly on rhinoceros. So there's definitely
>instability there even on one machine.
>
>Perhaps something to do with unexpected cache flushes??
>

I don't know. I've tried running the tests on a number of machines
similar to the failing ones (Raspberry Pi, Fedora 31, ...) and it worked
everywhere, whereas the reported failures seem consistent.

I've been able to reproduce these failures (same symptoms) by making
sure the worker (implied by force_parallel_mode=regress) won't start.
Running

    set max_parallel_workers = 0;
    set force_parallel_mode = regress;

triggers exactly those failures for me (at least during make check; I
haven't tried the pg_upgrade tests etc.).

So my theory is that we fail to start parallel workers on those
machines. It's not clear to me why it would be limited to some machines,
or why it would be correlated with incremental sort. I don't think
those machines have a lower number of parallel workers, do they?

But maybe incremental sort allowed using parallel plans for more
queries, and we simply run out of parallel workers that way?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
