Re: Parallel Append subplan order instability on aye-aye

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Parallel Append subplan order instability on aye-aye
Date: 2019-08-05 05:58:55
Message-ID: CA+hUKGKUBmYOrZFkvYHiyrfCW5WsVTRwCaS8mTaiY-Dyk1tCkw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > In the meantime, we've had *lots* of buildfarm failures in the
> > added pg_stat_all_tables query, which indicate that indeed the
> > stats collector mechanism isn't terribly reliable. But that
> > doesn't directly prove anything about the original problem,
> > since the planner doesn't look at stats collector data.
>
> I noticed that if you look at the list of failures of this type, there
> are often pairs of animals belonging to Andres that failed at the same
> time. I wonder if he might be running a bunch of animals on one
> kernel, and need to increase net.core.rmem_max and
> net.core.rmem_default (or maybe the write side variants, or both, or
> something like that).

In further support of that theory, here are the counts of 'stats'
failures (excluding bogus reports due to crashes) for the past 90
days:

          owner          |    animal    | count
-------------------------+--------------+-------
 andres-AT-anarazel.de   | desmoxytes   |     5
 andres-AT-anarazel.de   | dragonet     |     9
 andres-AT-anarazel.de   | flaviventris |     1
 andres-AT-anarazel.de   | idiacanthus  |     5
 andres-AT-anarazel.de   | komodoensis  |    11
 andres-AT-anarazel.de   | pogona       |     1
 andres-AT-anarazel.de   | serinus      |     3
 andrew-AT-dunslane.net  | lorikeet     |     1
 buildfarm-AT-coelho.net | moonjelly    |     1
 buildfarm-AT-coelho.net | seawasp      |    17
 clarenceho-AT-gmail.com | mayfly       |     2

Andres's animals report the same hostname and run at the same time, so
it'd be interesting to know what net.core.rmem_max is set to and
whether these problems go away if it's cranked up 10x or so. In a
quick test I can see that make installcheck is capable of sending a
*lot* of 936-byte messages in the same millisecond.

--
Thomas Munro
https://enterprisedb.com
