From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: Parallel Append subplan order instability on aye-aye |
Date: | 2019-08-05 05:58:55 |
Message-ID: | CA+hUKGKUBmYOrZFkvYHiyrfCW5WsVTRwCaS8mTaiY-Dyk1tCkw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > In the meantime, we've had *lots* of buildfarm failures in the
> > added pg_stat_all_tables query, which indicate that indeed the
> > stats collector mechanism isn't terribly reliable. But that
> > doesn't directly prove anything about the original problem,
> > since the planner doesn't look at stats collector data.
>
> I noticed that if you look at the list of failures of this type, there
> are often pairs of animals belonging to Andres that failed at the same
> time. I wonder if he might be running a bunch of animals on one
> kernel, and need to increase net.core.rmem_max and
> net.core.rmem_default (or maybe the write side variants, or both, or
> something like that).
In further support of that theory, here are the counts of 'stats'
failures (excluding bogus reports due to crashes) for the past 90
days:
          owner          |    animal    | count
-------------------------+--------------+-------
 andres-AT-anarazel.de   | desmoxytes   |     5
 andres-AT-anarazel.de   | dragonet     |     9
 andres-AT-anarazel.de   | flaviventris |     1
 andres-AT-anarazel.de   | idiacanthus  |     5
 andres-AT-anarazel.de   | komodoensis  |    11
 andres-AT-anarazel.de   | pogona       |     1
 andres-AT-anarazel.de   | serinus      |     3
 andrew-AT-dunslane.net  | lorikeet     |     1
 buildfarm-AT-coelho.net | moonjelly    |     1
 buildfarm-AT-coelho.net | seawasp      |    17
 clarenceho-AT-gmail.com | mayfly       |     2
Andres's animals report the same hostname and run at the same time, so
it'd be interesting to know what net.core.rmem_max is set to and
whether these problems go away if it's cranked up 10x higher or
something. In a quick test I can see that make installcheck is
capable of sending a *lot* of 936 byte messages in the same
millisecond.
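
For anyone wanting to check their own machine, here is a rough sketch (not part of the original message) of how one might inspect the relevant limits on Linux. It assumes the stats messages travel over a UDP socket whose receive buffer is bounded by net.core.rmem_max. The helper names (read_sysctl, udp_rcvbuf, payload_upper_bound, report) are made up for illustration. The message count is only an upper bound, since the kernel charges per-packet overhead against the buffer as well as the payload.

```python
import socket
from pathlib import Path

MESSAGE_SIZE = 936  # bytes; the message size observed in the quick test above


def read_sysctl(name):
    """Return a Linux sysctl as an int via /proc/sys, or None if unavailable."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def udp_rcvbuf(requested=None):
    """Open a UDP socket, optionally request a receive buffer size,
    and return the SO_RCVBUF value the kernel reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def payload_upper_bound(bufsize, msg_size=MESSAGE_SIZE):
    """Upper bound on queued messages, ignoring per-packet kernel overhead."""
    return bufsize // msg_size


def report():
    """Print the system-wide limits and what a fresh UDP socket actually gets."""
    for name in ("net.core.rmem_default", "net.core.rmem_max"):
        print(name, "=", read_sysctl(name))
    size = udp_rcvbuf()
    print("default SO_RCVBUF =", size)
    print("at most", payload_upper_bound(size), "messages of", MESSAGE_SIZE, "bytes")
```

Calling report() on the buildfarm host before and after raising the sysctls would show whether the larger limit actually takes effect for newly created sockets.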
--
Thomas Munro
https://enterprisedb.com