Re: Parallel Append subplan order instability on aye-aye

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Parallel Append subplan order instability on aye-aye
Date: 2019-08-11 22:41:36
Message-ID: 11424.1565563296@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Wed, Jul 24, 2019 at 11:59 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> On Tue, Jul 16, 2019 at 12:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> In the meantime, we've had *lots* of buildfarm failures in the
>>> added pg_stat_all_tables query, which indicate that indeed the
>>> stats collector mechanism isn't terribly reliable. But that
>>> doesn't directly prove anything about the original problem,
>>> since the planner doesn't look at stats collector data.

>> I noticed that if you look at the list of failures of this type, there
>> are often pairs of animals belonging to Andres that failed at the same
>> time. I wonder if he might be running a bunch of animals on one
>> kernel, and need to increase net.core.rmem_max and
>> net.core.rmem_default (or maybe the write side variants, or both, or
>> something like that).

> Andres's animals report the same hostname and run at the same time, so
> it'd be interesting to know what net.core.rmem_max is set to and
> whether these problems go away if it's cranked up 10x higher or
> something. In a quick test I can see that make installcheck is
> capable of sending a *lot* of 936 byte messages in the same
> millisecond.

Yeah. I think we've had quite enough of the stats-transmission-related
failures, and they're no longer proving anything about the original
problem. So I will go do what I proposed in mid-July and revert the
stats queries, while keeping the reltuples/relpages check. (I'd kind
of like to get more confirmation that the plan shape change is associated
with those fields reading as zeroes, before we decide what to do about the
underlying instability.)

regards, tom lane
