Re: gs_group_1 crashing on 13beta2/s390x

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Christoph Berg <myon(at)debian(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gs_group_1 crashing on 13beta2/s390x
Date: 2020-07-16 15:08:32
Message-ID: 3358505.1594912112@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> It's hardly surprising that datumCopy would segfault when given a
> Tom> null "value" and told it is pass-by-reference. However, to get to
> Tom> the datumCopy call, we must have passed the MemoryContextContains
> Tom> check on that very same pointer value, and that would surely have
> Tom> segfaulted as well, one would think.

> Nope, because MemoryContextContains just returns "false" if passed a
> NULL pointer.

Ah, right. So you could imagine getting here if the finalfn had returned
PointerGetDatum(NULL) with isnull = false. We have some aggregate
transfns that are capable of doing that for internal-type transvalues,
I think, but the finalfn never should do it.
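
A minimal sketch of the check-then-copy sequence discussed above, using toy stand-ins rather than the real PostgreSQL code. ToyContext, toy_context_contains, toy_would_copy and toy_demo are hypothetical names invented for illustration, and the "context" is assumed to be a single contiguous arena, which is not how real memory contexts work. It only shows why a NULL pointer with isnull = false slips past the containment check and reaches the copy step:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL Datum macros (assumption). */
typedef uintptr_t Datum;
#define PointerGetDatum(p) ((Datum) (uintptr_t) (p))
#define DatumGetPointer(d) ((void *) (d))

/* Toy "memory context": one contiguous arena (hypothetical). */
typedef struct ToyContext
{
    char   *base;
    size_t  size;
} ToyContext;

/* Like the behaviour described above: NULL is never "contained". */
bool
toy_context_contains(const ToyContext *cxt, const void *ptr)
{
    uintptr_t p;
    uintptr_t lo;

    if (ptr == NULL)
        return false;
    p = (uintptr_t) ptr;
    lo = (uintptr_t) cxt->base;
    return p >= lo && p < lo + cxt->size;
}

/*
 * Returns true when a pass-by-reference result would be handed to the
 * copy routine: it is not marked null and is not already in the context.
 */
bool
toy_would_copy(const ToyContext *cxt, Datum value, bool isnull, bool byval)
{
    return !byval && !isnull &&
        !toy_context_contains(cxt, DatumGetPointer(value));
}

/* Exercise the interesting cases. */
void
toy_demo(void)
{
    char        arena[64];
    char        outside[8];
    ToyContext  cxt = { arena, sizeof arena };

    /* Pointer already inside the context: no copy needed. */
    assert(!toy_would_copy(&cxt, PointerGetDatum(arena + 8), false, false));
    /* Valid pointer outside the context: copy, which is fine. */
    assert(toy_would_copy(&cxt, PointerGetDatum(outside), false, false));
    /* NULL pointer with isnull = false: copy is reached and would crash. */
    assert(toy_would_copy(&cxt, PointerGetDatum(NULL), false, false));
    /* NULL pointer correctly flagged isnull = true: copy is skipped. */
    assert(!toy_would_copy(&cxt, PointerGetDatum(NULL), true, false));
}
```

The third case in toy_demo is the one at issue: the containment check returns false without touching the pointer, so the first dereference happens only inside the copy.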

In any case we still have the fact that this isn't being seen in our
buildfarm; and that's not for lack of s390 machines. So I still think
the most likely explanation is a compiler bug in bleeding-edge gcc.

Probably what Christoph should be trying to figure out is why he can't
reproduce it manually. There must be some discrepancy between his
environment and the build machines; but what?

regards, tom lane
