Re: Why is parula failing?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Robins Tharakan <tharakan(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Tharakan, Robins" <tharar(at)amazon(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Why is parula failing?
Date: 2024-04-15 06:31:59
Message-ID: 645480.1713162719@sss.pgh.pa.us
Lists: pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> #4  0x000000000090b7b4 in pg_sleep (fcinfo=<optimized out>) at misc.c:406
>         delay = <optimized out>
>         delay_ms = <optimized out>
>         endtime = 0

> This endtime looks like a problem. It seems unlikely to be caused by
> gettimeofday's timeval fields being zeroed given that the number of
> seconds should have been added to that.

Yes, that is very odd.

> I can't quite make sense of how we end up sleeping at all with a zero
> endtime. Assuming the subsequent GetNowFloat() worked, "delay =
> endtime - GetNowFloat();" would result in a negative sleep duration
> and we'd break out of the sleep loop.

If GetCurrentTimestamp() were returning assorted random values, it
wouldn't be hard to imagine this loop sleeping for a long time.
But it's very hard to see how that theory leads to an "endtime"
of exactly zero rather than some other number, and even harder
to credit two different runs getting "endtime" of exactly zero.
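
For anyone following along, the argument above can be checked with a
rough stand-alone model of the loop's control flow. This is only a
sketch, not the code in misc.c: sleep_loop_model, clock_fn and
fixed_clock are made-up names, the clock is injected so it can be made
to misbehave, nothing actually sleeps, and the 600-second cap per wait
is an assumption about how the real loop bounds each wait.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GetNowFloat(): returns "now" in seconds. */
typedef double (*clock_fn) (void *state);

/*
 * Model of the wait loop: recompute the remaining delay on every pass
 * and leave as soon as it is no longer positive.  Instead of sleeping,
 * add the requested milliseconds to *total_ms.  max_waits bounds the
 * model so a broken clock cannot make it spin forever.
 * Returns the number of waits that would have been issued.
 */
static int
sleep_loop_model(double endtime, clock_fn now, void *state,
                 int max_waits, long *total_ms)
{
    int         waits = 0;

    assert(max_waits >= 0);
    *total_ms = 0;

    while (waits < max_waits)
    {
        double      delay = endtime - now(state);
        long        delay_ms;

        if (delay >= 600.0)
            delay_ms = 600000;  /* assumed cap: at most 600 s per wait */
        else if (delay > 0.0)
        {
            /* round up to whole milliseconds without needing libm */
            delay_ms = (long) (delay * 1000.0);
            if ((double) delay_ms < delay * 1000.0)
                delay_ms++;
        }
        else
            break;              /* endtime reached or already passed */

        *total_ms += delay_ms;
        waits++;
    }
    return waits;
}

/* Clock that always reports the value stored in *state. */
static double
fixed_clock(void *state)
{
    return *(double *) state;
}
```

With endtime = 0 and a clock reporting any positive time, the model
exits without waiting at all. With a clock reporting a large negative
time, every pass asks for the full capped wait until max_waits runs
out, which is the "sleeping for a long time" case.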

> If GetNowFloat() somehow was returning a negative number then we could
> end up with a large delay. But if gettimeofday() was so badly broken
> then wouldn't there be some evidence of this in the log timestamps on
> failing runs?

And indeed that too. I'm finding the "compiler bug" theory
palatable. Robins mentioned having built the compiler from
source, which theoretically should work, but maybe something
went wrong? Or it's missing some important bug fix?

It might be interesting to back the animal's CFLAGS down
to -O0 and see if things get more stable.

regards, tom lane
