Re: Instability of phycodorus in pg_upgrade tests with JIT

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(at)paquier(dot)xyz>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Instability of phycodorus in pg_upgrade tests with JIT
Date: 2025-10-22 21:00:01
Message-ID: 563ee5af-8ee2-484f-b50a-1c8fbdd16171@gmail.com
Lists: pgsql-hackers

Hello Andres,

17.10.2025 08:21, Fujii Masao wrote:
> On Fri, Oct 17, 2025 at 8:32 AM Michael Paquier<michael(at)paquier(dot)xyz> wrote:
>> On Thu, Oct 16, 2025 at 10:00:00PM +0300, Alexander Lakhin wrote:
>>> I collected all of such failures here:
>>> https://wiki.postgresql.org/wiki/Known_Buildfarm_Test_Failures#check-pg_upgrade_fails_on_LLVM-enabled_animals_due_to_double_free_or_corruption
>>>
>>> Masao-san was going to dig into that:
>>> https://www.postgresql.org/message-id/CAHGQGwFcjccSYX+Ap8meEbCccUei-B4tmYsBFu4wMEixKi90fQ@mail.gmail.com
> I tried that briefly, but unfortunately I still have no idea what caused
> this failure or what triggered the double-free issue shown below…

I've been trying to reproduce the issue locally for several days, with
clang 3.9.0 and 4.0.1 compiled from source with -DCMAKE_BUILD_TYPE=Debug
-DLLVM_ENABLE_ASSERTIONS=ON, running the buildfarm client (TestUpgrade) on
four different x86_64 systems (Debian and Ubuntu, though not the latest
versions), without a single failure so far.

(I've re-created the config from petalura/phycodurus:  'jit=1',
'jit_above_cost=0', 'jit_optimize_above_cost=1000'... and also tried
jit_optimize_above_cost=0...)

I also deliberately triggered a double free with a simple program and
confirmed that it is detected and the program is aborted.
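
For illustration only, a minimal sketch of such a check (not necessarily the
program used above). It assumes a Linux system with a C library that detects
double frees, such as a recent glibc; the names CHILD_SCRIPT and
double_free_exit_status are hypothetical. The double free runs in a child
process so that the abort can be observed from the parent:

```python
import signal
import subprocess
import sys

# Child script: allocate a small block via the C library and free it twice.
# ctypes.CDLL(None) resolves symbols from the libraries already loaded into
# the interpreter (this form works on Linux/macOS, not on Windows).
CHILD_SCRIPT = """
import ctypes
libc = ctypes.CDLL(None)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]
p = libc.malloc(64)
libc.free(p)
libc.free(p)  # deliberate double free
"""


def double_free_exit_status():
    """Run the double free in a child process and return its exit status.

    A negative value -N means the child was killed by signal N.
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


if __name__ == "__main__":
    status = double_free_exit_status()
    if status == -signal.SIGABRT:
        print("double free detected: child aborted with SIGABRT")
    else:
        print(f"child exit status: {status}")
```

Whether the second free() is caught, and with which signal, depends on the
allocator and its version, so the sketch reports the exit status rather than
assuming a particular outcome.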

So, if I have re-created all the conditions correctly (based on the
buildfarm logs), the several hundred runs I performed should have been
enough to reproduce the issue; there is probably something specific to
those animals (petalura, phycodurus, desmoxytes, dragonet)... Maybe a buggy
libc update was installed there in September?

Meanwhile we've got a failure at stage Check (not pg_upgradeCheck), with a
release LLVM build [1]:
2025-10-21 17:15:16.261 CEST [1489783][client backend][:0] LOG: disconnection: session time: 0:00:03.177 user=bf database=regression host=[local]
corrupted size vs. prev_size while consolidating

Thus, the initial suspicion that the issue was caused by dff7591a7 (because
the first failure [2] happened right after it) now seems wrong.

Maybe you have an insight on the possible cause of these memory errors?

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2025-10-21%2015%3A14%3A12
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2025-09-16%2011%3A09%3A07

Best regards,
Alexander
