From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-05-27 19:16:15 |
Message-ID: | upnyfgfvqxfwgqclyltgcrmdhu7ahjpcgftfqsslp7wtrlvl22@2k37mtsaeian |
Lists: | pgsql-hackers |
Hi,
On 2025-05-27 14:43:14 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I just meant that it seems that I can't reproduce it for some as of yet
> > unknown reason. I've now been through 3k+ runs of 027_stream_regress, without
> > a single failure, so there has to be *something* different about my
> > environment than yours.
>
> > Darwin m4-dev 24.1.0 Darwin Kernel Version 24.1.0: Thu Oct 10 21:06:23 PDT 2024; root:xnu-11215.41.3~3/RELEASE_ARM64_T8132 arm64
>
> > cc -v
> > Apple clang version 16.0.0 (clang-1600.0.26.4)
> > Target: arm64-apple-darwin24.1.0
> > Thread model: posix
>
> > I guess I'll try to update to a later version and see if it repros there?
>
> Maybe. All the machines I've seen it on are current-software:
>
> $ uname -a
> Darwin minim4.sss.pgh.pa.us 24.5.0 Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6041 arm64
> $ cc -v
> Apple clang version 17.0.0 (clang-1700.0.13.3)
> Target: arm64-apple-darwin24.5.0
> Thread model: posix
> InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
>
> If it's OS-version-specific, that raises the odds IMO that this
> is Apple's fault more than ours.
Uh, huh. After more than 3k successful runs, ~1 minute after I started to log
in graphically (to update the OS), I got my first reproduction.
2025-05-27 14:51:34.703 EDT [88755] pg_regress/sanity_check LOG: statement: VACUUM;
TRAP: failed Assert("aio_ret->result.status != PGAIO_RS_UNKNOWN"), File: "../../src/postgres/src/backend/storage/buffer/bufmgr.c", Line: 1605, PID: 88755
0 postgres 0x0000000102747514 ExceptionalCondition + 108
1 postgres 0x00000001025cd618 WaitReadBuffers + 596
2 postgres 0x00000001025c9314 read_stream_next_buffer + 428
3 postgres 0x0000000102345a24 heap_vacuum_rel + 1884
4 postgres 0x0000000102452fec vacuum_rel + 724
5 postgres 0x0000000102452b54 vacuum + 1656
6 postgres 0x0000000102452400 ExecVacuum + 1504
7 postgres 0x0000000102615990 standard_ProcessUtility + 444
8 pg_stat_statements.dylib 0x0000000102f2c39c pgss_ProcessUtility + 668
9 postgres 0x00000001026153c4 PortalRunUtility + 136
10 postgres 0x0000000102614af4 PortalRunMulti + 232
11 postgres 0x0000000102614530 PortalRun + 456
12 postgres 0x00000001026135ac exec_simple_query + 1240
13 postgres 0x000000010261084c PostgresMain + 1400
14 postgres 0x000000010260c5d4 BackendInitialize + 0
15 postgres 0x0000000102568f44 postmaster_child_launch + 372
16 postgres 0x000000010256d218 ServerLoop + 4960
17 postgres 0x000000010256b55c InitProcessGlobals + 0
18 postgres 0x00000001024beabc help + 0
19 dyld 0x0000000192b80274 start + 2840
I'll see whether being graphically logged in really does increase the repro
rate or whether this was just an absurd coincidence. If it does, I'll expand
the debugging somewhat.
Greetings,
Andres Freund