From: | Alexander Lakhin <exclusion(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-reproducible AIO failure |
Date: | 2025-05-27 15:00:01 |
Message-ID: | 5637b54e-fc7e-4b58-a803-b39d56d71750@gmail.com |
Lists: | pgsql-hackers |
Hello hackers,
27.05.2025 16:35, Andres Freund wrote:
> On 2025-05-25 20:05:49 -0400, Tom Lane wrote:
>> Thomas Munro<thomas(dot)munro(at)gmail(dot)com> writes:
>>> Could you guys please share your exact repro steps?
>> I've just been running 027_stream_regress.pl over and over.
>> It's not a recommendable answer though because the failure
>> probability is tiny, under 1%. It sounded like Alexander
>> had a better way.
> Just FYI, I've been trying to reproduce this as well, without a single failure
> so far. Despite running all tests for a few hundred times (~2 days) and
> 027_stream_regress.pl many hundreds of times (~1 day).
>
> This is on a m4 mac mini. I'm wondering if there's some hardware specific
> memory ordering issue or disk speed based timing issue that I'm just not
> hitting.
I'm sorry, but I need several more days to present a working reproducer.
I was lucky enough to catch the assert on my first attempt, without much
effort, but then something changed on that MacBook (it's not mine; I
connect to it remotely when it's available) and I cannot reproduce it
anymore.
Just today, I discovered that 027_stream_regress runs very slowly
there just because of shared_preload_libraries:
# after the 027_stream_regress test run
echo "shared_preload_libraries = 'pg_stat_statements'" >/tmp/extra.config
TEMP_CONFIG=/tmp/extra.config NO_TEMP_INSTALL=1 /usr/bin/time make -s check
1061,29 real 56,09 user 27,69 sys
vs
NO_TEMP_INSTALL=1 /usr/bin/time make -s check
36,42 real 27,11 user 13,98 sys
This is probably an effect of the antivirus (I see
wdavdaemon_unprivileged eating CPU time). I had uninstalled it before,
but now it's installed again (maybe by some policy). So I definitely need
more time to figure out the exact recipe for triggering the assert.
As for the configure options, when I tried to reproduce the issue on other
(non-macOS) machines, I used the options from sifaka:
-DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
but then I added -DREAD_STREAM_DISABLE_FAST_PATH to stress read_stream,
and then I just copied that command and ran it on the MacBook...
So I think the complete compilation command was (and I'm seeing it in
the history):
CFLAGS="-DREAD_STREAM_DISABLE_FAST_PATH -DWRITE_READ_PARSE_PLAN_TREES -DSTRESS_SORT_INT_MIN -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS" \
./configure --enable-injection-points --enable-cassert --enable-debug --enable-tap-tests --prefix=/tmp/pg -q && \
make -s -j8 && make -s install && make -s check
... then running 5 027_stream_regress tests in parallel ...
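The parallel step above could be driven by something like the following. This is only a minimal sketch, not the exact commands used on the MacBook: run_parallel and RUN_ID are hypothetical names introduced here, and the per-copy directories in the usage comment (src/test/recovery_1 .. recovery_5) are an assumed layout, chosen so the concurrent runs do not share one tmp_check directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original mail): start N copies of a
# command concurrently and report how many of them exited non-zero.
run_parallel() {
    n=$1; shift
    pids=""
    i=1
    while [ "$i" -le "$n" ]; do
        # RUN_ID (an assumed name) lets each copy pick its own directory.
        RUN_ID=$i "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that particular child.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed of $n runs failed"
    [ "$failed" -eq 0 ]
}

# Assumed usage, with src/test/recovery copied beforehand to
# src/test/recovery_1 .. src/test/recovery_5 (the copied Makefiles may
# need their "subdir" line adjusted to match):
#
# run_parallel 5 sh -c 'make -s -C "src/test/recovery_$RUN_ID" check \
#     PROVE_TESTS="t/027_stream_regress.pl" NO_TEMP_INSTALL=1'
```

Since the failure probability is tiny, such a helper would normally be wrapped in an outer loop that stops on the first non-zero return.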
I had also applied a patch to repeat the "test: brin" line, but I'm not
sure it matters.
Sorry for the lack of useful information again.
Best regards,
Alexander Lakhin
Neon (https://neon.tech)