Re: Autoprewarm workers terminated due to a segmentation fault

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Glauber Batista <glauberrbatista(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Autoprewarm workers terminated due to a segmentation fault
Date: 2026-06-09 21:44:28
Message-ID: f4c7fe76-6b98-4d92-a090-2ba8184b8997@vondra.me
Lists: pgsql-bugs

On 6/9/26 23:06, Matheus Alcantara wrote:
> Hi,
>
> On Tue Jun 9, 2026 at 3:37 PM -03, Glauber Batista wrote:
>> I have an issue with the autoprewarm workers segfaulting during the service
>> restart. Sometimes, it successfully restarts after a few tries, but usually
>> I need to remove the autoprewarm.blocks file. My setup consists of a
>> primary server with two replicas and all of them present the same issue. I
>> have been using this setup for several years with no issues, but since I
>> upgraded to Postgres 18 I've been running into it. This is a production database.
>>
>> Details:
>>
>> [ ... ]
>>
>> All that said, it seems there's a missing guard clause at line 649. I
>> didn't spend much time reading the code, but it's clearly accessing a
>> position in the array that is not allocated.
>>
>
> Thank you for the report!
>
> I've managed to reproduce the issue with the following:
>
> create table test_warm (id int, data text);
> -- insert enough rows for the index to span many pages
> insert into test_warm select g, repeat('a', 100) from generate_series(1, 5000000) g;
> create index warm_idx on test_warm(id);
>
> -- force the whole index to be read into shared_buffers
> select count(*) from test_warm where id > 0;
>
> After that, a pg_ctl stop followed by pg_ctl start fails with the following logs:
>
> 2026-06-09 17:47:40.924 -03 [23025] LOG: shutting down
> 2026-06-09 17:47:40.925 -03 [23025] LOG: checkpoint starting: shutdown fast
> 2026-06-09 17:47:41.033 -03 [23022] LOG: database system is shut down
> 2026-06-09 17:47:49.830 -03 [23172] LOG: starting PostgreSQL 19beta1 on aarch64-darwin, compiled by clang-17.0.0, 64-bit
> 2026-06-09 17:47:49.842 -03 [23172] LOG: database system is ready to accept connections
> 2026-06-09 17:47:49.917 -03 [23172] LOG: background worker "autoprewarm worker" (PID 23182) was terminated by signal 11: Segmentation fault: 11
> 2026-06-09 17:47:49.917 -03 [23172] LOG: terminating any other active server processes
> 2026-06-09 17:47:49.918 -03 [23172] LOG: all server processes terminated; reinitializing
>
> I wonder whether the following would be enough:
>
> /* Advance i past all the blocks just prewarmed. */
> i = p.pos;
> + if (i >= apw_state->prewarm_stop_idx)
> + break;
> +
> blk = block_info[i];
>

So how does it get advanced past prewarm_stop_idx? I've been unable
to reproduce it locally; maybe it's platform-specific. The original
report was from ARM. Are you on ARM too, Matheus?

But AFAIK the code may not account for the read stream callback setting
pos to prewarm_stop_idx. The callback may end by setting
p->pos = apw_state->prewarm_stop_idx, and that index seems to be past
the end of the array.
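One way to see the mechanism is a deliberately simplified model in C. It is not the autoprewarm.c code: `BlockEntry`, `StreamPos`, `next_block()` and `count_relations()` are hypothetical names, and the model assumes only that the callback consumes entries while `pos < stop_idx` and stops at a relation boundary. Draining the last relation in the range leaves `pos` equal to `stop_idx`, so the bottom of the outer loop has to test the bound before loading `block_info[i]`, which is what the check proposed earlier in the thread does.

```c
/*
 * Simplified model, NOT the autoprewarm.c code.  All names are hypothetical.
 */

/* Hypothetical stand-in for one entry of autoprewarm.blocks. */
typedef struct
{
    unsigned    tablespace;
    unsigned    filenumber;
    unsigned    blocknum;
} BlockEntry;

/* Hypothetical position shared by the outer loop and the callback. */
typedef struct
{
    const BlockEntry *block_info;
    int         pos;            /* next entry to look at */
    int         stop_idx;       /* one past the last entry in range */
    unsigned    tablespace;     /* relation being prewarmed */
    unsigned    filenumber;
} StreamPos;

/* Model callback: next block of the current relation, or -1 when done. */
static int
next_block(StreamPos *p)
{
    const BlockEntry *blk;

    if (p->pos >= p->stop_idx)
        return -1;              /* range consumed: pos == stop_idx */

    blk = &p->block_info[p->pos];
    if (blk->tablespace != p->tablespace ||
        blk->filenumber != p->filenumber)
        return -1;              /* next relation starts here */

    p->pos++;
    return (int) blk->blocknum;
}

/*
 * Outer loop in the "load the next entry at the bottom" style, with the
 * proposed guard.  Returns the number of relations visited.
 */
static int
count_relations(const BlockEntry *block_info, int start_idx, int stop_idx)
{
    int         i = start_idx;
    int         nrels = 0;
    BlockEntry  blk = block_info[i];    /* assumes start_idx < stop_idx */

    while (i < stop_idx)
    {
        StreamPos   p = {block_info, i, stop_idx,
                         blk.tablespace, blk.filenumber};

        while (next_block(&p) >= 0)
            ;                   /* a real stream would read each block here */
        nrels++;

        /* Advance i past all the blocks just prewarmed. */
        i = p.pos;
        if (i >= stop_idx)      /* the guard: p.pos may equal stop_idx */
            break;
        blk = block_info[i];
    }
    return nrels;
}
```

With four entries covering two relations and `stop_idx = 4`, `count_relations(block_info, 0, 4)` returns 2; the second drain ends with `p.pos == 4`, and without the guard the next line would read `block_info[4]`, one past the end of the array.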

That'd mean the proposed check is generally the correct way to fix this.
TBH it's not clear to me why this needs to load the *next* entry at the
end of the loop. Well, it does that so that the loop condition can use
'blk', but that seems a bit fragile and confusing, and no one noticed the
issue until now.

Maybe this would be a better way to write the while loop?

while (i < apw_state->prewarm_stop_idx)
{
    blk = block_info[i];

    if (blk.tablespace != tablespace ||
        blk.filenumber != filenumber)
        break;

    ...
}
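Under the same kind of simplified model, the loop shape suggested above could look like this. Again, this is a sketch rather than the autoprewarm.c code: `BlockEntry`, `drain_fork()` and `prewarm_relation()` are hypothetical, and `drain_fork()` stands in for the elided read-stream work, assuming it consumes all entries of one fork and may return exactly `stop_idx`. Because the bound is tested at the top of the loop, `block_info[i]` is only read while `i < stop_idx`.

```c
/*
 * Simplified model, NOT the autoprewarm.c code.  All names are hypothetical.
 */

/* Hypothetical stand-in for one entry of autoprewarm.blocks. */
typedef struct
{
    unsigned    tablespace;
    unsigned    filenumber;
    unsigned    forknum;
    unsigned    blocknum;
} BlockEntry;

/*
 * Hypothetical stand-in for the read stream: consume every entry of the
 * fork starting at 'pos' and return the new position, which may be
 * exactly stop_idx.  Caller guarantees pos < stop_idx.
 */
static int
drain_fork(const BlockEntry *block_info, int pos, int stop_idx)
{
    const BlockEntry first = block_info[pos];

    while (pos < stop_idx &&
           block_info[pos].tablespace == first.tablespace &&
           block_info[pos].filenumber == first.filenumber &&
           block_info[pos].forknum == first.forknum)
        pos++;
    return pos;
}

/*
 * Prewarm every fork of the relation starting at 'i' (requires
 * i < stop_idx), using the bound-first loop shape.  Counts forks in
 * *nforks and returns the index of the first entry not consumed.
 */
static int
prewarm_relation(const BlockEntry *block_info, int i, int stop_idx,
                 int *nforks)
{
    unsigned    tablespace = block_info[i].tablespace;
    unsigned    filenumber = block_info[i].filenumber;

    while (i < stop_idx)        /* bound checked before block_info[i] */
    {
        BlockEntry  blk = block_info[i];

        if (blk.tablespace != tablespace ||
            blk.filenumber != filenumber)
            break;              /* next relation starts at i */

        (*nforks)++;
        i = drain_fork(block_info, i, stop_idx);
    }
    return i;
}
```

For a relation whose entries run to the end of the range, `prewarm_relation()` returns `stop_idx` and exits on the loop condition instead of reading `block_info[stop_idx]`. The trade-off is that `blk` is loaded in one place, at the top, so no path can advance `i` and then dereference it unchecked.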

regards

--
Tomas Vondra
