RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

From: Ana Almeida <Ana(dot)Almeida(at)timestamp(dot)pt>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Cc: Nuno Azevedo <Nuno(dot)Azevedo(at)timestamp(dot)pt>
Subject: RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY
Date: 2026-03-20 12:26:36
Message-ID: VI0PR07MB10718C01F5A9FEBA311BFD1ED974CA@VI0PR07MB10718.eurprd07.prod.outlook.com
Lists: pgsql-bugs

Hello Tomas,

We used GDB to generate a full backtrace, which is attached for reference. The top frames are:

#0 BlockIdGetBlockNumber (blockId=0x7f915c757010, blockId=0x7f915c757010) at ../../../src/include/postgres.h:405
No locals.
#1 ItemPointerGetBlockNumberNoCheck (pointer=<optimized out>) at ../../../src/include/storage/itemptr.h:95
No locals.
#2 ItemPointerGetBlockNumber (pointer=<optimized out>) at ../../../src/include/storage/itemptr.h:106
No locals.
#3 itemptr_encode (itemptr=<optimized out>) at ../../../src/include/catalog/index.h:191
        block = <error reading variable block (Cannot access memory at address 0x7f915c757010)>
        offset = <error reading variable offset (Cannot access memory at address 0x7f915c757014)>
        encoded = <optimized out>
        block = <optimized out>
        offset = <optimized out>
        encoded = <optimized out>
#4 validate_index_callback (itemptr=0x7f915c757010, opaque=0x7fff98611030) at index.c:3425
        state = 0x7fff98611030
        encoded = <error reading variable encoded (Cannot access memory at address 0x7f915c757010)>
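All of the frames above dereference the same TID pointer, itemptr=0x7f915c757010. As a rough illustration, and assuming the ItemPointerData layout from storage/itemptr.h (a 4-byte block number stored as two uint16 halves, followed by a 2-byte offset) together with the shift-and-OR packing done by itemptr_encode() in catalog/index.h, the Python sketch below shows why gdb reports the block field at ...010 and the offset field 4 bytes later at ...014. The helper names are hypothetical; this models only the arithmetic, not PostgreSQL itself.

```python
import struct

# Assumed layout of ItemPointerData (storage/itemptr.h), native byte order,
# no padding:
#   BlockIdData  { uint16 bi_hi; uint16 bi_lo; }   -> bytes 0..3
#   OffsetNumber ip_posid (uint16)                -> bytes 4..5
ITEMPTR_FORMAT = "=HHH"
OFFSET_FIELD_START = struct.calcsize("=HH")  # 4: offset follows the block id


def pack_itemptr(block: int, offset: int) -> bytes:
    """Build the 6 raw bytes of an ItemPointerData for (block, offset)."""
    return struct.pack(ITEMPTR_FORMAT, block >> 16, block & 0xFFFF, offset)


def unpack_itemptr(raw: bytes) -> tuple[int, int]:
    """Read (block, offset) back, like ItemPointerGetBlockNumber/OffsetNumber."""
    bi_hi, bi_lo, ip_posid = struct.unpack(ITEMPTR_FORMAT, raw)
    return (bi_hi << 16) | bi_lo, ip_posid


def itemptr_encode(block: int, offset: int) -> int:
    """Mirror of itemptr_encode(): offset in the low 16 bits, block above it."""
    return (block << 16) | offset


def itemptr_decode(encoded: int) -> tuple[int, int]:
    """Inverse of itemptr_encode()."""
    return encoded >> 16, encoded & 0xFFFF


# Addresses gdb could not read in frame #3: block field, then offset field.
block_addr, offset_addr = 0x7F915C757010, 0x7F915C757014
assert offset_addr - block_addr == OFFSET_FIELD_START
```

The sketch only accounts for the 4-byte gap between the two addresses; whether that memory was really unmapped at crash time or simply missing from the core file cannot be decided from this arithmetic alone.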

The errors occurred in a test database that was created using pg_restore from a pg_dump of another database.

We ran DELETE, VACUUM, and REINDEX commands multiple times. During one of these runs, the REINDEX operation failed with the error "could not open file". After this, we dropped and recreated the database and repeated the tests. In one of the subsequent runs, the same REINDEX command caused a segmentation fault, which caused the database to crash. The failed REINDEX left behind temporary index copies with the _ccnew suffix; we manually dropped these indexes and re-ran the REINDEX command, which then completed successfully.
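A failed REINDEX ... CONCURRENTLY leaves its transient copies behind as invalid indexes (pg_index.indisvalid is false), which is what the _ccnew leftovers described above are. A minimal Python sketch of that cleanup step follows, assuming the rows come from the catalog query stored in INVALID_INDEXES_SQL and that transient copies follow an <index>_ccnew[N] naming pattern. The helper names are hypothetical, and the generated statements are meant to be reviewed before anyone runs them.

```python
import re

# Catalog query (run separately, e.g. in psql) listing invalid indexes,
# such as those left behind by a failed REINDEX ... CONCURRENTLY.
INVALID_INDEXES_SQL = """
SELECT n.nspname, c.relname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid
"""

# Assumed naming pattern of transient copies: <index>_ccnew, <index>_ccnew1, ...
CCNEW_RE = re.compile(r"_ccnew\d*$")


def quote_ident(name: str) -> str:
    """Minimal identifier quoting: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def drop_leftover_statements(rows):
    """Turn (schema, index) rows from INVALID_INDEXES_SQL into DROP statements,
    keeping only the transient _ccnew copies."""
    return [
        f"DROP INDEX CONCURRENTLY {quote_ident(schema)}.{quote_ident(name)};"
        for schema, name in rows
        if CCNEW_RE.search(name)
    ]
```

Filtering on the suffix keeps the sketch from touching other invalid indexes that may exist for unrelated reasons. Note that DROP INDEX CONCURRENTLY cannot run inside a transaction block, so each statement has to be executed on its own.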

We also attempted to reproduce the issue by increasing certain parameters, such as maintenance_work_mem and max_parallel_maintenance_workers, suspecting it might be related to server resource constraints. However, we have not yet been able to reproduce the error.

Best regards,

Ana Almeida

-----Original Message-----

From: Tomas Vondra <tomas(at)vondra(dot)me>
Sent: 18 March 2026 23:42
To: Ana Almeida <Ana(dot)Almeida(at)timestamp(dot)pt>; Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>; pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: Nuno Azevedo <Nuno(dot)Azevedo(at)timestamp(dot)pt>
Subject: Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY

On 3/18/26 15:54, Ana Almeida wrote:

> Hello Jim,
>
> I didn’t notice that the error showed the schema and table name. For
> confidentiality reasons, could you please not share the schema and
> table name if this is released as a bug?
>
> Here is the information:
>

>                                      Table "myschema.mytable"
>        Column       |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
> --------------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
>  id                 | bigint                      |           | not null |         | plain    |             |              |
>  axxxxxx            | character varying(32)       |           | not null |         | extended |             |              |
>  bxx                | text                        |           | not null |         | extended |             |              |
>  cxxxxxxx           | text                        |           | not null |         | extended |             |              |
>  dxxxxxxxx          | text                        |           |          |         | extended |             |              |
>  lag_val            | text                        |           |          |         | extended |             |              |
>  exxxxxxxxxx        | text                        |           |          |         | extended |             |              |
>  fxxxxxxxxxxxxx     | text                        |           |          |         | extended |             |              |
>  gxxxxxxxxxxxx      | text                        |           |          |         | extended |             |              |
>  hxxxxxx            | numeric                     |           | not null |         | main     |             |              |
>  ixxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
>  jxxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
>  kxxxxxx            | integer                     |           |          |         | plain    |             |              |
>  lxxxxxxxxxxxx      | integer                     |           | not null |         | plain    |             |              |
>  mxxxxxxxxxxxxxx    | timestamp without time zone |           |          |         | plain    |             |              |
>  nxxxxxxxxxxxxx     | timestamp without time zone |           |          |         | plain    |             |              |
>  oxxxxxxxxxxxx      | timestamp without time zone |           |          |         | plain    |             |              |
>  pxxxxxxxxxxx       | timestamp without time zone |           | not null |         | plain    |             |              |
>  qr_mydb_id         | bigint                      |           |          |         | plain    |             |              |
>  qxxxxxx            | character varying(100)      |           |          |         | extended |             |              |
> Indexes:
>     "mytable_pkey" PRIMARY KEY, btree (id)
>     "idx_lag_val" btree (lag_val)
>     "idx_mytable_qr_mydb" btree (qr_mydb_id)
> Foreign-key constraints:
>     "fk__mytable__qr_mydb" FOREIGN KEY (qr_mydb_id) REFERENCES myschema.qr_mydb(id)
> Access method: heap
> Options: autovacuum_enabled=true, toast.autovacuum_enabled=true

>
> Just another note, before we also had the error below in the same
> reindex command. The database didn’t crash when that error happened
> but the reindex failed. After that, we recreated the table.
>
> ERROR: could not open file "base/179146/184526.4" (target block 808464432): previous segment is only 99572 blocks
>

So what was the sequence of events, exactly? You got this "could not open file" error during REINDEX CONCURRENTLY, you recreated the table and then it crashed on some later REINDEX CONCURRENTLY?

How did you recreate the table? Did you reload it from a backup or something else?

>
> We haven’t been able to reproduce the errors again.
>

That suggests it might have been some sort of data corruption, but it's just a guess. Have you checked the server log for any messages suggesting, e.g., storage or memory issues?
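One detail in the earlier "could not open file" error is worth a quick calculation. Assuming a default build (8 kB blocks and 1 GB segment files, i.e. 131072 blocks per segment), the message says segment 3 of the relation holds only 99572 blocks, so the relation ends far below the requested block 808464432. That block number is also 0x30303030, the same four bytes as the ASCII string "0000". The Python sketch below just reproduces this arithmetic; it does not establish a cause, although a block number that reads as text would be consistent with the corruption guess above.

```python
import struct

BLCKSZ = 8192                                 # assumed default block size
RELSEG_SIZE = (1024 * 1024 * 1024) // BLCKSZ  # blocks per 1 GB segment file


def segment_of(block: int) -> tuple[int, int]:
    """Map a relation block number to (segment number, block within segment)."""
    return divmod(block, RELSEG_SIZE)


def relation_blocks(last_segno: int, blocks_in_last: int) -> int:
    """Total size when segments 0..last_segno-1 are full and last_segno is partial."""
    return last_segno * RELSEG_SIZE + blocks_in_last


# Values from: could not open file "base/179146/184526.4" (target block
# 808464432): previous segment is only 99572 blocks
target_block = 808464432
nblocks = relation_blocks(3, 99572)           # segment .3 is the short one
target_segno, target_in_seg = segment_of(target_block)

# The same 32-bit value viewed as raw bytes (identical in either byte order).
as_text = struct.pack("<I", target_block).decode("ascii")

print(f"relation size: {nblocks} blocks (~{nblocks * BLCKSZ / 2**30:.2f} GiB)")
print(f"target block lives in segment {target_segno}, block {target_in_seg}")
print(f"target block as bytes: {as_text!r}")
```

With these inputs the relation comes to 492788 blocks (about 3.76 GiB), while the target block would sit in segment 6168, thousands of segments past the end of the file set.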

Per the backtrace you shared in the previous message, the segfault happened here:

#0 0x00000000005d67a8 validate_index_callback (postgres)
#1 0x00000000005738bd btvacuumpage (postgres)
#2 0x0000000000573d8a btvacuumscan (postgres)
#3 0x0000000000573f00 btbulkdelete (postgres)
...

This is very heavily exercised code, so I'm somewhat skeptical that a bug there would go unnoticed for long. It's possible, of course. But validate_index_callback doesn't do all that much: it just writes the TID value to a tuplesort / temporary file.
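The call chain in the backtrace matches this description: in the validation phase of REINDEX CONCURRENTLY, validate_index() runs the index's bulk-delete scan with validate_index_callback as the callback, and for each index entry the callback encodes the TID and feeds it to a tuplesort, returning false so that nothing is deleted. A heavily simplified Python model of that flow follows; the class and helper names are hypothetical, a plain list stands in for the tuplesort, and a set lookup stands in for the later sorted merge against the heap.

```python
from typing import Callable, Iterable

Tid = tuple[int, int]  # (block number, offset number)


def itemptr_encode(tid: Tid) -> int:
    """Offset in the low 16 bits, block number above (as in catalog/index.h)."""
    block, offset = tid
    return (block << 16) | offset


class ValidateIndexState:
    """Hypothetical stand-in for the C struct: a list plays the tuplesort."""

    def __init__(self) -> None:
        self.tuplesort: list[int] = []
        self.itups = 0


def validate_index_callback(tid: Tid, state: ValidateIndexState) -> bool:
    """Record one index TID and return False: never actually delete anything."""
    state.tuplesort.append(itemptr_encode(tid))
    state.itups += 1
    return False


def bulkdelete(index_tids: Iterable[Tid],
               callback: Callable[[Tid, ValidateIndexState], bool],
               state: ValidateIndexState) -> int:
    """Much-simplified btbulkdelete -> btvacuumscan -> btvacuumpage loop:
    invoke the callback once per index TID, count the ones it marks dead."""
    return sum(1 for tid in index_tids if callback(tid, state))


def heap_tids_missing_from_index(heap_tids: Iterable[Tid],
                                 state: ValidateIndexState) -> list[Tid]:
    """Heap TIDs with no index entry, i.e. what validation would still insert.
    A set lookup replaces the sorted merge PostgreSQL performs."""
    seen = set(state.tuplesort)
    return [tid for tid in heap_tids if itemptr_encode(tid) not in seen]
```

In the real crash, frame #3 shows the fault inside itemptr_encode(), i.e. while reading the TID the callback was handed, before anything reached the tuplesort.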

It seems you have the core saved in a file:

> Storage: /var/lib/systemd/coredump/core.postgres.26.0a32...

Can you try inspecting the core with gdb to get a better backtrace? It might tell us whether there's a bogus pointer or something like that. Or maybe not; chances are the compiler optimized out some of the variables, but it's worth a try.

regards

--

Tomas Vondra

Attachment: bt_core_postgresql.txt (text/plain, 19.5 KB)
