[PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired

From: Mohamed ALi <moali(dot)pg(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired
Date: 2026-04-09 06:16:48
Message-ID: CAGnOmWqi1D9ycBgUeOGf6mOCd2Dcf=6sKhbf4sHLs5xAcKVCMQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

A partitioned (parent) index in PostgreSQL can become permanently
stuck with `indisvalid = false` even after all of its child partition
indexes have been repaired and are valid. There is no built-in
mechanism to re-validate the parent index after a child is fixed via
`REINDEX`. This affects all currently supported PostgreSQL versions
(13 through 18)
The root cause is that `validatePartitionedIndex()` — the only
function that can mark a partitioned index as valid is never called
after `REINDEX` operations, and is skipped when re-running `ALTER
INDEX ATTACH PARTITION` on an already-attached index.

How the Bug Manifests

Typical Scenario :
1. A partitioned table has multiple partitions.
2. The user creates indexes on partitions concurrently. One fails (due
to deadlock, cancellation, timeout, etc.), leaving an invalid
partition index.
3. A parent index is created (or the invalid index is attached to an
existing parent). The parent is correctly marked `indisvalid = false`
because at least one child is invalid.
4. The user fixes the broken child index with `REINDEX INDEX CONCURRENTLY`.
5. The child index becomes valid (`indisvalid = true`).
6. The parent index remains `indisvalid = false` permanently. No SQL
command can fix it.

Reproduction steps:

```sql
-- ============================================================
-- SETUP: Partitioned table with two partitions and sample data
-- ============================================================
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
id serial,
order_date date NOT NULL,
amount numeric
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2023 PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
INSERT INTO orders (order_date, amount)
SELECT d, random() * 1000
FROM generate_series('2023-01-01'::date, '2023-12-31'::date, '1 day') d;
INSERT INTO orders (order_date, amount)
SELECT d, random() * 1000
FROM generate_series('2024-01-01'::date, '2024-12-31'::date, '1 day') d;
-- ============================================================
-- STEP 1: Create parent index with ONLY (starts as invalid)
-- ============================================================
CREATE INDEX orders_amount_idx ON ONLY orders (amount);
-- Verify: parent index is invalid (no children attached yet)
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%idx%'
ORDER BY c.relname;
-- Expected:
-- orders_amount_idx | f
-- ============================================================
-- STEP 2: Create valid index on first partition
-- ============================================================
CREATE INDEX CONCURRENTLY orders_2023_amount_idx ON orders_2023 (amount);
-- ============================================================
-- STEP 3: Create an INVALID index on second partition
-- ============================================================
-- In a separate session, hold a lock:
BEGIN; LOCK TABLE orders_2024 IN SHARE MODE;
-- Then in the main session:
SET statement_timeout = '1ms';
CREATE INDEX CONCURRENTLY orders_2024_amount_idx ON orders_2024 (amount);
RESET statement_timeout;
-- it will fail/timeout, leaving an invalid index.
-- Verify state:
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%idx%'
ORDER BY c.relname;
-- Expected:
-- orders_2023_amount_idx | t (valid)
-- orders_2024_amount_idx | f (invalid)
-- orders_amount_idx | f (invalid, created with ONLY)
-- ============================================================
-- STEP 4: Attach both partition indexes to the parent
-- ============================================================
-- Attach the invalid one first
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx;
-- Succeeds. Parent stays invalid (correct — child is invalid).
-- Attach the valid one
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2023_amount_idx;
-- Succeeds. Parent still invalid (correct — one child still invalid).
-- Verify attachment and validity:
SELECT c.relname, i.indisvalid,
pg_get_indexdef(i.indexrelid) AS indexdef
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- Expected:
-- orders_2023_amount_idx | t
-- orders_2024_amount_idx | f
-- orders_amount_idx | f
-- ============================================================
-- STEP 5: Fix the invalid child index via REINDEX
-- ============================================================
REINDEX INDEX CONCURRENTLY orders_2024_amount_idx;
-- Verify: child is now valid
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- ACTUAL (buggy) result:
-- orders_2023_amount_idx | t (valid)
-- orders_2024_amount_idx | t (valid — fixed by REINDEX)
-- orders_amount_idx | f (STILL INVALID — this is the bug!)
--
-- EXPECTED result (if bug were fixed):
-- orders_2023_amount_idx | t
-- orders_2024_amount_idx | t
-- orders_amount_idx | t (should be valid now)
-- ============================================================
-- STEP 6: Demonstrate that re-running ATTACH does not help
-- ============================================================
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx;
-- Returns "ALTER INDEX" (succeeds silently, does nothing)
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid = i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- Parent is STILL invalid. The "silently do nothing" path
-- skips validatePartitionedIndex() entirely.
-- ============================================================
-- CLEANUP
-- ============================================================
DROP TABLE orders;
```

Root Cause Analysis:

Where `validatePartitionedIndex()` Is Called

The function is called in exactly these code paths:
1. During `ALTER INDEX ... ATTACH PARTITION` — inside
`ATExecAttachPartitionIdx()`
2. During `ALTER TABLE ... ATTACH PARTITION` — via
`AttachPartitionEnsureIndexes()`
3. During `CREATE INDEX` on partitioned tables — via `DefineIndex()`
It is NOT called:
- After `REINDEX` of a partitioned index
- During any maintenance operation
- As any periodic validation check

Bug 1: REINDEX Does Not Validate Parent

When `reindex_index()` in `src/backend/catalog/index.c` marks a
partition index as valid (setting `indisvalid = true`), it does not
check whether the parent partitioned index should also become valid.
The function simply updates the child's `pg_index` entry and returns.

Bug 2: Re-running ATTACH Skips Validation

In `ATExecAttachPartitionIdx()` (tablecmds.c, around line 21923 in PG
16 / line ~22900 in HEAD):
https://github.com/postgres/postgres/blob/master/src/backend/commands/tablecmds.c#L21923

```c
/* Silently do nothing if already in the right state */
currParent = partIdx->rd_rel->relispartition ?
get_partition_parent(partIdxId, false) : InvalidOid;
if (currParent != RelationGetRelid(parentIdx))
{
// ... all validation checks and attachment logic ...
validatePartitionedIndex(parentIdx, parentTbl); // ONLY called here
}
// If already attached, entire block is skipped — no validation!
```

When the child is already attached (`currParent == parentIdx`), the
condition is false, the entire if-block is skipped, and
`validatePartitionedIndex()` is never called. The comment "Silently do
nothing if already in the right state" is misleading "already
attached" does not mean "parent validity is correct."

Proposed Fixes:

Fix 1 : Always Validate Parent Index in ALTER INDEX ATTACH PARTITION

Patch File : 0001-Always-validate-parent-index-in-ALTER-INDEX-ATTACH.patch

Move the validatePartitionedIndex() call outside the if-block so it runs
unconditionally — both when a new attachment is made and when the partition is
already attached. This provides a user-accessible recovery path: after fixing a
child index with REINDEX, re-running ALTER INDEX ATTACH PARTITION triggers
parent validation.

When the partition is already attached, a NOTICE is emitted:

NOTICE: partition index "child_idx" is already attached to
"parent_idx", validating parent index

This follows PostgreSQL's existing convention of using NOTICE for
informational messages about no-op or reduced-scope operations (e.g.,
DROP TABLE IF EXISTS, CREATE INDEX IF NOT EXISTS). It tells the user:

1- Nothing went wrong
2- The index was already attached (so they know the state)
3- Validation still happened (so they know the fix path works)

Fix 2: Validate Parent Partitioned Index After REINDEX of Child

Patch File : 0001-Validate-parent-partitioned-index-after-REINDEX.patch

Same underlying bug but this patch addresses it from the
REINDEX side. When a partition index is repaired via REINDEX or
REINDEX CONCURRENTLY, the parent partitioned index remains permanently
stuck with indisvalid = false even though all children are now valid.

This is because validatePartitionedIndex() — the only function that can
mark a partitioned index as valid is never called from any REINDEX code
path.

validatePartitionedIndex() is only called during:

1- ALTER INDEX ... ATTACH PARTITION (tablecmds.c)
2- ALTER TABLE ... ATTACH PARTITION (tablecmds.c)
3- CREATE INDEX on partitioned tables (indexcmds.c)

It is NOT called after:

1- REINDEX INDEX (regular) — handled by reindex_index() in index.c
2- REINDEX INDEX CONCURRENTLY — handled by ReindexRelationConcurrently()

in indexcmds.c, which uses index_concurrently_swap() in index.c

Three changes are made:

1. Make validatePartitionedIndex() public
The function was static in tablecmds.c. It is now exported via
tablecmds.h so it can be called from index.c and indexcmds.c.

Files changed:

src/backend/commands/tablecmds.c — remove static, update comment
src/include/commands/tablecmds.h — add extern declaration

2. Call from reindex_index() (regular REINDEX)
After reindex_index() marks a partition index as valid (indisvalid = true),
check if the index is a partition (iRel->rd_rel->relispartition) and if so,
look up the parent and call validatePartitionedIndex().

A CommandCounterIncrement() is required before the call so that the child's
updated indisvalid is visible to the syscache lookup that
validatePartitionedIndex() performs internally.

File changed: src/backend/catalog/index.c

3. Call from ReindexRelationConcurrently() (REINDEX CONCURRENTLY)
REINDEX CONCURRENTLY uses a completely different code path: it creates a new
index, builds it concurrently, then swaps it with the old one via
index_concurrently_swap(). The new index inherits the old index's partition
status during the swap.

After the swap and the existing CommandCounterIncrement() (which makes the
swap visible), check if the new index is a partition and call
validatePartitionedIndex() on the parent.

File changed: src/backend/commands/indexcmds.c

Multi-level Hierarchy Support
validatePartitionedIndex() already handles multi-level partition hierarchies.
When it marks a mid-level parent valid, it checks if that parent is itself a
partition and recursively validates the grandparent. No additional recursion
logic is needed in the REINDEX patches.

Thanks,
Mohamed Ali
Senior DBE
AWS RDS

Attachment Content-Type Size
0001-Validate-parent-partitioned-index-after-REINDEX.patch application/octet-stream 6.4 KB
0001-Always-validate-parent-index-in-ALTER-INDEX-ATTACH.patch application/octet-stream 3.0 KB

Browse pgsql-hackers by date

  From Date Subject
Previous Message Chao Li 2026-04-09 06:14:23 Re: Add errdetail() with PID and UID about source of termination signal