Re: BUG #17245: Index corruption involving deduplicated entries

From: Andres Freund <andres(at)anarazel(dot)de>
To: Kamigishi Rei <iijima(dot)yun(at)koumakan(dot)jp>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: BUG #17245: Index corruption involving deduplicated entries
Date: 2021-10-30 02:37:40
Message-ID: 20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

Amit, this looks to be a parallel vacuum related bug, see below.

Whew, we found the bug, I think. Or at least one that can create exactly this
situation.

The problem is that parallel vacuum doesn't have correct handling for small
indexes, when there are other indexes that are big enough to use parallelism.

CREATE TABLE multiple_indexes(id serial primary key, data text, other serial unique);
CREATE INDEX ON multiple_indexes(data );
INSERT INTO multiple_indexes(data) SELECT 1 FROM generate_series(1, 50000);

\di+ multiple_indexes_*
List of relations
┌────────┬────────────────────────────┬───────┬────────┬──────────────────┬─────────────┬───────────────┬─────────┬─────────────┐
│ Schema │ Name │ Type │ Owner │ Table │ Persistence │ Access method │ Size │ Description │
├────────┼────────────────────────────┼───────┼────────┼──────────────────┼─────────────┼───────────────┼─────────┼─────────────┤
│ public │ multiple_indexes_data_idx │ index │ andres │ multiple_indexes │ permanent │ btree │ 336 kB │ │
│ public │ multiple_indexes_other_key │ index │ andres │ multiple_indexes │ permanent │ btree │ 1112 kB │ │
│ public │ multiple_indexes_pkey │ index │ andres │ multiple_indexes │ permanent │ btree │ 1112 kB │ │
└────────┴────────────────────────────┴───────┴────────┴──────────────────┴─────────────┴───────────────┴─────────┴─────────────┘
(3 rows)

DELETE FROM multiple_indexes;

postgres[1253065][1]=# SHOW min_parallel_index_scan_size;
┌──────────────────────────────┐
│ min_parallel_index_scan_size │
├──────────────────────────────┤
│ 512kB │
└──────────────────────────────┘
(1 row)

I.e. we have two indexes that are big enough to use parallelism, and one that
isn't.

first heap scan:

VACUUM (VERBOSE,PARALLEL 4) multiple_indexes;
INFO: 00000: vacuuming "public.multiple_indexes"
LOCATION: lazy_scan_heap, vacuumlazy.c:937

index vacuum:
INFO: 00000: launched 1 parallel vacuum worker for index vacuuming (planned: 1)
LOCATION: do_parallel_vacuum_or_cleanup, vacuumlazy.c:2768
INFO: 00000: scanned index "multiple_indexes_pkey" to remove 50000 row versions
DETAIL: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
LOCATION: lazy_vacuum_one_index, vacuumlazy.c:3057
INFO: 00000: scanned index "multiple_indexes_other_key" to remove 50000 row versions
DETAIL: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
LOCATION: lazy_vacuum_one_index, vacuumlazy.c:3057

second heap scan:
INFO: 00000: table "multiple_indexes": removed 50000 dead item identifiers in 271 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
LOCATION: lazy_vacuum_heap_rel, vacuumlazy.c:2384
INFO: 00000: table "multiple_indexes": found 50000 removable, 0 nonremovable row versions in 271 out of 271 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 748
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.02 s.
LOCATION: lazy_scan_heap, vacuumlazy.c:1668
...

So the two bigger indexes were scanned, but the small one was not.

The reason for this is that

begin_parallel_vacuum() does:
/*
* Initialize variables for shared index statistics, set NULL bitmap and
* the size of stats for each index.
*/
memset(shared->bitmap, 0x00, BITMAPLEN(nindexes));
for (int idx = 0; idx < nindexes; idx++)
{
if (!can_parallel_vacuum[idx])
continue;

/* Set NOT NULL as this index does support parallelism */
shared->bitmap[idx >> 3] |= 1 << (idx & 0x07);
}

which causes do_parallel_processing() to skip over the index, because
parallel_stats_for() returns NULL due to checking that bitmap.

And do_serial_processing_for_unsafe_indexes() skips over the index because:
/*
* We're only here for the unsafe indexes
*/
if (parallel_processing_is_safe(indrel, lvshared))
continue;

but the index is actually not unsafe.

Which leaves us with a severely corrupted index. Unfortunately one that
amcheck at this stage will not recognize as corrupted :(. It'll only get there
later, if new rows for the same tids will be inserted.

I haven't yet checked whether this is a bug introduced in 14, or whether it
was possible to hit before as well.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Thomas Munro 2021-10-30 03:57:57 Re: conchuela timeouts since 2021-10-09 system upgrade
Previous Message David M. Calascibetta 2021-10-29 23:10:00 RE: BUG #17258: Unexpected results in CHAR(1) data type