Re: Deleting older versions in unique indexes to avoid page splits

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-12-01 09:50:40
Message-ID: 64f0510e-4be4-546c-1b06-f494e4e696b2@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This is a wholly new concept with a lot of heuristics. It's a lot of
swallow. But here are a few quick comments after a first read-through:

On 30/11/2020 21:50, Peter Geoghegan wrote:
> +/*
> + * State used when calling table_index_delete_check() to perform "bottom up"
> + * deletion of duplicate index tuples. State is intialized by index AM
> + * caller, while state is finalized by tableam, which modifies state.
> + */
> +typedef struct TM_IndexDelete
> +{
> + ItemPointerData tid; /* table TID from index tuple */
> + int16 id; /* Offset into TM_IndexStatus array */
> +} TM_IndexDelete;
> +
> +typedef struct TM_IndexStatus
> +{
> + OffsetNumber idxoffnum; /* Index am page offset number */
> + int16 tupsize; /* Space freed in index if tuple deleted */
> + bool ispromising; /* Duplicate in index? */
> + bool deleteitup; /* Known dead-to-all? */
> +} TM_IndexStatus;
> ...
> + * The two arrays are conceptually one single array. Two arrays/structs are
> + * used for performance reasons. (We really need to keep the TM_IndexDelete
> + * struct small so that the tableam can do an initial sort by TID as quickly
> + * as possible.)

Is it really significantly faster to have two arrays? If you had just
one array, each element would be only 12 bytes long. That's not much
smaller than TM_IndexDeletes, which is 8 bytes.

> + /* First sort caller's array by TID */
> + heap_tid_shellsort(delstate->deltids, delstate->ndeltids);
> +
> + /* alltids caller visits all blocks, so make sure that happens */
> + if (delstate->alltids)
> + return delstate->ndeltids;
> +
> + /* Calculate per-heap-block count of TIDs */
> + blockcounts = palloc(sizeof(IndexDeleteCounts) * delstate->ndeltids);
> + for (int i = 0; i < delstate->ndeltids; i++)
> + {
> + ItemPointer deltid = &delstate->deltids[i].tid;
> + TM_IndexStatus *dstatus = delstate->status + delstate->deltids[i].id;
> + bool ispromising = dstatus->ispromising;
> +
> + if (curblock != ItemPointerGetBlockNumber(deltid))
> + {
> + /* New block group */
> + nblockgroups++;
> +
> + Assert(curblock < ItemPointerGetBlockNumber(deltid) ||
> + !BlockNumberIsValid(curblock));
> +
> + curblock = ItemPointerGetBlockNumber(deltid);
> + blockcounts[nblockgroups - 1].ideltids = i;
> + blockcounts[nblockgroups - 1].ntids = 1;
> + blockcounts[nblockgroups - 1].npromisingtids = 0;
> + }
> + else
> + {
> + blockcounts[nblockgroups - 1].ntids++;
> + }
> +
> + if (ispromising)
> + blockcounts[nblockgroups - 1].npromisingtids++;
> + }

Instead of sorting the array by TID, wouldn't it be enough to sort by
just the block numbers?

> * While in general the presence of promising tuples (the hint that index
> + * AMs provide) is the best information that we have to go on, it is based
> + * on simple heuristics about duplicates in indexes that are understood to
> + * have specific flaws. We should not let the most promising heap pages
> + * win or lose on the basis of _relatively_ small differences in the total
> + * number of promising tuples. Small differences between the most
> + * promising few heap pages are effectively ignored by applying this
> + * power-of-two bucketing scheme.
> + *

What are the "specific flaws"?

I understand that this is all based on rough heuristics, but is there
any advantage to rounding/bucketing the numbers like this? Even if small
differences in the total number of promising tuple is essentially noise
that can be safely ignored, is there any harm in letting those small
differences guide the choice? Wouldn't it be the essentially the same as
picking at random, or are those small differences somehow biased?

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Krunal Bauskar 2020-12-01 10:02:38 Re: Improving spin-lock implementation on ARM.
Previous Message Alexander Korotkov 2020-12-01 09:48:12 Re: Improving spin-lock implementation on ARM.