Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Andres Freund <andres(at)anarazel(dot)de>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2021-07-09 17:17:49
Message-ID: 20210709171749.pt7ztzlsvpbbk7yj@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote:
> Currently, the TIDs of dead tuples are stored in an array that is
> collectively allocated at the start of lazy vacuum and TID lookup uses
> bsearch(). There are the following challenges and limitations:

> So I prototyped a new data structure dedicated to storing dead tuples
> during lazy vacuum while borrowing the idea from Roaring Bitmap[2].
> The authors provide an implementation of Roaring Bitmap[3] (Apache
> 2.0 license). But I've implemented this idea from scratch because we
> need to integrate it with Dynamic Shared Memory/Area to support
> parallel vacuum and need to support ItemPointerData, 6-bytes integer
> in total, whereas the implementation supports only 4-bytes integers.
> Also, when it comes to vacuum, we neither need to compute the
> intersection, the union, nor the difference between sets, but need
> only an existence check.
>
> The data structure is somewhat similar to TIDBitmap. It consists of
> the hash table and the container area; the hash table has entries per
> block and each block entry allocates its memory space, called a
> container, in the container area to store its offset numbers. The
> container area is actually an array of bytes and can be enlarged as
> needed. In the container area, the data representation of offset
> numbers varies depending on their cardinality. It has three container
> types: array, bitmap, and run.

How are you thinking of implementing iteration efficiently for rtbm? The
second heap pass needs that obviously... I think the only option would
be to qsort the whole thing?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-07-09 17:36:22 Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Previous Message Zhihong Yu 2021-07-09 16:49:53 Re: short circuit suggestion in find_hash_columns()