Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Evgeny Voropaev <evgeny(dot)voropaev(at)tantorlabs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Compress prune/freeze records with Delta Frame of Reference algorithm
Date: 2026-04-21 07:20:46
Message-ID: b301de3c-0945-4401-995a-18d5939d0290@iki.fi
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21/04/2026 08:41, Evgeny Voropaev wrote:
> Hello hackers,
>
>> Can this DFoR code replace integerset.c easily? Can we use it for
>> the vacuum dead TID list? For GIN posting lists? Where else?
>
> Heikki, thank you for your attention and proposals. I'm learning areas
> you proposed to be developed. This took time, since I am not adept at
> them. Last week I also have been developing the DFoR patch to support
> unsorted sequences. That's why there was the delay in answering.
>
> About GIN.
> Since GIN exploits TIDs sequences and saves it on the disk, it can be
> the most appropriate candidate to be developed with DFoR.

+1. And maybe the tid lists in to B-tree tuples too while we're at it.

For GIN posting lists, one important property of the current compression
scheme is that removing TIDs never makes the list larger than the
original. That's important for VACUUM, see
https://github.com/postgres/postgres/blob/d3bba041543593eb5341683107d899734dc8e73e/src/backend/access/gin/ginpostinglist.c#L55

> About the dead TID list.
> If I'm not mistaken, the dead TID list exists only in RAM and never on
> the disk or in the network. So, what is the advantage supposed to be
> achieved due to using compression in the dead TID list?

Reduces memory usage. And if it's faster to lookup than the current data
structure, that too. I don't know if that works out.

> About the GiST vacuuming and the use of integerset in it.
> The integerset implements a tree in addition to compression.
> DFoR now performs only compression. Moreover the size of a pack is
> flexible (varying), which must become an issue for its usage in the
> tree. It needs more thorough further elaboration to be developed.

Hmm. The integerset is a sparse list of integers, just like Frame of
Reference. The tree inside it is just an implementation detail. I was
thinking that you could replace the whole tree with DFoR, but I suppose
you cannot do random lookups in a DFoR compressed list, so you'd still
need the tree.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-04-21 07:24:25 Re: Adding REPACK [concurrently]
Previous Message Chao Li 2026-04-21 07:15:59 Re: logical: fix recomputation required LSN on restart_lsn-only advancement