Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Evgeny Voropaev <evgeny(dot)voropaev(at)tantorlabs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Compress prune/freeze records with Delta Frame of Reference algorithm
Date: 2026-03-29 12:16:47
Message-ID: 5a2f3df2-a736-4ada-8aa3-aa6e20b2e067@vondra.me
Lists: pgsql-hackers

On 3/24/26 15:28, Evgeny Voropaev wrote:
> Hello Andres,
>
>> I'm unconvinced that this is a serious problem - typically the overhead
>> of WAL volume due to pruning / freezing is due to the full page images
>> emitted, not the raw size of the records. Once an FPI is emitted, this
>> doesn't matter.
>>
>> What gains have you measured in somewhat realistic workloads?
>
> So far, we have had no tests in any real production environment.
> Moreover, the load in the new test
> (recovery/t/052_prune_dfor_compression.pl) is quite contrived. However, it
> demonstrates a compression ratio of more than 5, and it is measured for
> an overall size of all prune/freeze records with no filtering.
>
> Further development is the implementation of compression of unsorted
> sequences. This is going to allow PostgreSQL to compress also the
> 'frozen' and the 'redirected' offset sequences, which should result in a
> greater compression ratio.
>
> But I agree with you, Andres, we need practical results to estimate a
> profit. I wish we would test it on some real load soon.
>
> Also I hope, independently of its usage in prune/freeze records, the
> DFoR itself might be used for compression sequences in other places of PG.
>

IMHO Andres is right. A ~170kB patch really should present some numbers
quantifying the expected benefit. It doesn't need to be a real workload
from production, but something plausible enough. Even some basic
back-of-the-envelope calculations might be enough to show the promise.
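
To illustrate the kind of back-of-the-envelope estimate meant here, a
minimal Python sketch follows. It is not the patch's actual encoding: the
function names, the 32-bit header, and the layout (first offset stored raw,
then fixed-width deltas relative to the smallest delta) are assumptions made
up for illustration. The only PostgreSQL facts used are that OffsetNumber is
a 16-bit value and that the default block size is 8192 bytes.

```python
BLCKSZ = 8192      # default PostgreSQL block size, in bytes
OFFSET_BITS = 16   # OffsetNumber is a uint16


def raw_size_bits(offsets):
    """Size of the sequence stored as plain 16-bit offsets."""
    return len(offsets) * OFFSET_BITS


def dfor_size_bits(offsets, header_bits=32):
    """Estimated size of a sorted sequence under a simple delta
    frame-of-reference layout (hypothetical, not the patch's format):
    header + first offset + one fixed-width field per delta."""
    if not offsets:
        return 0
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    if not deltas:
        return header_bits + OFFSET_BITS
    base = min(deltas)                          # the frame of reference
    width = (max(deltas) - base).bit_length()   # bits per (delta - base)
    return header_bits + OFFSET_BITS + len(deltas) * width


if __name__ == "__main__":
    # Hypothetical page: two out of every three line pointers are pruned.
    offsets = [i for i in range(1, 292) if i % 3]
    raw = raw_size_bits(offsets) / 8
    packed = dfor_size_bits(offsets) / 8
    print(f"raw: {raw:.0f} B, packed: {packed:.1f} B, "
          f"ratio: {raw / packed:.1f}, saved vs FPI: "
          f"{(raw - packed) / BLCKSZ:.1%} of one page image")
```

Under these assumptions the saving per record is at most a few hundred
bytes, so the interesting number is how often such records are emitted
without a full page image next to them.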

Without this, the cost/benefit is so unclear that most senior contributors
will probably review something else. You need to make the case why this
is worth it.

I only quickly skimmed the patches, for exactly this reason. I'm a bit
confused why this needs to add the whole libtap thing in 0001, instead
of just testing this through the SQL interface (same as test_aio etc.).

Also, I find it somewhat unlikely we'd import a GPLv3 library like this,
even if it's just a testing framework. Even ignoring the question of
having a different license for some of the code, it'd mean maintenance
burden (maybe libtap is stable/mature, no idea). I don't see why this
would be better than "write a SQL callable test module".

regards

--
Tomas Vondra
