| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Evgeny Voropaev <evgeny(dot)voropaev(at)tantorlabs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
| Subject: | Re: Compress prune/freeze records with Delta Frame of Reference algorithm |
| Date: | 2026-03-29 12:16:47 |
| Message-ID: | 5a2f3df2-a736-4ada-8aa3-aa6e20b2e067@vondra.me |
| Lists: | pgsql-hackers |

On 3/24/26 15:28, Evgeny Voropaev wrote:
> Hello Andres,
>
>> I'm unconvinced that this is a serious problem - typically the overhead
>> of WAL volume due to pruning / freezing is due to the full page images
>> emitted, not the raw size of the records. Once an FPI is emitted, this
>> doesn't matter.
>>
>> What gains have you measured in somewhat realistic workloads?
>
> So far, we have had no tests in any real production environment.
> Moreover, the load in the new test
> (recovery/t/052_prune_dfor_compression.pl) is quite contrived. However,
> it demonstrates a compression ratio of more than 5, measured over the
> total size of all prune/freeze records, with no filtering.
>
> The next step is to implement compression of unsorted sequences. This
> will allow PostgreSQL to also compress the 'frozen' and 'redirected'
> offset sequences, which should result in a greater compression ratio.
>
> But I agree with you, Andres, we need practical results to estimate the
> benefit. I hope we can test it on some real load soon.
>
> Also, I hope that, independently of its usage in prune/freeze records,
> DFoR itself might be used for compressing sequences in other places in PG.
>

IMHO Andres is right. A ~170kB patch really should present some numbers
quantifying the expected benefit. It doesn't need to be a real workload
from production, but something plausible enough. Even some basic
back-of-the-envelope calculations might be enough to show the promise.
Without this, the cost/benefit is so unclear most senior contributors
will probably review something else. You need to make the case why this
is worth it.

I only quickly skimmed the patches, for exactly this reason. I'm a bit
confused why this needs to add the whole libtap thing in 0001, instead
of just testing this through the SQL interface (same as test_aio etc.).

Also, I find it somewhat unlikely we'd import a GPLv3 library like this,
even if it's just a testing framework. Even ignoring the question of
having a different license for some of the code, it'd mean maintenance
burden (maybe libtap is stable/mature, no idea). I don't see why this
would be better than "write a SQL callable test module".
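
To illustrate the kind of back-of-the-envelope calculation mentioned earlier, here is a rough sketch that estimates how many bytes a sorted offset sequence might take under a simple delta + frame-of-reference + bit-packing scheme. Everything about the layout is an assumption made for illustration (2-byte raw offsets, a 5-byte header, the function names), and it is not the encoding the patch actually implements.

```python
# Rough size estimate for delta frame-of-reference encoding of a sorted
# offset sequence. The layout below is assumed for illustration only and
# is NOT taken from the patch.

RAW_OFFSET_BYTES = 2   # assumption: each uncompressed offset takes 2 bytes
HEADER_BYTES = 5       # assumption: first value (2) + min delta (2) + bit width (1)


def raw_size_bytes(offsets):
    """Size of the sequence stored as plain fixed-width offsets."""
    return RAW_OFFSET_BYTES * len(offsets)


def dfor_size_bytes(offsets):
    """Estimated size of a strictly increasing sequence after encoding.

    Steps: take consecutive deltas, subtract the smallest delta (the
    frame of reference), then bit-pack the residuals using the width
    needed for the largest residual.
    """
    if not offsets:
        return 0
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    min_delta = min(deltas, default=0)
    residuals = [d - min_delta for d in deltas]
    width = max(residuals, default=0).bit_length()
    payload_bits = width * len(residuals)
    return HEADER_BYTES + (payload_bits + 7) // 8


if __name__ == "__main__":
    # Hypothetical inputs: a dense run of offsets and an irregular one.
    for name, seq in [("dense 1..200", list(range(1, 201))),
                      ("irregular", [1, 2, 4, 7, 8, 12])]:
        raw, enc = raw_size_bytes(seq), dfor_size_bytes(seq)
        print(f"{name}: raw={raw} B, encoded={enc} B, saved={raw - enc} B")
```

Numbers like these only matter for records that do not carry a full page image. With the default 8kB block size an FPI can be up to 8192 bytes, so a few hundred bytes saved on the offsets is small next to it, which is the point Andres raised.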
regards
--
Tomas Vondra