Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Evgeny Voropaev <evgeny(dot)voropaev(at)tantorlabs(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Compress prune/freeze records with Delta Frame of Reference algorithm
Date: 2026-04-08 12:34:42
Message-ID: 3fc9f40b-2c4f-49c7-bf1c-570c5b9e6e9e@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas, Andreus, Andrey, hello!

> A ~170kB patch really should present some numbers
> quantifying the expected benefit. It doesn't need to be a real workload
> from production, but something plausible enough. Even some basic
> back-of-the-envelope calculations might be enough to show the promise.

The patch results in reduction of WAL total size by:
81% during vacuuming a table having no index,
and by 55% during vacuuming a table having an index.

The numbers are the next:

=== VACUUM (table with no index) ===
-------------------- ----------------- ----------------- -----------
DFOR off, bytes DFOR on, bytes Reduction
-------------------- ----------------- ----------------- -----------
WAL total size 6743149 1184446 82%
Prune records size 6710185 1159723 5.8x
-------------------- ----------------- ----------------- -----------

=== VACUUM (table with index) ===
-------------------- ----------------- ----------------- -----------
DFOR off, bytes DFOR on, bytes Reduction
-------------------- ----------------- ----------------- -----------
WAL total size 20394208 8907090 56%
Prune records size 6812850 1225944 5.6x
-------------------- ----------------- ----------------- -----------

The logic of the tests is based on the technique from [1] and is the
next:

-- SQL
CREATE TABLE t_prune ( id int, val text )
WITH (fillfactor = 100, autovacuum_enabled = false);

INSERT INTO t_prune
SELECT g, 'x' FROM generate_series(1,3000000) g;

CREATE INDEX ON t_prune(id); -- for the test using an indexed table

DELETE FROM t_prune WHERE id % 500 <> 0;
SELECT pg_current_wal_flush_lsn(); -- get start_lsn here
VACUUM FREEZE t_prune; -- 3 times
SELECT pg_current_wal_flush_lsn(); -- get end_lsn here

# BASH
# stop cluster
pg_waldump -p $wal_dir -s $start_lsn -e $end_lsn 2>/dev/null;

The test is implemented in 052_prune_dfor_compression.pl, therefore
the presented results can be refetched by restarting this test script.

> Also, I find it somewhat unlikely we'd import a GPLv3 library like
> this, even if it's just a testing framework. Even ignoring the
> question of having a different license for some of the code, it'd mean
> maintenance burden (maybe libtap is stable/mature, no idea). I don't
> see why this would be better than "write a SQL callable test module".

I am ready to rework it once there is consensus on the core of the
patch.

Best regards,
Evgeny.

P.s. rebased onto a1643d40b30.

[1]
https://www.postgresql.org/message-id/flat/CAAKRu_ZMw6Npd_qm2KM%2BFwQ3cMOMx1Dh3VMhp8-V7SOLxdK9-g%40mail.gmail.com

Attachment Content-Type Size
v08-0001-Implement-vect-and-uniqsortvect-containers-and-b.patch text/x-patch 95.6 KB
v08-0002-Implement-Delta-Frame-of-Reference-compression.patch text/x-patch 42.1 KB
v08-0003-Use-Delta-Frame-of-Reference-DFoR-to-compress-pr.patch text/x-patch 35.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2026-04-08 12:50:11 Re: Buildfarm misses running some contrib TAP tests
Previous Message Amit Kapila 2026-04-08 11:54:52 Re: PG 19 release notes and authors