Re: Proposal: Adding compression of temporary files

From: Filip Janus <fjanus(at)redhat(dot)com>
To: lakshmi <lakshmigcdac(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Adding compression of temporary files
Date: 2026-01-18 15:50:24
Message-ID: CAFjYY+JDSpOQwYAfTQQ43=BA=d32XfcAdaPVJgHheV9fQBbLWg@mail.gmail.com
Lists: pgsql-hackers

Hi,
Thank you, Tomas, for the thorough and detailed review!
I'm posting an updated patch set incorporating the changes from your review.

Changes applied from review:
- Simplified BufFileCreateTemp interface
- Improved error handling in BufFileLoadBuffer/BufFileDumpBuffer
- Unified compression header format (CompressHeader struct)
- Added tuplestore integration (compression when EXEC_FLAG_BACKWARD is
  not required)
- Various code cleanups and comment improvements
Additional change (not from review):
- Switched from a static shared buffer to per-file allocation. The shared
  buffer provided a negligible performance benefit while keeping memory
  allocated for the backend's lifetime.
Future work:
- Support for additional compression methods (gzip, zstd)
- Random access and seek operations with compression
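
For readers following the thread, a minimal sketch of two of the ideas above (a per-block header and a per-file buffer) might look like the following. This is not code from the patch: SketchHeader, SketchFile and the sketch_* functions are hypothetical names, the header fields are only an assumption about what a CompressHeader-like struct could carry, the "file" is an in-memory array, and the compression step is an identity copy so the example has no library dependency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-block header; the real CompressHeader may differ. */
typedef struct SketchHeader
{
    uint32_t stored_len;    /* bytes following the header in the file */
    uint32_t raw_len;       /* bytes after decompression */
} SketchHeader;

/* Hypothetical temp file: the scratch buffer is owned by the file. */
typedef struct SketchFile
{
    unsigned char *data;    /* stands in for the on-disk contents */
    size_t      size;       /* bytes written so far */
    size_t      capacity;
    size_t      read_pos;   /* sequential read cursor */
    unsigned char *scratch; /* per-file buffer, released on close */
    size_t      scratch_size;
} SketchFile;

static SketchFile *
sketch_open(size_t capacity, size_t scratch_size)
{
    SketchFile *f = malloc(sizeof(SketchFile));

    if (f == NULL)
        return NULL;
    f->data = malloc(capacity);
    f->scratch = malloc(scratch_size);
    if (f->data == NULL || f->scratch == NULL)
    {
        free(f->data);
        free(f->scratch);
        free(f);
        return NULL;
    }
    f->size = 0;
    f->capacity = capacity;
    f->read_pos = 0;
    f->scratch_size = scratch_size;
    return f;
}

/* Memory lives only as long as the file, not for the whole process. */
static void
sketch_close(SketchFile *f)
{
    free(f->scratch);
    free(f->data);
    free(f);
}

/* Append header + payload. Returns 0 on success, -1 if it does not fit. */
static int
sketch_write_block(SketchFile *f, const void *raw, uint32_t raw_len)
{
    SketchHeader hdr;

    assert(f != NULL);
    if (raw_len > f->scratch_size)
        return -1;
    memcpy(f->scratch, raw, raw_len);   /* placeholder for compression */
    hdr.stored_len = raw_len;
    hdr.raw_len = raw_len;
    if (f->size + sizeof(hdr) + hdr.stored_len > f->capacity)
        return -1;
    memcpy(f->data + f->size, &hdr, sizeof(hdr));
    memcpy(f->data + f->size + sizeof(hdr), f->scratch, hdr.stored_len);
    f->size += sizeof(hdr) + hdr.stored_len;
    return 0;
}

/* Returns raw length, 0 at end of file, -1 on a truncated or bad block. */
static long
sketch_read_block(SketchFile *f, void *out, size_t out_size)
{
    SketchHeader hdr;
    size_t      avail;

    assert(f != NULL);
    avail = f->size - f->read_pos;
    if (avail == 0)
        return 0;
    if (avail < sizeof(hdr))
        return -1;
    memcpy(&hdr, f->data + f->read_pos, sizeof(hdr));
    if (hdr.stored_len > avail - sizeof(hdr))
        return -1;                      /* short read: block is truncated */
    if (hdr.stored_len > f->scratch_size || hdr.raw_len > out_size ||
        hdr.raw_len != hdr.stored_len)  /* identity codec: lengths match */
        return -1;
    memcpy(f->scratch, f->data + f->read_pos + sizeof(hdr), hdr.stored_len);
    memcpy(out, f->scratch, hdr.raw_len);   /* placeholder for decompression */
    f->read_pos += sizeof(hdr) + hdr.stored_len;
    return (long) hdr.raw_len;
}
```

Because every block is length-prefixed, a sequential reader in this layout can detect a truncated block rather than return partial data. It cannot seek into the middle of a block, which is one reason random access would need extra work.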

-Filip-

On Tue, Jan 13, 2026 at 14:34, Filip Janus <fjanus(at)redhat(dot)com> wrote:

> Hi,
> Yes, it needs to be rebased. I am working on it. I will post it here soon.
>
>
> -Filip-
>
>
> On Tue, Jan 13, 2026 at 13:51, lakshmi <lakshmigcdac(at)gmail(dot)com> wrote:
>
>> Hi all,
>> I tried to replicate the temporary file compression issue by applying the
>> two patches shared in the thread to current PostgreSQL master.
>> Here is what I observed:
>> 1) Patch 0001-Add-transparent-compression-for-temporary-files.patch
>> fails to apply due to context mismatches.
>>
>> The failures are in the following files:
>> src/backend/storage/file/buffile.c
>> src/backend/utils/misc/guc_tables.c
>> src/backend/utils/misc/postgresql.conf.sample
>>
>> 2) Patch 0002-Add-regression-tests-for-temporary-file-compression.patch
>> applies successfully without any issues.
>>
>> Does this mean that the implementation patch needs to be rebased or
>> otherwise adjusted for the current codebase? If so, could you please
>> suggest how I should apply the implementation patch in this case?
>>
>>
>> regards
>> lakshmi
>>
>> On Tue, Jan 13, 2026 at 5:01 PM Filip Janus <fjanus(at)redhat(dot)com> wrote:
>>
>>> Rebase after changes introduced in guc_tables.c
>>>
>>> -Filip-
>>>
>>>
>>> On Tue, Aug 19, 2025 at 17:48, Filip Janus <fjanus(at)redhat(dot)com> wrote:
>>>
>>>> Fix overlooked compiler warnings
>>>>
>>>> -Filip-
>>>>
>>>>
>>>> On Mon, Aug 18, 2025 at 18:51, Filip Janus <fjanus(at)redhat(dot)com> wrote:
>>>>
>>>>> I rebased the proposal and fixed the issue causing those test failures.
>>>>>
>>>>> -Filip-
>>>>>
>>>>>
>>>>> On Tue, Jun 17, 2025 at 16:49, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2025-04-25 23:54:00 +0200, Filip Janus wrote:
>>>>>> > The latest rebase.
>>>>>>
>>>>>> This often seems to fail during tests:
>>>>>> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5382
>>>>>>
>>>>>> E.g.
>>>>>>
>>>>>> https://api.cirrus-ci.com/v1/artifact/task/4667337632120832/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress
>>>>>>
>>>>>> === dumping
>>>>>> /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/regression.diffs
>>>>>> ===
>>>>>> diff -U3
>>>>>> /tmp/cirrus-ci-build/src/test/regress/expected/join_hash_pglz.out
>>>>>> /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/join_hash_pglz.out
>>>>>> ---
>>>>>> /tmp/cirrus-ci-build/src/test/regress/expected/join_hash_pglz.out
>>>>>> 2025-05-26 05:04:40.686524215 +0000
>>>>>> +++
>>>>>> /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/join_hash_pglz.out
>>>>>> 2025-05-26 05:15:00.534907680 +0000
>>>>>> @@ -594,11 +594,8 @@
>>>>>> select count(*) from join_foo
>>>>>> left join (select b1.id, b1.t from join_bar b1 join join_bar b2
>>>>>> using (id)) ss
>>>>>> on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
>>>>>> - count
>>>>>> --------
>>>>>> - 3
>>>>>> -(1 row)
>>>>>> -
>>>>>> +ERROR: could not read from temporary file: read only 8180 of
>>>>>> 1572860 bytes
>>>>>> +CONTEXT: parallel worker
>>>>>> select final > 1 as multibatch
>>>>>> from hash_join_batches(
>>>>>> $$
>>>>>> @@ -606,11 +603,7 @@
>>>>>> left join (select b1.id, b1.t from join_bar b1 join join_bar b2
>>>>>> using (id)) ss
>>>>>> on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
>>>>>> $$);
>>>>>> - multibatch
>>>>>> -------------
>>>>>> - t
>>>>>> -(1 row)
>>>>>> -
>>>>>> +ERROR: current transaction is aborted, commands ignored until end
>>>>>> of transaction block
>>>>>> rollback to settings;
>>>>>> -- single-batch with rescan, parallel-oblivious
>>>>>> savepoint settings;
>>>>>>
>>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> Andres
>>>>>>
>>>>>>
>>>>>>

Attachment Content-Type Size
0002-Add-regression-tests-for-temporary-file-compression.patch application/octet-stream 127.6 KB
0001-Add-transparent-compression-for-temporary-files.patch application/octet-stream 18.3 KB
