Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Date: 2025-10-20 21:12:35
Message-ID: 9qn05q5r-s53q-8o3s-0313-p1s94r64oq9r@tzk.arg
Lists: pgsql-hackers

On Wednesday 2025-10-15 21:21, Tom Lane wrote:

>> 0004 increases the row width in the existing test case that says
>> it's trying to push more than DEFAULT_IO_BUFFER_SIZE through
>> the compressors. While I agree with the premise, this solution
>> is hugely expensive: it adds about 12% to the already-long runtime
>> of 002_pg_dump.pl. I'd like to find a better way, but ran out of
>> energy for today. (I think the reason this costs so much is that
>> it's effectively iterated hundreds of times because of
>> 002_pg_dump.pl's more or less cross-product approach to testing
>> everything. Maybe we should pull it out of that structure?)
>
> The attached patchset accomplishes that by splitting 002_pg_dump.pl
> into two scripts, one that is just concerned with the compression
> test cases and one that does everything else. This might not be
> the prettiest solution, since it duplicates a lot of perl code.
> I thought about refactoring 002_pg_dump.pl so that it could handle
> two separate sets of runs-plus-tests, but decided it was overly
> complicated already.
>
> Anyway, 0001 attached is the same as in v4, 0002 performs the
> test split without intending to change coverage, and then 0003
> adds the new test cases I wanted. For me, this ends up with
> just about the same runtime as before, or maybe a smidge less.
> I'd hoped for possibly more savings than that, but I'm content
> with it being a wash.
>
> I think this is more or less committable, and then we could get
> back to the original question of whether it's worth tweaking
> pg_restore's seek-vs-scan behavior.

Hi Tom, since you are dealing with pg_restore testing, you might want to
have a look at the 2nd patch from here:

https://www.postgresql.org/message-id/413c1cd8-1d6d-90ba-ac7b-b226a4dad5ed%40gmx.net

Direct link to the patch is:

https://www.postgresql.org/message-id/attachment/177661/v3-0002-Add-new-test-file-with-pg_restore-test-cases.patch

It's a much shorter test, focused on pg_restore.

1. It generates two custom-format dumps (with-TOC and TOC-less).

2. Restores each dump to an empty database using pg_restore with
a couple of switch combinations
(one combination, --clean --data-only, will not work without a patch
of mine, so we might want to remove that and add others).

3. Tests pg_restore over a pre-existing database.

4. Tests pg_restore reading the dump file from stdin.
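
As a rough illustration of the four steps above (this is not the patch
itself, which is a Perl TAP test), the command lines involved could be
sketched in Python as below. The database names, dump file names, and
the chosen switch combinations are placeholders. Producing the TOC-less
dump by piping pg_dump through cat is an assumption about one way to get
non-seekable output, so that pg_dump cannot go back and record data
offsets. The sketch only builds the command strings; it does not run them.

```python
import shlex


def build_commands(src_db="src_db", new_db="empty_db", old_db="existing_db"):
    """Build (but do not run) shell command lines for the four test steps.

    All database and file names are hypothetical placeholders.
    """
    q = shlex.quote
    with_toc, tocless = "with_toc.dump", "tocless.dump"
    cmds = {}

    # Step 1: two custom-format dumps. Writing directly to a file lets
    # pg_dump seek; piping through cat (assumption) makes output non-seekable.
    cmds["dump"] = [
        f"pg_dump -Fc -f {q(with_toc)} {q(src_db)}",
        f"pg_dump -Fc {q(src_db)} | cat > {q(tocless)}",
    ]

    # Step 2: restore each dump into an empty database with a few switch
    # combinations. These combinations are illustrative picks only; the
    # target database would need to be recreated between runs.
    switch_sets = [[], ["--no-owner"], ["--jobs=2"]]
    cmds["restore_empty"] = [
        shlex.join(["pg_restore", "-d", new_db, *switches, dump])
        for dump in (with_toc, tocless)
        for switches in switch_sets
    ]

    # Step 3: restore over a database that already contains the objects.
    cmds["restore_existing"] = [
        shlex.join(["pg_restore", "-d", old_db, "--clean", "--if-exists", dump])
        for dump in (with_toc, tocless)
    ]

    # Step 4: with no file name argument, pg_restore reads from stdin.
    cmds["restore_stdin"] = [
        f"pg_restore -d {q(new_db)} < {q(dump)}"
        for dump in (with_toc, tocless)
    ]
    return cmds


if __name__ == "__main__":
    for step, lines in build_commands().items():
        print(f"# {step}")
        for line in lines:
            print(line)
```

The --clean --data-only combination is left out here on purpose, since
it depends on the separate patch mentioned in item 2.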

Regards,
Dimitris
