| From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
|---|---|
| To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
| Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com> |
| Subject: | Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin |
| Date: | 2025-06-20 14:45:29 |
| Message-ID: | CAAKRu_Z-qjOgUjJqG_ScQFF_E9aEm1+uWPUJKXPwJ0QC27pKOQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Jun 18, 2025 at 12:33 AM John Naylor <johncnaylorls(at)gmail(dot)com>
wrote:
> Here's what numbers I'm looking at as well:
>
> *** normal
> meson options: -Dcassert=true -Ddebug=true -Dc_args='-Og'
> -Dc_args='-fno-omit-frame-pointer'
>
> $ meson test -q --print-errorlogs --suite setup && meson test -q
> --print-errorlogs recovery/048_vacuum_horizon_floor
>
> $ grep finished
>
> build-debug/testrun/recovery/048_vacuum_horizon_floor/log/048_vacuum_horizon_floor_primary.log
>
> 2025-06-18 10:08:19.072 +07 [5730] 048_vacuum_horizon_floor.pl INFO:
> finished vacuuming "test_db.public.vac_horizon_floor_table": index
> scans: 12
>
Thanks for providing this! I actually did see the same number of index
scans as you with 9000 rows.
So, I think I figured out why I was seeing the test hang waiting for
pg_stat_progress_vacuum to report an index vacuum count.
Using auto_explain, I determined that, at lower row counts, the cursor was
using an index-only scan. That meant it pinned an index leaf page instead
of a heap page, and the first round of index vacuuming couldn't complete,
because btree index vacuuming requires acquiring a cleanup lock on every
leaf page.
I solved this by disabling all index scans in the cursor's session.
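Concretely, the session settings I mean are along these lines (just a sketch of the approach; the attached patch is authoritative):

```sql
-- Force the cursor's backend to plan a sequential scan, so that while it
-- sleeps it holds a pin on a heap page rather than a btree leaf page.
SET enable_indexscan = off;
SET enable_indexonlyscan = off;
SET enable_bitmapscan = off;
```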
I've attached the updated patch, which passes for me on both 32- and 64-bit
builds.
We've managed to reduce the row count so low (1000-2000 rows) that I'm not
sure it matters whether we keep separate 64-bit and 32-bit cases. However,
since we have the large block comment about the required number of rows, I
figured we might as well keep the two different nrows.
I'll have to do some more research on 14-16 to see if this could be a
problem there.
I also disabled prefetching, concurrent IO, and read combining for vacuum
-- they didn't cause a problem in my local tests, but I could see them
interfering with the test and potentially causing flakes/failures on some
machines/configurations. That means I'll need a slightly different patch
for 17 than for 18 (17 doesn't have io_combine_limit).
Finally, I disabled parallelism to future-proof the test against heap
vacuum parallelism being added -- I wouldn't want a mysterious failure in
this test down the road.
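For the archives, the vacuum-session settings above amount to roughly the
following (a sketch, not the exact patch contents; per the above, the v17
version would drop io_combine_limit, and exact values may differ):

```sql
-- Make vacuum's IO behavior deterministic for the test.
SET maintenance_io_concurrency = 0;        -- disable prefetching / concurrent IO
SET io_combine_limit = 1;                  -- disable read combining (18 only)
SET max_parallel_maintenance_workers = 0;  -- disable parallel maintenance workers
```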
- Melanie
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Test-that-vacuum-removes-tuples-older-than-Oldest.patch | text/x-patch | 13.9 KB |