Re: Rename dead_tuples to dead_items in vacuumlazy.c

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Rename dead_tuples to dead_items in vacuumlazy.c
Date: 2021-11-30 19:41:43
Message-ID: CAH2-WzkPRUkE6ad9u1zAseX-_aEHtnaQ8acr21C4Sb7vbeUK-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Thanks! I'll change my parallel vacuum refactoring patch accordingly.

Thanks again for working on that.

> Regarding the commit, I think that there still is one place in
> lazyvacuum.c where we can change "dead tuples” to "dead items”:
>
> /*
> * Allocate the space for dead tuples. Note that this handles parallel
> * VACUUM initialization as part of allocating shared memory space used
> * for dead_items.
> */
> dead_items_alloc(vacrel, params->nworkers);
> dead_items = vacrel->dead_items;

Oops. Pushed a fixup for that just now.

> Also, the commit doesn't change both PROGRESS_VACUUM_MAX_DEAD_TUPLES
> and PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?

That was deliberate.

It would be a bit strange to alter these constants without also
updating the corresponding column names for the
pg_stat_progress_vacuum system view. But if I kept the definition from
system_views.sql in sync, then I would break user scripts -- for
reasons that users don't care about. That didn't seem like the right
approach.

Also, the system as a whole still assumes "DEAD tuples and LP_DEAD
items are the same, and are just as much of a problem in the table as
they are in each index". As you know, this is not really true, which
is an important problem for us. Fixing it (perhaps as part of adding
something like Robert's conveyor belt design) will likely require
revising this model quite fundamentally (e.g, the vacthresh
calculation in autovacuum.c:relation_needs_vacanalyze() would be
replaced). When this happens, we'll probably need to update system
views that have columns with names like "dead_tuples" -- because maybe
we no longer specifically count dead items/tuples at all. I strongly
suspect that the approach to statistics that we take for pg_statistic
optimizer stats just doesn't work for dead items/tuples -- statistical
sampling only produces useful statistics for the optimizer because
certain delicate assumptions are met (even these assumptions only
really work with a properly normalized database schema).

Maybe revising the model used for autovacuum scheduling wouldn't
include changing pg_stat_progress_vacuum, since that isn't technically
"part of the model" --- I'm not sure. But it's not something that I am
in a hurry to fix.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-11-30 19:52:28 Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Previous Message Robert Haas 2021-11-30 19:33:25 Re: Can I assume relation would not be invalid during from ExecutorRun to ExecutorEnd