From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Rename dead_tuples to dead_items in vacuumlazy.c |
Date: | 2021-12-01 05:34:39 |
Message-ID: | CAD21AoCYdn9n+-ZBD_WyJq-4Ws=E6rErcNDsB_1DTfDJ-DzwDw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Dec 1, 2021 at 4:42 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Mon, Nov 29, 2021 at 7:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Thanks! I'll change my parallel vacuum refactoring patch accordingly.
>
> Thanks again for working on that.
>
> > Regarding the commit, I think that there is still one place in
> > vacuumlazy.c where we can change "dead tuples" to "dead items":
> >
> > /*
> > * Allocate the space for dead tuples. Note that this handles parallel
> > * VACUUM initialization as part of allocating shared memory space used
> > * for dead_items.
> > */
> > dead_items_alloc(vacrel, params->nworkers);
> > dead_items = vacrel->dead_items;
>
> Oops. Pushed a fixup for that just now.
Thanks!
>
> > Also, the commit changes neither PROGRESS_VACUUM_MAX_DEAD_TUPLES
> > nor PROGRESS_VACUUM_NUM_DEAD_TUPLES. Did you leave them on purpose?
>
> That was deliberate.
>
> It would be a bit strange to alter these constants without also
> updating the corresponding column names for the
> pg_stat_progress_vacuum system view. But if I kept the definition from
> system_views.sql in sync, then I would break user scripts -- for
> reasons that users don't care about. That didn't seem like the right
> approach.
Agreed.
>
> Also, the system as a whole still assumes "DEAD tuples and LP_DEAD
> items are the same, and are just as much of a problem in the table as
> they are in each index". As you know, this is not really true, which
> is an important problem for us. Fixing it (perhaps as part of adding
> something like Robert's conveyor belt design) will likely require
> revising this model quite fundamentally (e.g., the vacthresh
> calculation in autovacuum.c:relation_needs_vacanalyze() would be
> replaced). When this happens, we'll probably need to update system
> views that have columns with names like "dead_tuples" -- because maybe
> we no longer specifically count dead items/tuples at all. I strongly
> suspect that the approach to statistics that we take for pg_statistic
> optimizer stats just doesn't work for dead items/tuples -- statistical
> sampling only produces useful statistics for the optimizer because
> certain delicate assumptions are met (even these assumptions only
> really work with a properly normalized database schema).
>
> Maybe revising the model used for autovacuum scheduling wouldn't
> include changing pg_stat_progress_vacuum, since that isn't technically
> "part of the model" --- I'm not sure. But it's not something that I am
> in a hurry to fix.
Understood.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
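
For readers unfamiliar with the vacthresh calculation mentioned in the quoted text, here is a minimal sketch of the threshold model that treats the dead tuple count as the trigger for autovacuum. It is an illustration under stated assumptions, not the code in autovacuum.c: the function name needs_vacuum_sketch and its parameter names are hypothetical, and the real relation_needs_vacanalyze() also handles wraparound, insert thresholds, and per-table reloptions.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the vacuum decision made in
 * relation_needs_vacanalyze(). The threshold is a fixed base plus a
 * fraction of the table's estimated live tuple count; a table qualifies
 * once its counted dead tuples exceed that threshold.
 *
 * base_thresh corresponds to autovacuum_vacuum_threshold and
 * scale_factor to autovacuum_vacuum_scale_factor.
 */
static bool
needs_vacuum_sketch(double reltuples, double dead_tuples,
                    int base_thresh, double scale_factor)
{
    double vacthresh = (double) base_thresh + scale_factor * reltuples;

    return dead_tuples > vacthresh;
}
```

With a base threshold of 50 and a scale factor of 0.2, a table estimated at 1000 tuples has a threshold of 250, so needs_vacuum_sketch(1000.0, 300.0, 50, 0.2) is true and needs_vacuum_sketch(1000.0, 100.0, 50, 0.2) is false. The single dead_tuples input is the assumption the quoted text questions, since it does not distinguish DEAD tuples from LP_DEAD items.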