Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM
Date: 2022-02-17 14:17:04
Message-ID: CA+Tgmoa+O3v5T+B6-BmyG13czpQ3Mzx6L5Tbrj_jVuTgSv=-Ng@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 16, 2022 at 10:43 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> How can you be surprised that I committed 44fa8488? It's essentially
> the same patch as the first version, posted November 22 -- almost 3
> months ago. And it's certainly not a big patch (though it is
> complicated).

Let's back up a minute and talk about the commit of $SUBJECT. The
commit message contains a Discussion link to this thread. This thread,
at the time you put that link in there, had exactly one post: from
you. That's not much of a discussion, although I do acknowledge that
sometimes we commit things that have bugs, and those bugs need to be
fixed even if nobody has responded.

That brings us to the question of whether 44fa8488 was improvidently
committed. I don't know the answer to that question, and here's why:

> The commit message is high level.

I would say it differently: I think the commit message does a poor job
describing what the commit actually does. For example, it says nothing
about changing VACUUM to always scan the last page of every heap
relation. This whole thread is about fixing a problem that was caused
by a significant behavior change that was *not even mentioned* in the
original commit message. If it had been mentioned, I likely would have
complained, because it's very similar to behavior that Tom eliminated
in b503da135ab0bdd97ac3d3f720c35854e084e525, which he did because it
was distorting reltuples estimates.

Commit messages need to describe what the commit actually changes.
Theoretical ideas are fine, but if I, as a committer who has done
significant work in this area in the past, can't read the commit
message and understand what is actually different, it's not a good
commit message. I think you *really* need to put more effort into
making your patches, and the emails about your patches, and the commit
messages for your patches understandable to other people. Otherwise,
waiting 3 months between when you post the patch and when you commit
it means nothing. You can wait 10 years to commit and still get
objections, if other people don't understand what you're doing.

I would guess that's really the root of Andres's concern here. I
believe that both Andres and I are in favor of the kinds of things you
want to do here *in principle*. But in practice I feel like it's not
working well, and is thereby putting the project at risk. What if some
day one of us needs to fix a bug in your code? It's not like VACUUM is
some peripheral system where bugs aren't that critical -- and it's
also not the case that the risk of introducing new bugs is low.
Historically, it's anything but.

--
Robert Haas
EDB: http://www.enterprisedb.com
