Re: Why not represent "never vacuumed" accurately wrt pg_class.relpages?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why not represent "never vacuumed" accurately wrt pg_class.relpages?
Date: 2018-12-11 18:23:31
Message-ID: 20181211182331.uosjnfvncq3o7rxx@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-12-11 09:47:38 -0500, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > I don't quite get why we don't instead just represent "never vacuumed"
> > by storing a more meaningful value in relpages?
>
> Mostly, not wanting to break clients that look at these fields.
> If catalog compatibility weren't a concern, I'd seriously consider
> replacing both of them with a float "average tuples per page" ratio.

Yea, that'd be better. I'm not sure I believe this is a grave concern,
but if we really are concerned about not breaking clients, we could just
add a separate bool for the fact the table has been vacuumed.

> > I've seen numerous cases where relpages = 0 -> never vacuumed has caused
> > worse plans, and it just doesn't seem necessary?
>
> Worse plans than what?

The plans if the stats were believed.

> And why do you blame it on this representation? We don't believe that
> relpages is the actual size of the table.

No, but we assume that there's 10 pages. Even if both relpages and the
actual relation stats say there's not. And we assume there's as many
tuples on the page as can fit on it, using get_rel_data_width(). So if
you have a small table with a handful of entries at most, you suddenly
get estimates of a few hundred to ~a thousand rows.

I'd one client once insert a row into a lock table to make sense, just
to make sure it never gets vacuumed while there were no locks.

CREATE TABLE locks(lockid int);
VACUUM locks;
EXPLAIN SELECT * FROM locks;
┌─────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────┤
│ Seq Scan on locks (cost=0.00..35.50 rows=2550 width=4) │
└─────────────────────────────────────────────────────────┘
(1 row)
INSERT INTO locks VALUES (1);
VACUUM locks;
EXPLAIN SELECT * FROM locks;
┌─────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────┤
│ Seq Scan on locks (cost=0.00..1.01 rows=1 width=4) │
└─────────────────────────────────────────────────────┘
(1 row)
DELETE FROM locks;
VACUUM locks ;
EXPLAIN SELECT * FROM locks;
┌─────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├─────────────────────────────────────────────────────────┤
│ Seq Scan on locks (cost=0.00..35.50 rows=2550 width=4) │
└─────────────────────────────────────────────────────────┘
(1 row)

Estimates on a table constantly varying by three orders of magnitude -
not hard to imagine large plan changes due to that.

That's not what I was complaining about, but isn't it fairly broken that
the "actually empty relation" case here doesn't get hit if relpages ==
0? We clamp curpages to 10 before returning for the the empty size:

if (curpages < 10 &&
rel->rd_rel->relpages == 0 &&
!rel->rd_rel->relhassubclass &&
rel->rd_rel->relkind != RELKIND_INDEX)
curpages = 10;

/* report estimated # pages */
*pages = curpages;
/* quick exit if rel is clearly empty */
if (curpages == 0)
{
*tuples = 0;
*allvisfrac = 0;
break;
}

I guess you could argue that the relation would potentially not be be
empty anymore by the time the plan is executed, but if that were part of
the logic it a) wouldn't just be relevant if relpages = 0, and b) should
be documented.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-12-11 18:43:47 Re: Why not represent "never vacuumed" accurately wrt pg_class.relpages?
Previous Message Andreas Karlsson 2018-12-11 17:18:24 Re: Introducing SNI in TLS handshake for SSL connections