Re: Row estimates for empty tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, postgre <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Row estimates for empty tables
Date: 2020-07-24 21:09:33
Message-ID: 578094.1595624973@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> pá 24. 7. 2020 v 16:38 odesílatel Christophe Pettus <xof(at)thebuild(dot)com>
> napsal:
>> Since we already special-case parent tables for partition sets, would a
>> storage parameter that lets you either tell the planner "no, really, zero
>> is reasonable here" or sets a minimum number of rows to plan for be
>> reasonable?

> It is an issue for special, not typical applications (this situation is
> typical for some OLAP patterns) - it is not too often - but some clean
> solution (instead hacking postgres) can be nice.

The core issue here is "how do we know whether the table is likely to stay
empty?". I can think of a couple of more or less klugy solutions:

1. Arrange to send out a relcache inval when adding the first page to
a table, and then remove the planner hack for disbelieving relpages = 0.
I fear this'd be a mess from a system structural standpoint, but it might
work fairly transparently.

2. Establish the convention that vacuuming or analyzing an empty table
is what you do to tell the system that this state is going to persist.
That's more or less what the existing comments in plancat.c envision,
but we never made a definition for how the occurrence of that event
would be recorded in the catalogs, other than setting relpages > 0.
Rather than adding another pg_class column, I'm tempted to say that
vacuum/analyze should set relpages to a minimum of 1, even if the
relation has zero pages. That does get the job done:

regression=# create table foo(f1 text);
CREATE TABLE
regression=# explain select * from foo;
QUERY PLAN
--------------------------------------------------------
Seq Scan on foo (cost=0.00..23.60 rows=1360 width=32)
(1 row)

regression=# vacuum foo; -- doesn't help
VACUUM
regression=# explain select * from foo;
QUERY PLAN
--------------------------------------------------------
Seq Scan on foo (cost=0.00..23.60 rows=1360 width=32)
(1 row)
regression=# update pg_class set relpages = 1 where relname = 'foo';
UPDATE 1
regression=# explain select * from foo;
QUERY PLAN
----------------------------------------------------
Seq Scan on foo (cost=0.00..0.00 rows=1 width=32)
(1 row)

(We're still estimating one row, but that's as a result of different
decisions that I'm not nearly as willing to compromise on...)

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ted Toth 2020-07-24 21:12:03 Re: when is RLS policy applied
Previous Message Scott Ribe 2020-07-24 20:46:06 bad JIT decision

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2020-07-24 21:15:15 Re: Improving connection scalability: GetSnapshotData()
Previous Message Peter Geoghegan 2020-07-24 19:55:01 Re: Default setting for enable_hashagg_disk