Re: Row estimates for empty tables

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Christophe Pettus <xof(at)thebuild(dot)com>
Subject: Re: Row estimates for empty tables
Date: 2020-07-25 00:37:19
Message-ID: CAApHDvrJEUpnDDJ-MAqJRFg1twdriT2y_ANRMDUitg+2d4CcaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Sat, 25 Jul 2020 at 10:34, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
> > 1. Arrange to send out a relcache inval when adding the first page to
> > a table, and then remove the planner hack for disbelieving relpages = 0.
> > I fear this'd be a mess from a system structural standpoint, but it might
> > work fairly transparently.
>
> I experimented with doing this. It's not hard to code, if you don't mind
> having RelationGetBufferForTuple calling CacheInvalidateRelcache. I'm not
> sure whether that code path might cause any long-term problems, but it
> seems to work OK right now. However, this solution causes massive
> "failures" in the regression tests as a result of plans changing. I'm
> sure that's partly because we use so many small tables in the tests.
> Nonetheless, it's not promising from the standpoint of not causing
> unexpected problems in the real world.

I guess all these changes would be the planner moving towards a plan
that suits having fewer rows for the given table better. If so, that
does seem quite scary as we already have enough problems from the
planner choosing poor plans when it thinks there are fewer rows than
there actually are. Don't we need to keep something like the 10-page
estimate there so safer plans are produced before auto-vacuum gets in
and gathers some proper stats?

I think if anything we'd want to move in the direction of producing
more cautious plans when the estimated number of rows is low. Perhaps
especially so for when the planner opts to do things like perform a
non-parameterized nested loop join when it thinks the RelOptInfo with,
say 3, unbeknown-to-the-planner, correlated, base restrict quals that
are thought to produce just 1 row, but actually produce many more.

> > 2. Establish the convention that vacuuming or analyzing an empty table
> > is what you do to tell the system that this state is going to persist.
> > That's more or less what the existing comments in plancat.c envision,
> > but we never made a definition for how the occurrence of that event
> > would be recorded in the catalogs, other than setting relpages > 0.
> > Rather than adding another pg_class column, I'm tempted to say that
> > vacuum/analyze should set relpages to a minimum of 1, even if the
> > relation has zero pages.
>
> I also tried this, and it seems a lot more promising: no existing
> regression test cases change. So perhaps we should do the attached
> or something like it.

This sounds like a more plausible solution. At least this way there's
an escape hatch for people who suffer due to this.

David

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2020-07-25 01:51:23 Re: bad JIT decision
Previous Message Andres Freund 2020-07-25 00:16:05 Re: bad JIT decision

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2020-07-25 00:41:33 Re: Include access method in listTables output
Previous Message Andres Freund 2020-07-25 00:00:32 Re: Improving connection scalability: GetSnapshotData()