Re: Stats for inheritance trees

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Stats for inheritance trees
Date: 2009-12-29 10:31:24
Message-ID: 1262082684.19367.1960.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2009-12-28 at 17:41 -0500, Tom Lane wrote:

> Following up on the discussion here
> http://archives.postgresql.org/message-id/4B3875C6020000250002D97D@gw.wicourts.gov
> I'd like to propose making the following changes that would allow saner
> planning for queries involving inheritance:

Sounds good.

> 1. Currently the primary key of pg_statistic is (starelid, staattnum)
> indicating the table and column the stats entry is for. I propose
> adding a bool stainherit to the pkey. "false" means the stats entry
> is for just that table column, ie, the traditional interpretation.
> "true" means the stats entry covers that column and all its inheritance
> children. Such entries could be used directly by the planner in cases
> where it currently punts and delivers a default estimate.
>
> 2. When ANALYZE is invoked on a table that has inheritance children,
> it will perform its normal duties for just that table (creating or
> updating entries with stainherit = false) and then perform a second
> scan that covers that table and all its children. This will be used
> to create or update entries with stainherit = true. It might be
> possible to avoid scanning the parent table itself twice, but I won't
> contort the code too much to avoid that, since in most practical
> applications the parent is empty or small anyway.

Will there be logic to decide how stainherit should be set? Many columns
in an inherited set have similar values in different children, e.g.
OrderValue, OrderStatus but many columns also have very different values
in different children. e.g. OrderId, OrderPlacementDate,
OrderFulfillmentDate

> 3. Ideally autovacuum would know enough to perform ANALYZEs on
> inheritance parents after enough churn has occurred in their child
> table(s). I am not entirely clear about a good way to do that.
> We could have it just directly force an ANALYZE on parent(s) of any
> table it has chosen to ANALYZE, but that might be overkill --- in
> particular leading to excess ANALYZEs when several children receive
> a lot of updates.
>
> Even without a really smart solution to #3, this would be a big step
> forward for inheritance queries.
>
> BTW, while at it I'm inclined to add a non-unique index on
> pg_inherits.inhparent, so that find_inheritance_children won't have to
> seqscan pg_inherits anymore. It's surprising people haven't complained
> about that before.

They have, we just haven't done anything about it.

> The code says
>
> * XXX might be a good idea to create an index on pg_inherits' inhparent
> * field, so that we can use an indexscan instead of sequential scan here.
> * However, in typical databases pg_inherits won't have enough entries to
> * justify an indexscan...
>
> but we've long since learned that people stress databases in odd ways.

--
Simon Riggs www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-12-29 10:42:00 Re: New VACUUM FULL still needed?
Previous Message Tarun Sharma 2009-12-29 10:26:45 Can we hide data from the superadmin