Re: No hash join across partitioned tables?

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Kris Jurka <books(at)ejurka(dot)com>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: No hash join across partitioned tables?
Date: 2010-10-16 05:35:46
Message-ID: AANLkTim1+h8jgOV6iEoN8OxF19MTb+wHFDMP8_tKdONp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Fri, Oct 15, 2010 at 10:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > If we analyze the parent, do we also update the children stats, or is it
> > just that we keep two stats for the parent, one with children and one
> > without, both being updated when the parent is analyzed?
>
> The latter.
>
> The trick here is that we need to fire an analyze on the parent even
> though only its children may have had any updates.
>
> > If the latter's the case, maybe we should modify ANALYZE a bit more, so
> > that we can analyze the whole hierarchy in one go, and store the lot of
> > stats with a single pass (each child alone, the parent alone, the parent
> > plus children). However it's not real clear how would this work with
> > multiple inheritance levels.
>

An issue with automatically analyzing the entire hierarchy is 'abstract'
table definitions. I've got a set of tables for storing the same data at
different granularities of aggregation. Within each granularity, I've got
partitions, but because the set of columns is identical for each
granularity, I've got an abstract table definition that is inherited by
everything. I don't need or want statistics kept on that table because I
never query across the abstract table, only the parent table of each
aggregation granularity

create table abstract_fact_table (
time timestamp,
measure1 bigint,
measure2 bigint,
measure3 bigint,
fk1 bigint,
fk2 bigint
);

create table minute_scale_fact_table (
} inherits abstract_fact_table;

// Then there are several partitions for minute scale data

create table hour_scale_fact_table (
) inherits abstract_fact_table;

// then several partitions for hour scale data

etc. I do run queries on the minute_scale_fact_table and
hour_scale_fact_table but never do so on abstract_fact_table. I could
certainly modify my schema such that the abstract table goes away entirely
easily enough, but I find this easier for new developers to come in and
comprehend, since the similarity between the table definitions is explicit.

I'm glad this topic came up, as I was unaware that I need to run analyze on
the parent partitions separately - and no data is every inserted directly
into the top level of each granularity hierarchy, so it will never fire by
itself.

If I am using ORM and I've got functionality in a common baseclass in the
source code, I'll often implement its mapping in the database via a parent
table that the table for any subclass mapping can inherit from. Again, I
have no interest in maintaining statistics on the parent table, since I
never query against it directly.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2010-10-16 07:27:46 Is LISTEN/NOTIFY reliable?
Previous Message Tom Lane 2010-10-16 05:22:42 Re: No hash join across partitioned tables?

Browse pgsql-performance by date

  From Date Subject
Next Message Eric Comeau 2010-10-16 12:32:11 Help with duration of statement: EXECUTE <unnamed> [PREPARE: COMMIT]
Previous Message Tom Lane 2010-10-16 05:22:42 Re: No hash join across partitioned tables?