Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables?

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Chris Browne" <cbbrowne(at)acm(dot)org>
Cc: "Pgsql-Performance(at)Postgresql(dot) Org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables?
Date: 2004-09-15 20:54:22
Message-ID: NOEFLCFHBPDAFHEIPGBOAEJACEAA.simon@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Chris Browne <cbbrowne(at)acm(dot)org> wrote on 15.09.2004, 04:34:53:
> simon(at)2ndquadrant(dot)com ("Simon Riggs") writes:
> > Well, its fairly straightforward to auto-generate the UNION ALL view,
and
> > important as well, since it needs to be re-specified each time a new
> > partition is loaded or an old one is cleared down. The main point is
that
> > the constant placed in front of each table must in some way relate to
the
> > data, to make it useful in querying. If it is just a unique constant,
chosen
> > at random, it won't do much for partition elimination. So, that tends to
> > make the creation of the UNION ALL view an application/data specific
thing.
>
> Ah, that's probably a good thought.
>
> When we used big "UNION ALL" views, it was with logging tables, where
> there wasn't really any meaningful distinction between partitions.
>
> So you say that if the VIEW contains, within it, meaningful constraint
> information, that can get applied to chop out irrelevant bits?
>
> That suggests a way of resurrecting the idea...
>
> Might we set up the view as:
>
> create view combination_of_logs as
> select * from table_1 where txn_date between 'this' and 'that'
> union all
> select * from table_2 where txn_date between 'this2' and 'that2'
> union all
> select * from table_3 where txn_date between 'this3' and 'that3'
> union all
> select * from table_4 where txn_date between 'this4' and 'that4'
> union all
> ... ad infinitum
> union all
> select * from table_n where txn_date > 'start_of_partition_n';
>
> and expect that to help, as long as the query that hooks up to this
> has date constraints?
>

That way of phrasing the view can give you the right answer to the
query, but does not exclude partitions.

With the above way of doing things, you end up with predicate phrases of
the form ((PARTLIMITLO < partcol) AND (PARTLIMITHI > partcol) AND
(partcol > QUERYLIMITLO) AND (partcol < QUERYLIMITHI))
...if the values in capitals are constants, then this should evaluate to
a true or false value for each partition table. The optimizer doesn't
yet do this....

If you specify the view the way I showed, then the predicate query
becomes a comparison of constants, which DOES get evaluated prior to
full execution....you will see this as a "one time test: false" in the
EXPLAIN.

The way you've phrased the view is the more obvious way to phrase it,
and I'd spent a few days trying to work out how to solve the algebra
above in code....but that was wasted effort.

Anyway, if you use constants you can still specify ranges and betweens
and have them work... hence my example showed date-like integers - but
I don't think it just applies to one datatype.

Best Regards, Simon Riggs

Browse pgsql-performance by date

  From Date Subject
Next Message Joe Conway 2004-09-15 20:56:17 Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Previous Message Kevin Neufeld 2004-09-15 20:08:30 declared cursor uses slow plan