Re: How to implement Aggregate Awareness?

From: Olgierd Michalak <olgierdm(at)softcomputer(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: "pgsql-sql(at)postgresql(dot)org" <pgsql-sql(at)postgresql(dot)org>
Subject: Re: How to implement Aggregate Awareness?
Date: 2011-11-07 19:25:07
Message-ID: F8991549C624E2429C578E86D68A4D027F42AB07@EXCH-MBX01.softcomputer.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> Simply put, when Transparent (to the reporting tool) Aggregate Navigator
> recognizes that a query would execute faster using aggregates, it
> automatically rewrites the query so that the database hits the smaller
> aggregates rather than larger detail tables upon which the small
> aggregates are pre-computed.

Assuming I understand what you want correctly: Before you could usefully
do anything like that, I think you'd need to implement full support for
materialized views.

Once that's in place, the optimizer could potentially be enhanced to
recognise queries against a base table and rewrite it to use a
materialized view of the table when it sees that the query only touches
data collected by the materialized view.

Right now, there isn't really anything for query rewriting like this to
/target/ .

--
Craig Ringer

------------------------------------------------------------------------------

For Aggregate Navigation, materialized views are not required. But, let me give you a short example of how it could work. Let us assume we have a fact table SALES with a single measure SALES.AMOUNT, and one dimension table DATE with attributes: DATE.YEAR, DATE.MONTH, DATE.DAY. Assume we also have an aggregate table AGG_SALES that stores aggregated sales amounts at the month level in the field AGG_SALES.AMOUNT. The corresponding shrunken dimension DATE2MM has a subset of DATE's attributes, that is: DATE2MM.YEAR and DATE2MM.MONTH. Now, assume a reporting tool (oblivious of the existence of both AGG_SALES and DATE2MM) sends this query to PostgreSQL:

Query 1.
select d.month, sum(s.amount)
from sales s, date d
where s.date_key = d.date_key
group by d.month

The Aggregate Navigator recognizes that such query will execute faster if aggregate table is used. So it substitutes the aggregate tables into the original query, so in effect the query is re-written as:

Query 2.
select d.month, sum(s.amount)
from agg_sales s, date2mm d
where s.date_key = d.date_key
group by d.month

On the other hand, if the following query arrives:

Query 3.
select d.day, sum(s.amount)
from sales s, date d
where s.date_key = d.date_key
group by d.day

The Aggregate Navigator will not be able to find DATE2MM.DAY, so no substitution will take place.

Of course, there needs to be some kind of meta-data defining the relationship between AGG_SALES and SALES that should help the Aggregate Navigator quickly choose the fastest query. In this case, I imagine, the pseudo code could look like this.

CREATE RULE "_RETURN"
AS ON SELECT TO SALES DO INSTEAD
{ LANGUAGE PSEUDOCODE $$
IF all the query fields are found in aggregate tables THEN
Execute the query substituting the aggregate table names into the query, i.e.: AGG_SALES for SALES
ELSE
Execute the query unchanged, i.e.: fall back on the detail tables where all the fields are available.
$$ }

Transparent Aggregate Navigation is simple in theory, but can it be implemented in PostgreSQL?

Thank you all for all your comments and suggestions,

---
Olgierd Michalak
Soft Computer Consultants, Inc.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Little, Douglas 2011-11-07 20:03:54 Re: How to implement Aggregate Awareness?
Previous Message Richard Huxton 2011-11-07 08:44:54 Re: the use of $$string$$