Join optimization for inheritance tables

From: Nedyalko Borisov <nedyalko(at)asterdata(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: Herodotos Herodotou <Herodotos(dot)Herodotou(at)asterdata(dot)com>, Eric Friedman <Eric(dot)Friedman(at)asterdata(dot)com>, John Cieslewicz <John(dot)Cieslewicz(at)asterdata(dot)com>, Dheeraj Pandey <Dheeraj(dot)Pandey(at)asterdata(dot)com>
Subject: Join optimization for inheritance tables
Date: 2009-06-26 17:08:58
Message-ID: 4A4500AA.3050206@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

We are working with Aster for the summer and we would like to bounce some ideas that we are having for some possible PostgreSQL extensions. In order to describe our ideas we will use the following example:

create table msg(
msg_id int,
msg text );

create table receiver(
msg_id int,
user_id int,
ts timestamp );

create table msg_100(
check ( 1 <= msg_id and msg_id < 100 )
) inherits (msg);

create table msg_200(
check ( 100 <= msg_id and msg_id < 200 )
) inherits (msg);

create table msg_300(
check ( 200 <= msg_id and msg_id < 300 )
) inherits (msg);

create table receiver_100(
check ( 1 <= msg_id and msg_id < 100 )
) inherits (receiver);

create table receiver_200(
check ( 100 <= msg_id and msg_id < 200 )
) inherits (receiver);

create table receiver_300(
check ( 200 <= msg_id and msg_id < 300 )
) inherits (receiver);

When we are issuing queries on one of the parent tables, like,

SELECT * FROM msg WHERE msg_id BETWEEN 50 AND 70;

PostgreSQL is smart enough to filter out child tables with check constraints that are refuted by the filter conditions. In this example, the optimizer will pick a plan that only considers the parent table 'msg' and one of the child tables 'msg_100':

Result
-> Append
-> Seq Scan on msg
Filter: ((msg_id >= 50) AND (msg_id <= 70))
-> Seq Scan on msg_100 msg
Filter: ((msg_id >= 50) AND (msg_id <= 70))

Plan costs are removed for simplicity of the presentation.

Now, if we issue a join query between the two parent tables, like,

SELECT * FROM msg m JOIN receiver r ON m.msg_id = r.msg_id;

the execution plan will be:

Merge Join
Merge Cond: (m.msg_id = r.msg_id)
-> Sort
Sort Key: m.msg_id
-> Append
-> Seq Scan on msg m
-> Seq Scan on msg_100 m
-> Seq Scan on msg_200 m
-> Seq Scan on msg_300 m
-> Sort
Sort Key: r.msg_id
-> Append
-> Seq Scan on receiver r
-> Seq Scan on receiver_100 r
-> Seq Scan on receiver_200 r
-> Seq Scan on receiver_300 r

During the planning phase, the optimizer treats an entire hierarchy as a single entity. Hence, it first considers the most efficient way to create the append paths for the two hierarchies, and then the best way to join them. However, there are some optimizations that are possible here, similar to the table filtering described above. In particular, instead of joining the two appends, we could "push down" the join to the child relations - that is, create pairwise joins between the children and then append the join results together.

Based on the check conditions of the children and the join predicate, it is possible to filter out joins that cannot produce any results. For example, joining 'msg_100' with 'receiver_300' is redundant since the check constraints of these two tables do not overlap. Tuples in 'msg_100' have 'msg_id' between 1 and 100, whereas tuples in 'receiver_300' have 'msg_id' between 200 and 300. Therefore, no tuples can be produce from this join.

A plan with such optimizations could be:

Result
-> Append
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver_100 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver_200 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg msg
-> Hash
-> Seq Scan on receiver_300 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_100 msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_100 msg
-> Hash
-> Seq Scan on receiver_100 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_200 msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_200 msg
-> Hash
-> Seq Scan on receiver_200 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_300 msg
-> Hash
-> Seq Scan on receiver receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_300 msg
-> Hash
-> Seq Scan on receiver_300 receiver

The parent tables don't have any check constraints in the above example and hence will need to join with all the child relations. However, based on usage and feedback among Aster's customers, we believe that it is common practice not to store any data in the parent tables. Therefore, we were thinking of creating an "Empty Check Constraint". This constraint will enforce that a particular table is empty and will prevent the insertion of any rows into it. If the parent tables have the empty check constraint, then the optimizer will be able to further eliminate some of the child joins.

A plan with all optimizations and features suggested could be:

Result
-> Append
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_100 msg
-> Hash
-> Seq Scan on receiver_100 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_200 msg
-> Hash
-> Seq Scan on receiver_200 receiver
-> Hash Join
Hash Cond: (msg.msg_id = receiver.msg_id)
-> Seq Scan on msg_300 msg
-> Hash
-> Seq Scan on receiver_300 receiver

In summary, we are making two suggestions:
1. Extend the optimizer to consider joins between child tables when hierarchies are joined together.
2. Add the "Empty Check Constraint", which would enforce that a particular table is to remain empty.

We've created some basic implementation, just to verify the benefits of these ideas. The initial results seem promising as we observed decreased execution time and for large data set the query execution time could be several times faster than before.
We would greatly appreciate your comments, thoughts or suggestions that you might have regarding these ideas.

Regards,
Nedyalko Borisov and Herodotos Herodotou

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2009-06-26 20:13:16 Re: 8.4 open item: copy performance regression?
Previous Message Kris Jurka 2009-06-26 17:05:35 Re: gettext version problem exposed by buildfarm failures on "canary"