Re: MERGE command for inheritance

From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MERGE command for inheritance
Date: 2010-08-11 14:09:43
Message-ID: AANLkTinzLoVCsCCt0fVm8vUS4Hpza-nv5013JkcVyX+p@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 11, 2010 at 4:45 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Tue, 2010-08-10 at 17:15 +0300, Heikki Linnakangas wrote:
> > On 10/08/10 12:38, Boxuan Zhai wrote:
> > > The difficult way is to generate the plans for children table in
> planner, as
> > > the other commands like UPDATE and DELETE. However, because the
> structure of
> > > MERGE plan is much more complex than the ordinary ModifyTable plans,
> this
> > > job may not as simple as we expected. We need to adjust both the main
> plan
> > > and the
> > > merge actions to fit the children tables, which is not straight
> forward.
> >
> > This the approach you'll have to take. But actually, I'm surprised it
> > doesn't happen to just work already. It should be opaque to the merge
> > facility that the reference to the parent target table has inherited
> > child tables - expanding the inherited table to scans of all the
> > children should already be handled by the planner.
>
> The support for UPDATE and SELECT of partitioned cases is very different
> in the planner and was handled as separate implementation projects.
>
> If we want a working MERGE in the next release, I suggest that we break
> down this project in the same way and look at partitioned target tables
> as a separate project.
>
> One reason for suggesting this is that all MERGE statements have a
> source table, whereas UPDATE and DELETEs did not always. The plan for a
> simple UPDATE and DELETE against a partitioned table is simple, but the
> plan (and performance) of a joined UPDATE or DELETE is not good:
>
> postgres=# explain update p set col2 = x.col2 from x where x.col1 =
> p.col1;
> QUERY
> PLAN
> ---------------------------------------------------------------------------
> Update (cost=299.56..1961.18 rows=68694 width=20)
> -> Merge Join (cost=299.56..653.73 rows=22898 width=20)
> Merge Cond: (public.p.col1 = x.col1)
> -> Sort (cost=149.78..155.13 rows=2140 width=10)
> Sort Key: public.p.col1
> -> Seq Scan on p (cost=0.00..31.40 rows=2140 width=10)
> -> Sort (cost=149.78..155.13 rows=2140 width=14)
> Sort Key: x.col1
> -> Seq Scan on x (cost=0.00..31.40 rows=2140 width=14)
> -> Merge Join (cost=299.56..653.73 rows=22898 width=20)
> Merge Cond: (public.p.col1 = x.col1)
> -> Sort (cost=149.78..155.13 rows=2140 width=10)
> Sort Key: public.p.col1
> -> Seq Scan on p1 p (cost=0.00..31.40 rows=2140
> width=10)
> -> Sort (cost=149.78..155.13 rows=2140 width=14)
> Sort Key: x.col1
> -> Seq Scan on x (cost=0.00..31.40 rows=2140 width=14)
> -> Merge Join (cost=299.56..653.73 rows=22898 width=20)
> Merge Cond: (public.p.col1 = x.col1)
> -> Sort (cost=149.78..155.13 rows=2140 width=10)
> Sort Key: public.p.col1
> -> Seq Scan on p2 p (cost=0.00..31.40 rows=2140
> width=10)
> -> Sort (cost=149.78..155.13 rows=2140 width=14)
> Sort Key: x.col1
> -> Seq Scan on x (cost=0.00..31.40 rows=2140 width=14)
>
> Those plans could use some love and attention before forcing Boxuan to
> implement that.
>
>

It seems that we have not decided whether to put the inheritance for MERGE
off for a latter implementation. But, I think we can discuss how to do it
now.

First of all, the inheritance of MERGE should not be implemented in the
rule-like way. I agree that the easy way I proposed is not consistent with
the general inheritance process in postgres.

The normal way of doing this is to handle it in planner, to be more
specific, we need to extend the function "inheritance_planner()" for
processing MERGE queries.

For UPDATE and DELETE commands (INSERT is not an inheritable command), if
"inheritance_planner" finds that the target table has children tables, it
will generate a list of queries. These queries are almost the same as the
original query input by user, except for the different target
relations. Each child table has it corresponding query in this list.

This list of queries will then be processed by "grouping_planner()" and
transformed into a list of plans. One most important work finished in
this function is to extend the target list of target relations to make sure
that all attributes of a target relation appears in the final result tuple
of its plan.

As for MERGE command, we need to do the same thing. But, since the main
query body is a LEFT JOIN query between source table and target table, the
top-level target list is a combination of all the attributes from source
table and target table. Thus, when we extend the target list, we should only
extent the part of target relations, and keep the source table part
untouched.

Once a main query in this style has been transformed to plan, we need to
prepare the merge actions for it too. That is, extend the target list of all
UPDATE and INSERT actions for the corresponding target relation. In this
way, each target relation will have its own "main plan + merge action" set.

The main plan will be executed one by one, so is the merge action sets, each
for one target relation.

One more thing I want to point out is that, the INSERT is also an
inheritable action in MERGE. For a plain INSERT command, all the inserted
tuples are put in the target table ONLY. It is easy to understand. We don't
want to duplicate all the new tuples in all children tables. However, in
MERGE command, an INSERT action is activated by the tuples fitting its
matching conditions. The main plan of a MERGE command will scan all the
tuples in target relation and its children tables. If one tuple in a child
table meets the requirements of INSERT actions, the insertion should be
taken on the child table itself rather than its ancestor.

PS: Since I have taken this project, I will do my best to make it perfect.
I will keep working on MERGE until it is really finished, even after the
gSoC. (unless you guys has other plans).

> --
> Simon Riggs www.2ndQuadrant.com <http://www.2ndquadrant.com/>
> PostgreSQL Development, 24x7 Support, Training and Services
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-08-11 14:12:54 Re: [HACKERS] postgres 9.0 crash when bringing up hot standby
Previous Message Kevin Grittner 2010-08-11 14:06:35 Re: Regression tests versus the buildfarm environment