Re: Query plan and Inheritance. Weird behavior

From: John Lange <lists(at)darkcore(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query plan and Inheritance. Weird behavior
Date: 2003-01-23 15:36:11
Message-ID: 1043336171.2048.35.camel@johnlaptop.darkcore.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 2003-01-23 at 00:07, Tom Lane wrote:
> > On 22 Jan 2003, John Lange wrote:
> >> In this way the parent table would not need to know, nor would it care
> >> about child tables in any way (just like inheritance in most programming
> >> languages). If done this way a select on a parent table would only
> >> require the retrieval of a single row and a select on a child table
> >> would only require the retrieval of two rows (one in the child table and
> >> one in the parent table).
>
> No, it'd require the retrieval of N rows: you're failing to think about
> multiple levels of inheritance or multi-parent inheritance, both of
> which are supported reasonably effectively by the current model.

Lets not be too nit-picky here. In the case of multiple layers of
inheritance you are still only selecting two rows (at a time), one
child, one parent. However if the parent also has a parent, then the
process repeats, once for every layer.

This is entirely reasonable and efficient compared to the current model
where a select on a parent table requires the same select to be executed
on EVERY child table. If it's a large expensive JOIN of some kind then
this is verging on un-workable.

> My guess is that this scheme would crash and burn just on locking
> considerations. (When you want to update a child row, what locks do you
> have to get in what order? With pieces of the row scattered through
> many tables, it'd be pretty messy.)

You lock the parent on down to the last child. I'm not a database
developer but that seems fairly straight forward?

The choice between the schema I've suggested and the way it is currently
implemented is a trade off between more efficient selects vs. more
efficient updates. If you are selecting on the parent table more than
updating then my idea is vastly more efficient. If you INSERT a lot then
the current way is marginally better.

With apologies to the developers, I don't feel the current
implementation is really usable for the simple fact that expensive
operations performed on the parent table causes them to be repeated for
every child table. And, as an added penalty, indexes on parent tables
are NOT inherited to the children so the child operations can be even
more expensive.

This solution is not that large and I've already got 6 child tables. It
just so happens that I do a LOT of selects on the parent so I'm going to
have to make a decision on where to go from here.

Solving this programmatically is not really that hard but I've gone a
ways down this path now so I'm not anxious to redo the entire database
schema since we do have customers already using this.

> You may care to look in the pghackers archives for prior discussions.

I will, thanks.

> The variant scheme that's sounded most interesting to me so far is to
> store *all* rows of an inheritance hierarchy in a single physical table.

Unless I'm not understanding I don't think that works. In my case for
example, a single parent has 4-5 children so the only columns they have
in common are the ones in the parent. Combining them all into a single
table means a big de-normalized table (loads of empty columns). If you
are going to go this route then you might as well just do it. It doesn't
need to be implemented on the DBMS.

Regards,

John Lange

> This'd require giving up multiple inheritance, but few people seem to
> use that, and the other benefits (like being able to enforce uniqueness
> constraints over the whole hierarchy with just a standard unique index)
> seem worth it. No one's stepped up to bat to do the legwork on the idea
> yet, though. One bit that looks pretty tricky is ALTER TABLE ADD
> COLUMN.
>
> regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Curt Sampson 2003-01-23 15:50:27 Re: [PERFORM] Terrible performance on wide selects
Previous Message Tom Lane 2003-01-23 15:26:19 Re: [PERFORM] optimizing query