Re: Inheritance

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Inheritance
Date: 2016-05-25 02:47:04
Message-ID: CAMsr+YFGkM1dxDmqunBxufgRsWUyyxkyg=+mzF1988u_2HO=-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24 May 2016 at 22:45, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
wrote:

> There is one aspect of inheritance support which was not mentioned:
> polymorphic queries.
> Actually polymorphism is the fundamental feature of OOP, without it there
> is no behavioral inheritance and inheritance can be considered just as
> "syntax sugar" for sharing some common subset of attributes between tables.
>
> The main problem with supporting polymorphic queries is that SQL query
> returns set of tuples, not set of objects.
> So there is no nice way to return both instances of based and derived
> tables. There are several alternatives
> (for example return joined set of attributes in all derived tables,
> leaving missed as NULLs) but none of them is good.
>

Exactly. We have a sort-of-object-ish storage option, but none of the
surrounding stuff to make it useful for actual OO / object-relational work.

The "joined set of attributes" approach is exactly what ORMs already do,
and many direct implementations of the same idea will use too. So we'd
offer no advantage over what they already do in a way that works with
multiple DBMSes, except we might be able to do it faster. Maybe.

The lack of polymorphism is critical. It's not really usefully OO but it
costs you important relational features if you use it. We have some very
limited polymorphism in the sense that you can query the parent table and
see rows in child tables, but you only get the subset of cols that exists
at that level of the heirarchy.

One thing I'd like to explore one day is a nice, user-friendly way to
express "SELECT this row and the corresponding sets of rows from [these
tables and their children in turn] as a structured object". Right now users
have to write series of LEFT JOINs and de-duplicate the left-hand sides. Or
do multiple queries (n+1 selects), possibly expensively with repeated join
work involved. Or they have to write pretty baroque queries to construct a
jsonb object with jsonb_agg with multiple levels of group-by in
subqueries-in-from. We should be able to do this for them, so they can say

SELECTOBJECT customer
CHILD JOIN invoice ON (customer.customer_id = invoice.customer_id AND
invoice_date > $1)
CHILD JOIN invoiceline USING (invoice_id)
CHILD JOIN address USING (customer_id)
WHERE customer.in_debt_collections;

instead of the current nested mess of aggregation and subqueries needed,
like:

SELECT
to_jsonb(customer) || (
SELECT jsonb_build_object('invoices', jsonb_agg(invoice_obj))
FROM (
SELECT to_jsonb(invoice) || jsonb_build_object('invoice_lines',
jsonb_agg(invoice_line))
FROM invoice
LEFT OUTER JOIN invoice_line ON (invoice.invoice_id =
invoice_line.invoice_id)
WHERE invoice.customer_id = customer.customer_id AND invoice_date
>= current_date
GROUP BY invoice.invoice_id
) invoice_obj
) || (
SELECT jsonb_build_object('addresses', jsonb_agg(address))
FROM address
WHERE address.customer_id = customer.customer_id
)
FROM customer
WHERE customer.in_debt_collections

which is just pure joy to read, and gets even more contorted as the layers
of parent/child relationships get deeper. The resulting query plan clearly
expresses the desired result, but writing it in SQL is horrible:

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on customer (cost=0.00..130226.27 rows=1270 width=64)
SubPlan 1
-> Aggregate (cost=77.47..77.48 rows=1 width=24)
-> Subquery Scan on invoice_obj (cost=77.37..77.46 rows=3
width=24)
-> HashAggregate (cost=77.37..77.43 rows=3 width=104)
Group Key: invoice.invoice_id
-> Hash Right Join (cost=50.84..77.36 rows=3
width=104)
Hash Cond: (invoice_line.invoice_id =
invoice.invoice_id)
-> Seq Scan on invoice_line
(cost=0.00..22.00 rows=1200 width=68)
-> Hash (cost=50.80..50.80 rows=3 width=40)
-> Seq Scan on invoice
(cost=0.00..50.80 rows=3 width=40)
Filter: ((customer_id =
customer.customer_id) AND (invoice_date >= ('now'::cstring)::date))
SubPlan 2
-> Aggregate (cost=25.02..25.03 rows=1 width=32)
-> Seq Scan on address (cost=0.00..25.00 rows=6 width=32)
Filter: (customer_id = customer.customer_id)
(16 rows)

Maybe grouping sets can help avoid the nested joins, but I couldn't figure
out how without wrapping the grouping set output query in another query to
aggregate the produced objects into a top level one.

Inheritance does nothing to help with this.

> But I do not think that presence of ORM excludes necessity to have
> internal support of OO in DBMS.
>
>
I'd agree if it was a finished and general OO feature. I just don't think
what we have now is.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2016-05-25 02:52:38 Does people favor to have matrix data type?
Previous Message Craig Ringer 2016-05-25 01:52:11 Re: Inheritance