A modest proposal vis hierarchical queries: MINUS in the column list

From: Mark Zellers <mark(dot)zellers(at)workday(dot)com>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: A modest proposal vis hierarchical queries: MINUS in the column list
Date: 2021-06-07 18:25:15
Message-ID: BYAPR06MB4600BFBF43879247EFCDF3D888389@BYAPR06MB4600.namprd06.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

One of the friction points I have found in migrating from Oracle to PostgreSQL is in the conversion of hierarchical queries from the Oracle START WITH/CONNECT BY/ORDER SIBLINGS by pattern to using the ANSI recursive subquery form.

Once you wrap your head around it, the ANSI form is not so bad with one major exception. In order to achieve the equivalent of Oracle’s ORDER SIBLINGS BY clause, you need to add an additional column containing an array with the accumulated path back to the root of the hierarchy for each row. The problem with that is that it leaves you with an unfortunate choice: either accept the inefficiency of returning the array with the path back to the client (which the client doesn’t need or want), or requiring the application to explicitly list all of the columns that it wants just to exclude the hierarchy column, which can be hard to maintain, especially if your application needs to support both databases. If you have a ORM model where there could be multiple queries that share the same client code to read the result set, you might have to change multiple queries when new columns are added to a table or view even though you have centralized the processing of the result set.

The ideal solution for this would be for PostgreSQL to support the Oracle syntax and internally convert it to the ANSI form. Failing that, I have a modest suggestion that I would like to start a discussion around. What if you could use the MINUS keyword in the column list of a select statement to remove a column from the result set returned to the client? What I have in mind is something like this:

To achieve the equivalent of the following Oracle query:

SELECT T.*
FROM T
START WITH T.ParentID IS NULL
CONNECT BY T.ParentID = PRIOR T.ID
ORDER SIBLINGS BY T.OrderVal

You could use

WITH RECURSIVE TT AS (
SELECT T0.*, ARRAY[]::INTEGER[] || T.OrderVal AS Sortable
FROM T T0
UNION ALL
SELECT T1.*, TT.Sortable || T1 AS Sortable
FROM TT
INNER JOIN T T1 ON (T1.ParentID = TT.ID)
)
SELECT TT.* MINUS TT.Sortable
FROM TT
ORDER BY TT.Sortable

Now the Sortable column can be used to order the result set but is not returned to the client.

Not knowing the internals of the parser, I’m assuming that the use of MINUS in this construct would be distinguishable from the set difference use case because the expression being subtracted is a column (or perhaps even a lst of columns) rather than a SELECT expression.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-06-07 18:34:29 Re: Make unlogged table resets detectable
Previous Message Stephen Frost 2021-06-07 18:20:38 Re: Duplicate history file?