Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
Date: 2017-08-21 07:42:38
Message-ID: CAN-RpxBmMvAkKu0s3jTAnsD65daSGAjOXh+0ekBxYROcoSkmNA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Aug 20, 2017 at 4:10 AM, MauMau <maumau307(at)gmail(dot)com> wrote:

> From: Chris Travers
> > Why cannot you do all this in a language handler and treat as a user
> defined function?
> > ...
> > If you have a language handler for cypher, why do you need in_region
> or cast_region? Why not just have a graph_search() function which
> takes in a cypher query and returns a set of records?
>
> The language handler is for *stored* functions. The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query. And they both assume that they are executed in SQL
> commands.
>

Sure, but stored functions can take arguments, such as a query string that
gets handled by the language handler. There's absolutely no reason you
cannot declare a function in C that takes in a Cypher query and returns a
set of tuples. And you can do a whole lot with preloaded shared libraries
if you need to.

The planning bit is more difficult, but see below as to where I see major
limits here.

>
> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

That sounds like a foot gun. I would probably think of those cases as
being ideal for a custom background worker, similar to Mongress.
Expecting to be able to switch query languages on the fly strikes me as
adding totally needless complexity everywhere, to be honest. Having
different listeners on different ports simplifies this a lot, and
additionally having query languages for ad-hoc mixing via language
handlers might already get you most of what you want.

>
> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary. To take the query example in my first mail, which joins a
> relational table and the result of a graph query. The relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

It seems like all you really need is a planner hook for user-defined
languages (i.e. "how many rows does this function return with these
parameters?"), right? Right now we allow hints, but they are static. I
wonder how hard this would be using preloaded shared libraries.
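
To make the row-estimate idea concrete, here is a toy sketch in Python
rather than PostgreSQL C code. It contrasts a static row hint (analogous
to the ROWS clause of CREATE FUNCTION) with a per-call estimator. The
names SetReturningFunction, cypher_row_estimator and choose_join are
hypothetical, as are the LIMIT heuristic and the cost threshold; none of
this is an existing PostgreSQL API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SetReturningFunction:
    """Toy model of a set-returning function as the planner sees it."""
    name: str
    static_rows: int = 1000  # fixed hint, analogous to a declared ROWS value
    # Hypothetical hook: maps the function's argument to a row estimate.
    row_estimator: Optional[Callable[[str], int]] = None

    def estimate_rows(self, query_text: str) -> int:
        # Prefer the argument-aware estimate when a hook is installed.
        if self.row_estimator is not None:
            return self.row_estimator(query_text)
        return self.static_rows


def cypher_row_estimator(query_text: str) -> int:
    """Assumed heuristic: a trailing 'LIMIT n' caps the result size."""
    tokens = query_text.upper().split()
    if "LIMIT" in tokens:
        idx = tokens.index("LIMIT")
        if idx + 1 < len(tokens) and tokens[idx + 1].isdigit():
            return int(tokens[idx + 1])
    return 1000


def choose_join(function_rows: int) -> str:
    """Toy cost rule: small inputs favour a nested loop."""
    return "nested loop" if function_rows <= 100 else "hash join"


static_fn = SetReturningFunction("graph_search")
hooked_fn = SetReturningFunction("graph_search",
                                 row_estimator=cypher_row_estimator)
query = "MATCH (n) RETURN n LIMIT 10"
print(choose_join(static_fn.estimate_rows(query)))
print(choose_join(hooked_fn.estimate_rows(query)))
```

With only the static hint the planner has to assume the same row count
for every call; with the hook, the estimate (and so the join choice) can
follow the actual query string.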

>
> So in_region() and cast_region() are not functions to be executed
> during execution phase, but are syntax constructs that are converted,
> during analysis phase, into calls to another region's parser/analyzer
> and an inter-model cast routine.
>

So basically they work like immutable functions except that you cannot
index the output?

>
> 1. The relational parser finds in_region('cypher_graph', 'graph
> query') and produces a parse node InRegion(region_name, query) in the
> parse tree.
>
> 2. The relational analyzer looks up the system catalog to check if
> the specified region exists, then calls its parser/analyzer to produce
> the query tree for the graph query fragment. The relational analyzer
> attaches the graph query tree to the InRegion node.
>
> 3. When the relational planner finds the graph query tree, it passes
> the graph query tree to the graph planner to produce the graph
> execution plan.
>
> 4. The relational planner produces a join plan node, based on the
> costs/statistics of the relational table scan and graph query. The
> graph execution plan is attached to the join plan node.
>
> The parse/query/plan nodes have a label to denote a region, so that
> appropriate region's routines can be called.
>
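
The four steps quoted above can be modelled in a few lines. This is a
rough Python sketch, not PostgreSQL code: the REGIONS dictionary stands
in for the system catalog, and Region, analyze and plan_join are
hypothetical names. Only InRegion(region_name, query) comes from the
proposal itself. The row counts are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class InRegion:
    """Parse node produced for in_region('region', 'query') (step 1)."""
    region_name: str
    query: str
    query_tree: Optional[dict] = None  # filled in by the analyzer
    plan: Optional[dict] = None        # filled in by the planner


@dataclass
class Region:
    """Hypothetical per-region entry points."""
    parse_analyze: Callable[[str], dict]
    plan: Callable[[dict], dict]


# Stand-in for the system catalog of regions (assumption).
REGIONS: Dict[str, Region] = {
    "cypher_graph": Region(
        parse_analyze=lambda q: {"region": "cypher_graph", "text": q},
        plan=lambda tree: {"region": tree["region"],
                           "node": "GraphSearch", "rows": 50},
    ),
}


def analyze(node: InRegion) -> InRegion:
    """Step 2: check the region exists, then call its parser/analyzer."""
    region = REGIONS.get(node.region_name)
    if region is None:
        raise LookupError(f"region {node.region_name!r} does not exist")
    node.query_tree = region.parse_analyze(node.query)
    return node


def plan_join(node: InRegion, table_rows: int) -> dict:
    """Steps 3 and 4: delegate to the region's planner, then build a join."""
    node.plan = REGIONS[node.region_name].plan(node.query_tree)
    return {
        "node": "Join",
        "outer": {"node": "SeqScan", "rows": table_rows},
        "inner": node.plan,  # graph plan attached to the join node
    }


fragment = analyze(InRegion("cypher_graph", "MATCH (n) RETURN n"))
join_plan = plan_join(fragment, table_rows=1000)
print(join_plan["inner"]["node"], join_plan["inner"]["region"])
```

The region label carried on each tree and plan node is what lets the
relational side dispatch to the right region's routines.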

It would be interesting to see how much of what you want you can get with
what we currently have and what pieces are really missing.

Am I right that if you wrote a function in C to take a Cypher query, plan
and analyse it, and execute it, the only thing really missing would be
feedback to the PostgreSQL planner regarding the number of rows expected?

>
> Regards
> MauMau
>
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin
