Re: dblink: add polymorphic functions.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: dblink: add polymorphic functions.
Date: 2015-07-06 20:42:23
Message-ID: CADkLM=c6Wip4-g5jnRxhhXtq0XCC4tNwO0oRJBMoYKzbzKMOWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 6, 2015 at 3:52 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Mon, Jul 6, 2015 at 11:13 AM, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
> wrote:
> > On Mon, Jul 6, 2015 at 11:33 AM, Merlin Moncure <mmoncure(at)gmail(dot)com>
> wrote:
> >>
> >> On Mon, Jul 6, 2015 at 9:52 AM, Joe Conway <mail(at)joeconway(dot)com> wrote:
> >> > -----BEGIN PGP SIGNED MESSAGE-----
> >> > Hash: SHA1
> >> >
> >> > On 07/06/2015 07:37 AM, Merlin Moncure wrote:
> >> >> yup, and at least one case now fails where previously it ran
> >> >> through: postgres=# select * from dblink('a', 'b', 'c'); ERROR:
> >> >> function dblink(unknown, unknown, unknown) is not unique
> >> >
> >> > Hmm, that is an issue, possibly a fatal one.
> >> >
> >> > When Cory first mentioned this to me over a year ago we discussed some
> >> > other, arguably better and more generic solutions. One was to build
> >> > support for:
> >> >
> >> > SELECT * FROM srf() AS TYPE OF(foo);
> >> >
> >> > The second idea I think is actually SQL standard if I recall
> correctly:
> >> >
> >> > SELECT * FROM CAST(srf() AS foo) x;
> >> >
> >> > Currently this works:
> >> >
> >> > 8<--------------------
> >> > select *
> >> > from cast (row(11,'l','{a11,b11,c11}') as foo);
> >> > f1 | f2 | f3
> >> > - ----+----+---------------
> >> > 11 | l | {a11,b11,c11}
> >> > (1 row)
> >> > 8<--------------------
> >> >
> >> > But this fails:
> >> >
> >> > 8<--------------------
> >> > select *
> >> > from cast
> >> > (dblink('dbname=contrib_regression','select * from foo') as foo);
> >> > ERROR: cannot cast type record to foo
> >> > 8<--------------------
> >> >
> >> > Comments in the source have this to say:
> >> >
> >> > 8<--------------------
> >> > /*
> >> > * coerce_record_to_complex
> >> > * Coerce a RECORD to a specific composite type.
> >> > *
> >> > * Currently we only support this for inputs that are RowExprs or
> >> > * whole-row Vars.
> >> > */
> >> > 8<--------------------
> >> >
> >> > That explains why the first example works while the second does not.
> >> > I'm not sure how hard it would be to fix that, but it appears that
> >> > that is where we should focus.
> >>
> >> Yeah. FWIW, here's my 0.02$: I use dblink all the time, for all
> >> kinds of reasons, vastly preferring to have control over the query
> >> string (vs. FDW type solutions). I have two basic gripes with it. #1
> >> is that remote queries are not cancelled over all call sites when
> >> cancelled locally (off-topic for this thread) and #2 is that the SRF
> >> record describing mechanics are not abstractable because of using
> >> syntax to describe the record. Corey's proposal, overloading issues
> >> aside, appears to neatly deal with this problem because anyelement can
> >> be passed down through a wrapping API.
> >>
> >> IOW, I'd like to do:
> >> CREATE FUNCTION remote_call(...) RETURNS ... AS
> >> $$
> >> SELECT dblink(...) AS r(...)
> >> $$ language sql;
> >>
> >> ...which can't be done (discounting dynamic sql acrobatics) because of
> >> the syntax based expression of the 'r' record. So I like Corey's
> >> idea...I just think the functions need to be named differently (maybe
> >> to 'dblink_any', and dblink_get_result_any'?). TBH, to do better
> >> than that you'd need SQL level support for handling the return type in
> >> the vein of NEW/OLD. For fun, let's call it 'OUT'...then you could:
> >>
> >> SELECT * FROM remote_call(...) RETURNS SETOF foo;
> >>
> >> Inside remote_call, you'd see something like:
> >>
> >> SELECT dblink(...) AS OUT;
> >>
> >> As to the proposed syntax, I would vote to support the SQL standard
> >> variant if it could be handled during parse. I don't see what AS TYPE
> >> OF really buys you because FWICT it does not support wrapping.
> >>
> >> merlin
> >
> >
> > Your experiences with dblink are very similar to mine.
> >
> > The whole issue arose for me as an outcropping of my Poor Man's Parallel
> > Processing extension (still not released but currently working for us in
> > production internally).
> >
> > At some point I had to do dblink_get_result(...) as t(...) and not only
> did
> > I have to render the structure as a string, I was going to have to
> execute
> > that SQL dynamically (because plpgsql lacks a PREPARE statement) or I was
> > going to have to re-code in C or plv8. Overall those calls aren't
> terribly
> > expensive (it's working in production - for us - without this dblink
> > modification), but a cleaner solution would be better.
>
> Another option is to handle the intermediate result in json if you're
> willing to sacrifice a little bit of performance. For example,
> suppose you wanted to log every dblink call through an wrapper without
> giving up the ability to handle arbitrary result sets:
>
> CREATE OR REPLACE FUNCTION dblink_log(_query TEXT) RETURNS SETOF JSON AS
> $$
> BEGIN
> RAISE WARNING 'got dblink %', _query;
>
> RETURN QUERY SELECT * FROM dblink(
> 'host=localhost',
> format('SELECT row_to_json(q) from (%s) q', _query))
> AS R(j json);
> END;
> $$ LANGUAGE PLPGSQL;
>
> postgres=# select * from dblink_log('select 0 as value');
>
> WARNING: got dblink select 0 as value
> dblink_log
> ─────────────
> {"value":0}
> (1 row)
>
> With json, you have a number of good options to convert back to a
> record. For example,
>
> postgres=# CREATE TYPE foo AS (value int);
> CREATE TYPE
>
> SELECT p.*
> FROM dblink_log('SELECT s AS value FROM generate_series(1,3) s') j
> CROSS JOIN LATERAL json_populate_record(null::foo, j) p;
>
> WARNING: got dblink select s as value from generate_series(1,3) s
> value
> ───────
> 1
> 2
> 3
> (3 rows)
>
> Note, I still support the case behind your patch, but, if it uh,
> doesn't make it through, it's good to have options :-).
>
> merlin
>

(again, more backstory to enhance other's understanding of the issue)

In the earlier iterations of PMPP, I had it simply wrap all queries sent to
the remote server inside a 'SELECT * from row_to_json(...)'.

The serialization/deserialization was a performance hit, offset slightly by
having the RETURN QUERY SELECT json_field from dblink_get_result('c1') as
t(json_field json) be a static (reusable) query.

The ugliness of decomposing the fields from json was no fun, and Merlin's
suggestion of json_populate_record ( I don't remember if that function
existed at the time...) would get around that, albeit with another
performance hit.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniele Varrazzo 2015-07-06 21:20:26 Redundant error messages in policy.c
Previous Message Merlin Moncure 2015-07-06 19:52:54 Re: dblink: add polymorphic functions.