Re: WIP: URI connection string support for libpq

From: Alexander Shulgin <ash(at)commandprompt(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: URI connection string support for libpq
Date: 2011-12-14 14:52:29
Message-ID: 1323852045-sup-7596@moon
Lists: pgsql-hackers


Excerpts from Greg Smith's message of Wed Dec 14 02:54:14 +0200 2011:
>
> Initial quick review of your patch: you suggested this as the general form:
>
> psql -d postgresql://user(at)pw:host:port/dbname?param1=value1&param2=value2...
>
> That's presumably supposed to be:
>
> psql -d postgresql://user:pw(at)host:port/dbname?param1=value1&param2=value2...

Yes, that was clearly a typo, so "user:pw(at)host:port".

> If we had to pick one URI prefix, it should be "postgres". But given
> the general name dysfunction around this project, I can't see how anyone
> would complain if we squat on "postgresql" too.

That would be true if we were starting afresh, in the absence of any existing URI implementations.

IMO, what makes a connection URI useful is:
a) it keeps all the connection parameters in a single string, so you can easily send it to other people to use, and
b) it works everywhere, so the people who've got the URI can use it and expect to get the same results as you do.

(Well, not without some quirks, like effects of locally-set environment variables or presence of .pgpass or service files, or different nameserver opinions about which hostname resolves to which IP address, but that is pretty much the case with any sort of URIs.)

This is not an objection to what you say, but rather an important thing to keep in mind for the purpose of this discussion.

Whatever decision we make here, the libpq-based connectors are going to be compatible with each other automatically if they just pass the URI through to libpq. However, should we stick to using the "postgresql://" URI prefix exclusively, these might need to massage the URI a bit before passing it on (like replacing "postgres://" with "postgresql://"; accepting the latter as well seems reasonable.) With proper recommendations from our side, new client code will use the longer prefix, thus achieving compatibility in the long run with the only(?) driver not based on libpq (that is, JDBC).

> Attached patch modifies
> yours to prove we can trivially support both, in hopes of detonating
> this argument before it rages on further. Tested like this:
>
> $ psql -d postgres://gsmith(at)localhost:5432/gsmith
>
> And that works too now. I doubt either of us like what I did to the
> handoff between conninfo_uri_parse and conninfo_uri_parse_options to
> achieve that, but this feature is still young.

Yes, the caller could just do the pointer arithmetic itself, since the exact URI prefix is known at that point, and then pass the result to conninfo_uri_parse.
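A minimal sketch of that caller-side arithmetic, where uri_remainder is a made-up name standing in for the handoff into conninfo_uri_parse:

```c
#include <stddef.h>
#include <string.h>

/* Test each recognized prefix; on a match, return a pointer just past it
 * (the pointer arithmetic in question), so the parsing routine never has
 * to know which spelling was used.  Returns NULL for an unrecognized
 * string.  Illustrative only; the real patch's signatures differ. */
static const char *
uri_remainder(const char *uri)
{
    static const char *const prefixes[] = { "postgresql://", "postgres://" };
    size_t      i;

    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
    {
        size_t  len = strlen(prefixes[i]);

        if (strncmp(uri, prefixes[i], len) == 0)
            return uri + len;   /* caller skips the prefix itself */
    }
    return NULL;                /* no recognized URI prefix */
}
```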

> After this bit of tinkering with the code, it feels to me like this
> really wants a split() function to break out the two sides of a string
> across a delimiter, eating it in the process. Adding the level of
> paranoia I'd like around every bit of code I see that does that type of
> operation right now would take a while. Refactoring in terms of split
> and perhaps a few similarly higher-level string parsing operations,
> targeted for this job, might make it easier to focus on fortifying those
> library routines instead. For example, instead of the gunk I just added
> that moves past either type of protocol prefix, I'd like to just say
> "split(buf,"://",&left,&right) and then move on with processing the
> right side.

A search with cscope over my repo clone doesn't give any results for "split", so I assume you're talking about a new function with a signature similar to the following:

split(char *buf, const char *delim, char **left, char **right)

Note that there should be no need for the "left" parameter, since it would always point to the start of "buf". Also, we might just return "right" as the function's value instead of using an out-parameter, with NULL meaning the delimiter was not found in the buffer.
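A minimal sketch of split() along those lines, with "right" returned as the value (details assumed for the example):

```c
#include <string.h>

/* Find delim in buf, zero-terminate the left part in place (thus "eating"
 * the delimiter), and return a pointer to the text after the delimiter.
 * Returns NULL when the delimiter does not occur, leaving buf untouched. */
static char *
split(char *buf, const char *delim)
{
    char *p = strstr(buf, delim);

    if (p == NULL)
        return NULL;
    *p = '\0';                  /* terminate the "left" half */
    return p + strlen(delim);   /* "right" starts just past the delimiter */
}
```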

Now, if you look carefully at the patch's code, there are numerous places where it accepts either of two delimiting characters and needs to examine the one encountered before zeroing it out, so it will need something more like this:

char *need_a_good_name_for_this(char *buf, const char *possible_delims, char *actual_delim)

where it will store a copy of the encountered delimiting character in *actual_delim before modifying the buffer.
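A sketch of that variant, here built on strpbrk() and given a placeholder name:

```c
#include <string.h>

/* Scan buf for the first occurrence of any character in possible_delims,
 * record which one was hit in *actual_delim, zero it out in place, and
 * return a pointer to the remainder.  Returns NULL (and leaves
 * *actual_delim untouched) when none of the delimiters occurs. */
static char *
split_any(char *buf, const char *possible_delims, char *actual_delim)
{
    char *p = strpbrk(buf, possible_delims);

    if (p == NULL)
        return NULL;
    *actual_delim = *p;         /* remember which delimiter we saw */
    *p = '\0';                  /* terminate the "left" half */
    return p + 1;
}
```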

> I agree with your comment that we need to add some sort of regression
> tests for this. Given how the parsing is done right now, we'd want to
> come up with some interesting invalid strings too. Making sure this
> fails gracefully (and not in a buffer overflow way) might even use
> something like fuzz testing too. Around here we've just been building
> some Python scripting to do that sort of thing, tests that aren't
> practical to do with pg_regress.

I'd appreciate it if you could point me to a specific example of such existing tests to draw some inspiration from.

--
Regards,
Alex
