On 29 October 2010 09:59, Brendan Jurd <direvus(at)gmail(dot)com> wrote:
> On 18 October 2010 01:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Brendan Jurd <direvus(at)gmail(dot)com> writes:
>>> On 17 October 2010 09:59, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Good point. Maybe the correct fix is to remember whether each token was
>>>> quoted or not, so that keyword detection can be done safely after the
>>>> initial lexing. I still think that the current method is impossibly
>>>> ugly ...
>>> I'm happy to revise the patch on that basis. Any suggestions about
>>> how to communicate the 'quotedness' of each token? We could make each
>>> token a struct consisting of the token itself, plus a boolean flag to
>>> indicate whether it had been quoted. Does that work for you?
>> Seems reasonable. I had the idea of a parallel list of booleans in the
>> back of my mind, but a list of structs is probably easier to understand,
>> and to extend further if necessary.
> Okay, I've taken the red pill and I'm finding out how deep the rabbit
> hole goes ...
> The logical structure of pg_hba.conf is a set of lines, each line
> containing a set of fields, each field containing a set of tokens.
> The way the existing implementation handles this is to create a list
> of lines containing sublists of fields, containing comma-separated
> strings for the set of tokens, with newlines embedded next to tokens
> which might be keywords.
> The tokeniser breaks apart the comma-separated tokens ... and then
> reassembles them into a comma-separated string. Which the db/role
> matching functions then have to break apart *again*.
> In order to keep track of whether each individual token was quoted, I
> first need to impose some sanity here. Rather than using a magical
> string for each field, I intend to use a List of HbaToken structs
> which explicitly note whether quoting was used.
> Introducing an extra List level does mean a bit more work copying and
> freeing, and it makes the patch really quite intrusive. I have to
> touch a lot of lines in hba.c, but I think the additional clarity is
> worth it. If nobody dissuades me from this approach I hope to post a
> patch in a couple of days.
I've finally managed to get a patch together for the above. It ended
up being a much more difficult project than I anticipated.
Just to recap, the HEAD code in hba.c parses pg_hba.conf (and
ident.conf) into a List of lines, each line being a List of strings.
These strings have various magical properties, including commas to
separate individual tokens and newlines to mark up keywords.
I have replaced this behaviour with something which more closely
represents the true logical structure of the file, and is (hopefully)
much more intelligible. The patch introduces the HbaToken struct,
which consists of a string and a boolean to indicate whether the
string was parsed inside quotes. The parser now produces a
triple-nested list -- a List of lines, each line being a List of
fields, each field being a List of tokens.
It sounds straightforward enough, but to make this happen I had to
change a very substantial fraction of pg_hba.c.
I have tested a few basic pg_hba.conf setups and everything seems to
be working as intended. There is much more testing to be done, it
probably leaks memory and there is a lot more room for cleanup in
hba.c, but I would like to open the patch up for review and comment as
Posting it to the 2011-Next CF.
In response to
pgsql-hackers by date
|Next:||From: Adrian von Bidder||Date: 2011-04-01 07:12:54|
|Subject: Re: Should psql support URI syntax?|
|Previous:||From: Noah Misch||Date: 2011-04-01 04:56:29|
|Subject: Re: BUG #5856: pg_attribute.attinhcount is not correct.|