Re: Preliminary patch for tsearch example dictionaries/parsers in contrib

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: "Sergey V(dot) Karpov" <karpov(at)sao(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgreSQL(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Preliminary patch for tsearch example dictionaries/parsers in contrib
Date: 2007-10-09 11:18:41
Message-ID: 20071009111841.GG3093@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On Tue, Oct 09, 2007 at 02:02:03PM +0400, Sergey V. Karpov wrote:
>
> Hi Tom,
>
> Thank you for starting the discussion
>
> > Given all the flap about txid, this surely mustn't go in without public
> > review first ;-). So, here is a submission from Sergey Karpov to fill
> > in the lack of any working code examples for user-written tsearch
> > parsers and dictionaries.
> >
> > I will be mostly off-line for the next day or so and don't have time to
> > work on this more now, but here are a few comments:
> >
> > * It seems a bit odd to put multiple independent contrib modules under a
> > single subfolder. I'd be inclined to drop the ts_pack layer and just
> > make the dictionaries and parser be top-level contrib modules.
>
> Yes, I understand your position, as well as Magnus' complaints. However,
> putting all the code to its own contribs is not the best solution, as
> the majority of it is no more than examples. dict_regex, on the
> contrary, is an add-on very useful in some situations (and we actually
> use in in our projects). Also, its requirements differ from the rest of
> the dictionaries, see below.
>
> So, what about the following layout:
>
> - contrib/ts_examples - single module which contains all the example
> stuff in a single folder, to be built together
>
> - contrib/dict_regex - separate contrib

I haven't actually looked at the code. The problem from the MSVC
perspective is that it generates one output file per directory. So we'll
need to special-case it somehow. But that can certainly be done.

But since we're on the topic (not here, but in the other thread really) of
what should be in /contrib and what shouldn't - shuld contrib/ts_examples
realy be in contrib or could they live on pgfoundry? With just dict_regex
in contrib?

> > * Depending on PCRE, when we have an at-least-equally-good regex engine
> > built in, is silly. It's an unnecessary dependency and to the (minor)
> > extent that the regex syntax is different, we'd have to document the
> > discrepancies.
>
> Built-in regex engine seems to not support the one feature critical to
> the dict_regex operation - it is not able to report the "partial match"
> in a case when the matching fails solely due to premature end of input
> string (i.e. when matching may possibly succeed after adding some data
> to the string).
>
> If it is possible to achieve this behaviour with built-in engine, please
> point me to the right direction.

Can't comment on that since I don't know it, but perhaps it's something we
can pull in from upstream and add to our regex engine? (But that would
probably make this 8.4 material for sure, no?)

> > * dict_regex is not nearly up to speed on encoding or locale issues.
> > I didn't look at the other ones too closely, they may or may not need
> > similar adjustments.
> >
> > * Allowing config files to be read from anywhere is not acceptable.
> > We have dealt with this in the core code and the contrib examples
> > *must* follow the same rules.
>
> Is it necessary to require this behaviour from each contrib module? They
> are not core code, and usually solve application-level tasks - is it
> optimal to store the application config files in postgres tree?

Yes. We've been thruogh that many times wrt adminpack, and we don't want to
do that again :-)
See convert_and_check_filename() in adminpack.c.

> Also, these dictionaries need some example config files at the
> regression test time, and these configs are of no sense for anyone - is
> it good to pullute the system tree with them?
>
> On the other hand, to prevent reading arbitrary files we may require the
> specific header line which identifies these dictionary configs.

Depending on how you deal with it, that will let the user know if a file
exists (you'd have to give the "file not found" error or similar even if
the file is there but doesn't have the header, which is likely to confuse
the user). It can be worked around, but isn't necessarily easy.

//Magnus

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Martin Pitt 2007-10-09 13:56:33 Provide a way to not ask for a password in psql
Previous Message Csaba Nagy 2007-10-09 11:04:52 Re: Including Snapshot Info with Indexes