Quick Links

Re: tsearch patch and namespace pollution

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject:	Re: tsearch patch and namespace pollution
Date:	2007-08-17 02:18:50
Message-ID:	200708170218.l7H2Io715461@momjian.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tom Lane wrote:
> I find the following additions to pg_proc in the current tsearch2 patch:

It seems a lot of these are useless and just bloat. I will mark a few:

Why would anyone look these up via OID rather than name?

Do we realy need this many ranking functions?

> token_type(oid) | record

Again, why by OID?

Why OID again for the configuration? I just don't see the use case and
it is bloat and causes confusion.

> to_tsquery(text,text) | tsquery
> plainto_tsquery(oid,text) | tsquery
> plainto_tsquery(text,text) | tsquery

Again, OID. I asked Oleg about this and he said:

> Bruce, just remove oid argument specification from documentation.

so I think we can go ahead and remove cases where the configuration name
or object is specified by oid. I have already removed them from the
documentation and I though the patch had them removed too, but I guess
not. Admittedly this API has been in flux.

> to_tsvector(text) | tsvector
> to_tsquery(text) | tsquery
> plainto_tsquery(text) | tsquery
> tsvector_update_trigger() | trigger
> get_ts_config_oid(text) | oid
> get_current_ts_config() | oid
> (82 rows)
>
> (This list omits functions with INTERNAL arguments, as those are of
> no particular concern to users.)

Also @@ accepts TEXT @@ TEXT, at least according to the documentation.
Is it clear to anyone which is tsquery and which tsvector? Is this
something we want to support?

> While most of these are probably OK, I'm disturbed by the prospect
> that we are commandeering names as generic as "parse" or "stat"
> with argument types as generic as "text". I think we need to put
> a "ts_" prefix on some of these. Specifically, I find these names
> totally unacceptable without a ts_ prefix:
>
> stat(text) | record
> stat(text,text) | record
>
> token_type(oid) | record
> token_type(text) | record
>
> parse(oid,text) | record
> parse(text,text) | record
>
> lexize(oid,text) | text[]
> lexize(text,text) | text[]
>
> These guys might be all right given that some of their arguments are
> tsvector or tsquery, but it's not completely convincing --- think about
> the case where an argument is given as an undecorated literal string.
> It's also not all that clear that they are related to text searching.
> I'm for putting a ts_ prefix on them too:
>
> rank(real[],tsvector,tsquery,integer) | real
> rank(real[],tsvector,tsquery) | real
> rank(tsvector,tsquery,integer) | real
> rank(tsvector,tsquery) | real
> rank_cd(real[],tsvector,tsquery,integer) | real
> rank_cd(real[],tsvector,tsquery) | real
> rank_cd(tsvector,tsquery,integer) | real
> rank_cd(tsvector,tsquery) | real
>
> rewrite(tsquery,tsquery,tsquery) | tsquery
> rewrite(tsquery,text) | tsquery
> rewrite_accum(tsquery,tsquery[]) | tsquery
> rewrite_finish(tsquery) | tsquery
> rewrite(tsquery[]) | tsquery
>
> headline(oid,text,tsquery,text) | text
> headline(oid,text,tsquery) | text
> headline(text,text,tsquery,text) | text
> headline(text,text,tsquery) | text
> headline(text,tsquery,text) | text
> headline(text,tsquery) | text
>
> These guys are just plain badly named, as it's completely unobvious that
> they have anything to do with tsearch (or what they do at all, actually).
> Furthermore the "varchar" variants seem entirely redundant with the
> "text" ones:
>
> vq_exec(tsvector,tsquery) | boolean
> qv_exec(tsquery,tsvector) | boolean
> tt_exec(text,text) | boolean
> ct_exec(character varying,text) | boolean
> tq_exec(text,tsquery) | boolean
> cq_exec(character varying,tsquery) | boolean
>
> Comments, suggestions?

I would be happy if all text search functions began with 'ts', 'ts_', or
'to_ts', and if we don't clean this up now, it is going to be harder in
the future. I think users can expect some migration for text search in
8.3 as a benefit of getting into core and be dump-able.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

tsearch patch and namespace pollution at 2007-08-17 01:09:26 from Tom Lane

Responses

Re: tsearch patch and namespace pollution at 2007-08-17 11:59:36 from Michael Paesold

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Bruce Momjian	2007-08-17 02:21:44	Re: GIT patch
Previous Message	Joshua D. Drake	2007-08-17 02:17:24	Re: GIT patch