Re: tsearch_core patch: permissions and security issues

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: tsearch_core patch: permissions and security issues
Date: 2007-06-15 04:46:35
Message-ID: Pine.LNX.4.64.0706150823290.1881@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-advocacy pgsql-hackers

On Thu, 14 Jun 2007, Tom Lane wrote:

> Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
>> You're correct. But we can't defend users from all possible errors.
>> Other side, that we need somehow to help user to identify what fts
>> configuration was used to produce tsvector. For example, comment on
>> tsvector column would be useful, but we don't know how to do this
>> automatically.
>
> Yeah, I was wondering about that too. The only way we could relax the
> superuser, you-better-know-what-you're-doing restriction on changing
> configurations would be if we had a way to identify which tsvector
> columns needed to be updated. Right now that's pretty hard to find out
> because the references to configurations are buried in the bodies of
> trigger functions. That whole trigger-function business is not the
> nicest part of tsearch2, either ... it'd be better if we could automate
> tsvector maintenance more.

yes, trigger function is a complex stuff, our tsearch() trigger is an
example of automated stuff. It could be written very easy on plpgsql,
for example.

=# create function my_update() returns trigger as
$$
BEGIN
NEW.fts=
setweight( to_tsvector('english',NEW.t1),'A') || ' ' ||
setweight( to_tsvector('english',NEW.t2),'B');
RETURN NEW;
END;
$$
language plpgsql;

>
> One thing I was thinking about is that rather than storing a physical
> tsvector column, people might index a "virtual" column using functional
> indexes:
>
> create index ... (to_tsvector('english', big_text_col))
>
> which could be queried
>
> select ... where to_tsvector('english', big_text_col) @@ tsquery

this is already possible for gin index

create index gin_text_idx on test using gin (
( coalesce(to_tsvector(title),'') || coalesce(to_tsvector(body),'') )
);

apod=# select title from test where
(coalesce(to_tsvector(title),'') || coalesce(to_tsvector(body),'') ) @@
to_tsquery('supernovae') order by sdate desc limit 10;

>
> Assuming that the index is lossy, the index condition would have to be
> rechecked, so to_tsvector() would have to be recomputed, but only at the
> rows identified as candidate matches by the index. The I/O savings from
> eliminating the heap's tsvector column might counterbalance the extra
> CPU for recomputing tsvectors. Or not, but in any case this is
> attractive because it doesn't need any handmade maintenance support like
> a trigger --- the regular index maintenance code does it all.

I'm afraid it wouldn't work for all cases. We already have headline() function
which had to reparse document to produce text snippet and it's very slow
and eats most select time.
ALso, trigger stuff is a normal machinery for databases.

>
> It strikes me that we could play the same kind of game we played to make
> nextval() references to sequences be recognized as dependencies on
> sequences. Invent a "regconfig" OID type that's just like regclass
> except it handles OIDs of ts_config entries instead of pg_class entries,
> and make the first argument of to_tsvector be one of those:
>
> create index ... (to_tsvector('english'::regconfig, big_text_col))
>
> Now dependency.c can be taught to recognize the regconfig Const as
> depending on the referenced ts_config entry, and voila we have a
> pg_depend entry showing that the index depends on the configuration.
> What we actually do about it is another question, but this at least
> gets the knowledge into the system.
>

interesting. And \di could display all configuration stuff for text search
indexes ?

> [ thinks some more... ] If we revived the GENERATED AS patch,
> you could imagine computing tsvector columns via "GENERATED AS
> to_tsvector('english'::regconfig, big_text_col)" instead of a
> trigger, and then again you've got the dependency exposed where
> the system can see it. I don't wanna try to do that for 8.3,
> but it might be a good path to pursue in future, instead of assuming
> that triggers will be the way forevermore.
>
> Thoughts?

No way with standard. GENERATED AS says that "all columns references in an
expression associated with a generated column must be to columns of the base
table containing that generated column."

tsvector could be result of rather complex select involving several tables.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-advocacy by date

  From Date Subject
Next Message Oleg Bartunov 2007-06-15 05:03:33 Re: The Business Case for PostgreSQL
Previous Message Liam O'Duibhir 2007-06-15 04:23:08 The Business Case for PostgreSQL

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2007-06-15 04:53:41 Re: Sorted writes in checkpoint
Previous Message Oleg Bartunov 2007-06-15 04:00:10 Re: How does the tsearch configuration get selected?