Re: syntax for reaching into records, specifically ts_stat results

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Dan Chak <chak(at)MIT(dot)EDU>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: syntax for reaching into records, specifically ts_stat results
Date: 2008-12-09 21:25:42
Message-ID: Pine.LNX.4.64.0812100024260.28443@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

ok, here is a function ( credits to Teodor )

CREATE OR REPLACE FUNCTION ts_stat(tsvector, OUT word text, OUT ndoc
integer, OUT nentry integer)
RETURNS SETOF record AS
$$
SELECT ts_stat('SELECT ' || quote_literal( $1::text ) || '::tsvector');
$$ LANGUAGE SQL RETURNS NULL ON NULL INPUT IMMUTABLE;

use it like

select id, (ts_stat(fts)).* from apod where id=1;

Oleg

On Tue, 9 Dec 2008, Oleg Bartunov wrote:

> On Tue, 9 Dec 2008, Dan Chak wrote:
>
>> Oleg,
>>
>> This syntax works fine until I also want to get the "sentence_id" column in
>> there as well, so that I can differentiate one set of ts_stat results from
>> another. With the syntax where ts_stat is treated like a table, it isn't
>> possible to run ts_stat separately on multiple tsvectors as I'm doing
>> below.
>>
>> Is there some generic record access syntax that I can use?
>
> write function
>
>>
>> Thanks,
>> Dan
>>
>> On Dec 9, 2008, at 3:04 PM, Oleg Bartunov wrote:
>>
>>> try select * from ts_stat(....)
>>> btw, performance of ts_stat() was greatly improved in 8.4.
>>>
>>> Oleg
>>> On Tue, 9 Dec 2008, Dan Chak wrote:
>>>
>>>> Dear Postgres Folk,
>>>>
>>>> In working with tsvectors (added in 8.3), I've come to a place where my
>>>> syntax-fu has failed me. I've resorted to turning a result set of
>>>> records into strings so that I can regexp out the record fields I need,
>>>> rather than access them directly, as I'm sure it's possible to do with
>>>> the right syntactic formulation. Although my solution works, I'm sure
>>>> it's much less efficient than it could be, and hope someone on the list
>>>> can help do this the right way.
>>>>
>>>> Basically, I would like to transpose a series of tsvectors (one per row)
>>>> into columns. E.g., as tsvects, I have this:
>>>>
>>>> test=# select * from tsvects;
>>>> sentence_id | tsvect
>>>> -------------+------------------------------
>>>> 1 | 'fox':3 'brown':2 'quick':1
>>>> 2 | 'lazi':1 'eleph':3 'green':2
>>>>
>>>> Instead I want this:
>>>>
>>>> sentence_id | word | freq
>>>> -------------+-------+------
>>>> 1 | fox | 1
>>>> 1 | brown | 1
>>>> 1 | quick | 1
>>>> 2 | lazi | 1
>>>> 2 | eleph | 1
>>>> 2 | green | 1
>>>>
>>>> I am able to generate this with the following view, but the problem is
>>>> that to create it, I must first cast the ts_stat results to a string, and
>>>> then regexp out the pertinent pieces:
>>>>
>>>> create or replace view words as
>>>> select sentence_id,
>>>> substring(stat from '^\\(([^,]+),') as word,
>>>> substring(stat from ',([^,]+)\\)$') as freq
>>>> from (select sentence_id,
>>>> ts_stat('select tsvect from tsvects where sentence_id = ' ||
>>>> tsvects.sentence_id)::text as stat
>>>> from tsvects
>>>> ) as foo;
>>>>
>>>> It seems like there should be a way to access fields in the records
>>>> returned from ts_stat directly, but I can't figure out how. Here's the
>>>> result of the subquery:
>>>>
>>>> test=# select sentence_id,
>>>> ts_stat('select tsvect from tsvects where sentence_id = ' ||
>>>> tsvects.sentence_id)::text as stat
>>>> from tsvects;
>>>> sentence_id | stat
>>>> -------------+-------------
>>>> 1 | (fox,1,1)
>>>> 1 | (brown,1,1)
>>>> 1 | (quick,1,1)
>>>> 2 | (lazi,1,1)
>>>> 2 | (eleph,1,1)
>>>> 2 | (green,1,1)
>>>> (6 rows)
>>>>
>>>> If I try to get at the elements (which I believe are named 'word',
>>>> 'ndoc', 'nentry'), I get a variety of syntax errors:
>>>>
>>>> test=# select sentence_id,
>>>> test-# stat['word'],
>>>> test-# stat['nentry']
>>>> test-# from (select sentence_id,
>>>> test(# ts_stat('select tsvect from tsvects where
>>>> sentence_id = ' ||
>>>> test(# tsvects.sentence_id) as stat
>>>> test(# from tsvects
>>>> test(# ) as foo;
>>>> ERROR: cannot subscript type record because it is not an array
>>>>
>>>> If I say stat.word (instead of subscripting), I get 'missing FROM-clause
>>>> entry for table "stat"'. If I say foo.stat.word, I get 'ERROR: schema
>>>> "foo" does not exist'.
>>>>
>>>> Any ideas on how to get into these records with resorting to text
>>>> parsing?
>>>>
>>>> Thanks,
>>>> Dan
>>>>
>>>>
>>>
>>> Regards,
>>> Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2008-12-09 22:27:47 Re: parallel restore vs. windows
Previous Message Martin Pihlak 2008-12-09 20:58:35 Re: SQL/MED compatible connection manager