Finding intercept of two documents (two tsvector fields)

From: Allan Kamau <kamauallan(at)gmail(dot)com>
To: Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Finding intercept of two documents (two tsvector fields)
Date: 2010-08-31 14:58:47
Message-ID: AANLkTinNgadhJByS6ivOB-xmSaqWL=Mc846MRpMd9xMt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I am looking for a way to obtain the words that are common amongst two
tsvector records.

The long workaround I know is to:
1)convert the contents of the tsvector fields to text then find and
replace single quote followed by space then single quote with a comma
character then stripping away the first and last single quote
characters.
2)convert this CSV data into array using string to array.
3)Perform a join on these two arrays and unnest them and join on the
unnested values.
4)Perform array_agg on any one field (of unnested array values) of
these two tables with aid of a GROUP BY.
5)cast the array to string using a white space as the delimiter.
6)cast the string to tsvector.

Allan.

Browse pgsql-general by date

  From Date Subject
Next Message Joshua D. Drake 2010-08-31 15:43:09 Re: pg_dump --compress error
Previous Message Henk van Lingen 2010-08-31 14:50:09 Forcing the right queryplan