Incorrect FTS result with GIN index

From: Artur Dabrowski <ad(at)astec(dot)com(dot)pl>
To: pgsql-general(at)postgresql(dot)org
Subject: Incorrect FTS result with GIN index
Date: 2010-07-15 13:09:30
Message-ID: 29172750.post@talk.nabble.com
Lists: pgsql-general pgsql-hackers


Hello,

I was trying to use a GIN index, but the results seem to be incorrect.

1. QUERY WITHOUT INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

count
-------
123
(1 row)

2. CREATING INDEX
create index idx_keywords_ger on search_tab
using gin(to_tsvector('german', keywords));

3. QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'dd:*'));

count
-------
116
(1 row)
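One way to confirm that query 3 is really answered through idx_keywords_ger, and to get the non-index count in the same session for comparison, is sketched below. It assumes default planner settings otherwise. Because GIN indexes are only used through bitmap scans, switching off enable_bitmapscan for the session should push the planner back to a sequential scan.

```sql
-- Show the plan for query 3; if the index is used, a bitmap scan
-- on idx_keywords_ger should appear in the output.
EXPLAIN
select count(*) from search_tab where
(to_tsvector('german', keywords) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords) @@ to_tsquery('german', 'dd:*'));

-- GIN supports only bitmap scans, so disabling them for this session
-- makes the planner fall back to a sequential scan.
SET enable_bitmapscan = off;

select count(*) from search_tab where
(to_tsvector('german', keywords) @@ to_tsquery('german', 'ee:*')) and
(to_tsvector('german', keywords) @@ to_tsquery('german', 'dd:*'));

-- Restore the default planner behaviour.
RESET enable_bitmapscan;
```

If the two counts differ only when the bitmap scan is used, the index scan itself is the suspect rather than the query.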

The number of rows is different. To make things even stranger, and to make
sure the problem is not caused by dictionary normalisation, I also ran an
equivalent query:

4. EQUIVALENT QUERY WITH INDEX
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* & dd:*'));

count
-------
123
(1 row)
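The normalised form of each query term can be inspected directly, which helps check that queries 3 and 4 search for exactly the same prefix lexemes. A minimal sketch using to_tsquery with the same 'german' configuration:

```sql
-- Each column shows the tsquery that the 'german' configuration
-- produces for the given input, including the :* prefix marker.
select to_tsquery('german', 'ee:*')        as q_ee,
       to_tsquery('german', 'dd:*')        as q_dd,
       to_tsquery('german', 'ee:* & dd:*') as q_both;
```

If q_both is simply q_ee and q_dd joined with &, the two forms of the query are equivalent and should return the same count.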

I tried the same with a simple-based dictionary; the problem is always
reproducible.

The total number of records in my database is 1,006,300, in case it matters.

One of the missing rows is the following: "lSWN eeIf hInEI IN
SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce". If the query targets
this row more specifically, the row is found:

5. MORE DETAILED QUERY WITH INDEX
select keywords from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'eeI:* & dd:*'));

keywords
--------------------------------------------------------------------------------
lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce
tSALWIEEIn-3WNecGAINfLuLAV DDLIWNG E Lt h c8 BiIfgGl1 EeIhulSLenS6LDe5O
hGn DDlhIgGEAcS1O eeiEEI WnILWELS68VBLL AGNIAfINt6 lLuWuNeDc ItLfe SL
hGe WIiI EeItnLLuA1efOh3ALWc uGINEltcIBE LnegLDNA3 DD SVNG LSSIlWfE
eeIW ItueS W39LnELg-GuDLEhAn8BeFG IVi DDNEfLG1SI 1tNIOA lAhNLLccfWISE l
6em on.0nsRH nehSA2l1HAsauncu0I65l7 ddnsn1SAS i u0eLAnlr t70gaains w gzsH eeiog
rfiwgso0g364l1 1wU eei1n 5lL dDA 0
DDInNcEfSWAEAtcL1IeSuAG5LE Lilh8tEGeDg f3B eEIOL7h uWV-L1IGN LINWeIn l S
ils eeiru00ewH.6sgAeHoSlLhglso0 asn0u2a atisA0 ddcngAnzRA Se Au2 nm8ns0 uS8snH
DDD EWlE1GShhLe8L NENI tuL cgGGInfcBAlLfIO L1S eeIWeAEnILStu AViWNI
n IOLLt 0Alih tuWNE L nAGlVSNSDI DDeW BIegfG EeIhL9ELeScELWGAIfN1uIc
DnSE eeIWLu9tLNhNEuAt I1BelhGGfLWLS nSWINI eiELgAIG DDLEclV7 IO c Af
EeIElfN L4I lE2G cSOLniAWgSVItc ILDN L57BuDfALtSIe-WnGhGIW DDA NE1Lhuee
hNILN DD L6flSEeW1gthfI L1WAlENE eEIGIAt VGBDO uGLeLccAeSuLWIn Ii nS
(14 rows)
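The missing row can also be tested without touching the table or the index at all, by applying each condition of query 3 to the literal text quoted above. If both match columns come back true, the row should have been counted by query 3. A sketch:

```sql
-- Evaluate the two conditions of query 3 against the literal row text,
-- independently of search_tab and idx_keywords_ger.
select to_tsvector('german', t) as vector,
       to_tsvector('german', t) @@ to_tsquery('german', 'ee:*') as matches_ee,
       to_tsvector('german', t) @@ to_tsquery('german', 'dd:*') as matches_dd
from (select 'lSWN eeIf hInEI IN SIL3WugEOANcEGVWL1L LBAGAeLlGS ttfL DDhuDEIni9 ce'::text as t) as s;
```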

Did I misunderstand something, or is this a bug?

Best regards
Artur
--
View this message in context: http://old.nabble.com/Incorrect-FTS-result-with-GIN-index-tp29172750p29172750.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
