BUG #6455: Wrong match of ipsell dict.

From: vincent(dot)desmares(at)inovia-team(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6455: Wrong match of ipsell dict.
Date: 2012-02-13 13:21:16
Message-ID: E1RwvqG-0000fE-Oj@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 6455
Logged by: Desmares Vincent
Email address: vincent(dot)desmares(at)inovia-team(dot)com
PostgreSQL version: 9.1.0
Operating system: Ubuntu
Description:

Hello everyone,

We recently discovered something that could be a bug when using the full
text search of PostgreSQL, more precisely the ispell dictionary.

It appears that words composed of a single repeated character (like “a”, “aa”,
“aaa”, ...) trigger all the prefix and suffix rules even if no affix flags
have been specified for them in the dictionary.

We hit the bug with the dictionary word “e”, which was returned as a match for the word “deer”.

Here is a short way to reproduce the bug from scratch:

# 1) Create a test.dict containing only "e"
echo "e" > test.dict
# 2) Create an empty test.stop file
touch test.stop
# 3) Create a test.affix file with these rules:
echo -e 'PFX C Y 1\nPFX C 0 de .\n\nSFX R Y 1\nSFX R 0 r e\n' > test.affix
# 4) Execute these statements:

DROP TEXT SEARCH DICTIONARY IF EXISTS testispell CASCADE;

CREATE TEXT SEARCH DICTIONARY testispell (
    TEMPLATE = ispell,
    DictFile = test,
    AffFile = test,
    StopWords = test
);

CREATE TEXT SEARCH CONFIGURATION test_ispell (
    PARSER = "default"
);
ALTER TEXT SEARCH CONFIGURATION test_ispell
    ADD MAPPING FOR asciihword WITH testispell;
ALTER TEXT SEARCH CONFIGURATION test_ispell
    ADD MAPPING FOR asciiword WITH testispell;
ALTER TEXT SEARCH CONFIGURATION test_ispell
    ADD MAPPING FOR uint WITH testispell;
ALTER TEXT SEARCH CONFIGURATION test_ispell
    ADD MAPPING FOR word WITH testispell;

SELECT * from ts_debug('test_ispell', 'deer');

# 5) You should get a table with this result:

alias : "asciiword"
description : "Word, all ASCII"
token : "deer"
dictionaries : "{testispell}"
dictionary : "testispell"
lexemes : "{e}"

It appears to be reproducible with more repetitions of the same letter:
- a .dict containing [ee], searching for [deeer], gives [ee]
but
- a .dict containing [ee], searching for [eer] or [deee], gives nothing
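
For anyone who wants to retry these variants, here is a minimal shell sketch that regenerates the three files for a chosen dictionary word. The names `word` and `workdir` are only illustrative and were not part of our original test. It uses printf instead of `echo -e` because some shells' built-in echo does not interpret `-e`. It only writes the files to a scratch directory. They still have to be copied into the tsearch_data directory under the path printed by `pg_config --sharedir` before rerunning the SQL from step 4.

```shell
#!/bin/sh
# Sketch: regenerate the reproduction files for one dictionary word.
# "word" and "workdir" are illustrative names, not part of the original report.
set -eu

word="ee"                 # the only dictionary entry; change to "e", "aaa", ...
workdir="$(mktemp -d)"    # scratch directory that receives the generated files

# Dictionary containing only the chosen word, with no affix flags attached.
printf '%s\n' "$word" > "$workdir/test.dict"

# Empty stop-word file.
: > "$workdir/test.stop"

# Same affix rules as in step 3:
# flag C adds the prefix "de", flag R adds the suffix "r" after a final "e".
printf 'PFX C Y 1\nPFX C 0 de .\n\nSFX R Y 1\nSFX R 0 r e\n\n' > "$workdir/test.affix"

echo "Files written to $workdir"
```

After copying the files and recreating the dictionary, the same ts_debug query can be run against 'deeer', 'eer' and 'deee' to compare the results.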

Did we miss a configuration setting or a default behavior, or is this really
a bug?

Regards,

Vincent Desmares
Developer @ Inovia-team
