Re: Limitation on number of positions (tsearch)

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Limitation on number of positions (tsearch)
Date: 2007-09-13 11:59:30
Message-ID: 46E92622.5030601@sigaev.ru
Lists: pgsql-hackers

> Why is there a limitation of 256 positions per lexeme in a tsvector?
> There doesn't seem to be a technical reason for that. WordEntryPosVector
> uses a uint16 to store the number of positions, so it could go up to 65535.

For two reasons:
- Ranking can become very slow if the number of positions is large.
- From practice: if a word is very frequent in a document, then with high probability
it is a stop word or (in the case of internet-wide search engines) the document is spam.

It is common practice for search engines to limit the number of positions per word,
because increasing it gives no advantage in terms of ranking
and causes trouble by increasing storage size.
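
To illustrate the effect of such a cap, here is a minimal Python sketch (not the tsearch C code). The names build_position_lists and MAX_POSITIONS_PER_LEXEME are hypothetical, and the 2 bytes per stored position is an assumption made only for the size estimate. It records word positions per lexeme and drops any occurrence past the limit:

```python
from collections import defaultdict

# Hypothetical constant mirroring the 256-position limit discussed above.
MAX_POSITIONS_PER_LEXEME = 256
# Assumption for the size estimate only: each stored position takes 2 bytes.
BYTES_PER_POSITION = 2


def build_position_lists(lexemes, max_positions=MAX_POSITIONS_PER_LEXEME):
    """Map each lexeme to its 1-based positions, keeping at most max_positions."""
    positions = defaultdict(list)
    for pos, lexeme in enumerate(lexemes, start=1):
        if len(positions[lexeme]) < max_positions:
            positions[lexeme].append(pos)
        # Occurrences beyond the cap are dropped: they add storage and
        # ranking work without telling us much more about the document.
    return dict(positions)


# A document where one word is repeated 1000 times (likely a stop word or spam).
doc = ["spam"] * 1000 + ["postgres"]
capped = build_position_lists(doc)
uncapped = build_position_lists(doc, max_positions=65535)

print(len(capped["spam"]), capped["postgres"])     # 256 [1001]
print(len(capped["spam"]) * BYTES_PER_POSITION)    # 512
print(len(uncapped["spam"]) * BYTES_PER_POSITION)  # 2000
```

The rare word keeps its single position, while the very frequent word is truncated, so both the stored size and the number of positions a ranking function has to visit stay bounded.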
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
