Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Edwin Groothuis <postgresql(at)mavetju(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit
Date: 2008-03-07 06:52:24
Message-ID: 7543.1204872744@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-patches

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> The problem with this approach is how to select the part of the document
> to index. How will you ensure you're not ignoring the more important
> words of the document?

That's *always* a risk, anytime you do any sort of processing or
normalization on the text. The question here is not whether or not
we will make tradeoffs, only which ones to make.

> IMHO Postgres shouldn't decide it; it would be good if a user could set
> it at runtime and/or in postgresql.conf.

Well, there is exactly zero chance of that happening in 8.3.x, because
the bit allocations for on-disk tsvector representation are already
determined. It's fairly hard to see a way of doing it in future
releases that would have acceptable costs, either.
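
To make the constraint concrete, here is a minimal sketch of the arithmetic. It assumes each lexeme entry is a 32-bit word split into 1 flag bit, 11 length bits, and 20 offset bits. That split is an assumption chosen to be consistent with the 1Mb figure in the subject line, and the names below are illustrative, not taken from the source tree.

```python
# Assumed layout of one 32-bit lexeme entry (illustrative, not actual source code).
FLAG_BITS = 1    # assumed: "has positions" flag
LEN_BITS = 11    # assumed: lexeme length in bytes
POS_BITS = 20    # assumed: byte offset of the lexeme within the tsvector

# The three fields must fill the 32-bit word exactly, so widening one
# field means shrinking another or changing the on-disk format.
assert FLAG_BITS + LEN_BITS + POS_BITS == 32

MAX_LEXEME_LEN = (1 << LEN_BITS) - 1   # largest length an 11-bit field can hold
MAX_OFFSET = (1 << POS_BITS) - 1       # largest offset a 20-bit field can hold


def offset_is_representable(offset: int) -> bool:
    """Return True if a byte offset fits in the assumed 20-bit field."""
    return 0 <= offset <= MAX_OFFSET


print(MAX_LEXEME_LEN, MAX_OFFSET)
```

Under these assumptions the offset field tops out just below 2**20 bytes, which is why the limit cannot become a runtime setting without changing the stored format.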

But more to the point: no matter what the document length limit is,
why should it be a hard error to exceed it? The downside of not
indexing words beyond the length limit is that searches won't find
documents in which the search terms occur only very far into the
document. The downside of throwing an error is that we can't store such
documents at all, which surely guarantees that searches won't find
them. How can you possibly argue that that option is better?
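
The comparison can be sketched with a toy model. Everything here is hypothetical: the tiny LIMIT, the function names, and the set-of-words "index" are stand-ins for illustration only and do not reflect how tsvector is actually built.

```python
LIMIT = 40  # hypothetical stand-in for the real size limit, in bytes of word text


class DocumentTooLong(Exception):
    """Raised by the strict indexer when a document exceeds LIMIT."""


def word_bytes(word: str) -> int:
    return len(word.encode("utf-8"))


def index_strict(text: str) -> set:
    """Hard-error policy: refuse any document whose words exceed LIMIT."""
    words = text.lower().split()
    if sum(word_bytes(w) for w in words) > LIMIT:
        raise DocumentTooLong("document exceeds limit")
    return set(words)


def index_truncating(text: str) -> set:
    """Truncating policy: index words until LIMIT is reached, skip the rest."""
    indexed, used = set(), 0
    for word in text.lower().split():
        used += word_bytes(word)
        if used > LIMIT:
            break  # words past the limit are simply not indexed
        indexed.add(word)
    return indexed


def build_store(docs: dict, indexer) -> dict:
    """Store each document's index; a failed insert means no row at all."""
    store = {}
    for doc_id, text in docs.items():
        try:
            store[doc_id] = indexer(text)
        except DocumentTooLong:
            pass  # the document cannot be stored, so no search can find it
    return store


def search(store: dict, term: str) -> list:
    return sorted(doc_id for doc_id, words in store.items() if term in words)


docs = {
    "short": "tsvector limit",
    "long": "tsvector " + "filler " * 10 + "needle",
}
strict_store = build_store(docs, index_strict)
truncating_store = build_store(docs, index_truncating)

print(search(strict_store, "tsvector"))      # only the short document
print(search(truncating_store, "tsvector"))  # both documents
print(search(truncating_store, "needle"))    # term lies past the limit
```

In this toy model the truncating policy misses only the term that occurs past the limit, while the strict policy misses the long document for every term, including ones near its start.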

regards, tom lane
