Re: BUG #14245: Segfault on weird to_tsquery

From: David Kellum <david(at)gravitext(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14245: Segfault on weird to_tsquery
Date: 2016-07-12 20:54:53
Message-ID: 1468356893.2574.7@smtp.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Jul 12, 2016 at 12:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> david(at)gravitext(dot)com writes:
>> I am doing some (fuzz) testing of full text queries and managed to
>> generate the following case which causes a SEGFAULT on PostgreSQL 9.6
>> beta1 and beta2:
>> select to_tsquery('!(a & !b) & c') as tsquery
>> This weird query outputs the following on 9.5.2, instead of crashing:
>> "!( !'b' ) & 'c'"
>
> Note that while crashing is certainly not good, the pre-9.6 behavior
> can hardly be called correct either. What happened to 'a'?

'a' is a stopword, dropped by to_tsquery() as described here:

https://www.postgresql.org/docs/9.6/static/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
> The difference is that while basic tsquery input takes the tokens at
> face value, to_tsquery normalizes each token into a lexeme using the
> specified or default configuration, and discards any tokens that are
> stop words according to the configuration.

...and I believe I want this behavior. Otherwise, queries with a stopword
in an '&' condition will not match anything. In truth, I have no reason to
want to support this kind of weird double negative on any version, and I
will also look at filtering it out in my code before calling
to_tsquery().

It might be worth noting that these other slightly different cases are
fine on 9.6:

select to_tsquery('!(apple & !b) & c'); ---> !( 'appl' & !'b' ) & 'c'
select to_tsquery('!(apple & !a) & c'); ---> !'appl' & 'c'

Clearly a pretty obscure case, but a crash nonetheless.

> Also, it looks like this is specific to to_tsquery; if you just feed
> the same thing to tsqueryin, it seems fine with it:
>
> # select '!(a & !b) & c'::tsquery;
> tsquery
> -----------------------
> !( 'a' & !'b' ) & 'c'
> (1 row)

Against another test table, using the English search config, I confirmed
that 'a & ball'::tsquery doesn't match anything, but to_tsquery('a & ball')
does.
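
The same difference can be sketched without a table. This is only an
illustration, assuming the built-in 'english' configuration; the sample text
'a ball' and the inline to_tsvector() call are stand-ins, not my actual test
table:

```sql
-- Assumption: built-in 'english' config, where 'a' is a stop word.
-- to_tsvector() drops 'a', so the document contains only the lexeme 'ball'.

-- Plain tsquery input keeps 'a' as a literal lexeme, which the document
-- does not contain, so the '&' condition cannot be satisfied.
select to_tsvector('english', 'a ball') @@ 'a & ball'::tsquery;               -- f

-- to_tsquery() discards the stop word, leaving just 'ball', which matches.
select to_tsvector('english', 'a ball') @@ to_tsquery('english', 'a & ball'); -- t
```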

Thanks,
David
