Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-03 21:53:16
Message-ID: CAEepm=0Mcmv34u4NBThj1XDFAeohgxsOzFPn4nsQFVaPMEX7MA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 4, 2016 at 9:39 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andreas Seltenreich <seltenreich(at)gmx(dot)de> writes:
>> the following statement triggers an assertion in tsearch:
>
>> select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]);
>> -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
>
> Confirmed here. I notice that the output of array_to_tsvector() is
> already fishy in this example:
>
> regression=# select array_to_tsvector('{smith,smith,smith}'::text[]);
> array_to_tsvector
> -------------------------
> 'smith' 'smith' 'smith'
> (1 row)
>
> Shouldn't those have been merged together? You certainly don't get
> results like that from other tsvector-producing operations:
>
> regression=# select to_tsvector('smith smith smith');
> to_tsvector
> ---------------
> 'smith':1,2,3
> (1 row)
> regression=# select 'smith smith smith'::tsvector;
> tsvector
> ----------
> 'smith'
> (1 row)
>
> However, that does not seem to be the proximate cause of the crash
> in ts_delete, because this non-duplicated case still crashes:
>
> select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]), '{smith,smith}'::text[]);
>
> It kinda looks like you need more than one deletion request for
> the first entry in the sorted tsvector, because for example
> {smith,foo,toolbox} works but not {smith,too,toolbox}.
>
> I'm thinking there are two distinct bugs here.

The assertion in tsvector_delete_by_indices fails because its counting
algorithm doesn't expect indices_to_delete to contain multiple
references to the same index. Maybe that could be fixed by
uniquifying in tsvector_delete_arr before calling it, but since
tsvector_delete_by_indices already qsorts its input, it should be able
to handle duplicates cheaply. I was thinking something like this:

     for (i = j = k = 0; i < tsv->size; i++)
     {
+        bool        drop_lexeme = false;
+
         /*
          * Here we should check whether current i is present in
          * indices_to_delete or not. Since indices_to_delete is already sorted
-         * we can advance it index only when we have match.
+         * we can advance it index only when we have match. We do this
+         * repeatedly, in case indices_to_delete contains duplicate references
+         * to the same index.
          */
-        if (k < indices_count && i == indices_to_delete[k])
+        while (k < indices_count && i == indices_to_delete[k])
         {
+            drop_lexeme = true;
             k++;
-            continue;
         }
+        if (drop_lexeme)
+            continue;
But that doesn't seem to be enough on its own: something else is wrong
here that results in garbage output, maybe related to the failure to
merge the tsvector...
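
For what it's worth, here is a standalone sketch of just the counting
logic, to show why duplicate delete requests leave k short of
indices_count. It is not the tsvector_op.c code: count_kept, compare_int
and the handle_dups flag are hypothetical names made up for
illustration, and the lexeme array is reduced to a bare entry count.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints (hypothetical helper, not from tsvector_op.c) */
static int
compare_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical model of the loop in tsvector_delete_by_indices.
 *
 * Walks entries 0..size-1 and skips every entry whose index appears in
 * indices_to_delete (sorted in place first).  Returns the number of entries
 * kept and stores in *consumed how many delete requests were used up,
 * i.e. the final value of k.
 *
 * handle_dups == 0 models the original "if" test;
 * handle_dups != 0 models the proposed "while" loop.
 */
static int
count_kept(int size, int *indices_to_delete, int indices_count,
           int handle_dups, int *consumed)
{
    int i, j, k;

    assert(size >= 0 && indices_count >= 0);
    qsort(indices_to_delete, indices_count, sizeof(int), compare_int);

    for (i = j = k = 0; i < size; i++)
    {
        int drop_lexeme = 0;

        if (handle_dups)
        {
            /* consume every request that names entry i */
            while (k < indices_count && i == indices_to_delete[k])
            {
                drop_lexeme = 1;
                k++;
            }
        }
        else if (k < indices_count && i == indices_to_delete[k])
        {
            /* consume at most one request per entry */
            drop_lexeme = 1;
            k++;
        }

        if (drop_lexeme)
            continue;
        j++;                    /* entry i survives */
    }

    *consumed = k;
    return j;
}
```

With three entries and delete requests {0, 0}, the "if" form leaves
consumed at 1 rather than 2, which is the k == indices_count mismatch
the assertion complains about; the "while" form consumes both requests.
The sketch only models the counting and says nothing about the garbage
output mentioned above.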

--
Thomas Munro
http://www.enterprisedb.com
