Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-05 18:40:58
Message-ID: 11562.1470422458@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> The assertion in tsvector_delete_by_indices fails because its counting
> algorithm doesn't expect indices_to_delete to contain multiple
> references to the same index. Maybe that could be fixed by
> uniquifying in tsvector_delete_arr before calling it, but since
> tsvector_delete_by_indices already qsorts its input, it should be able
> to handle duplicates cheaply.

I poked at this and realized that that's not sufficient. If there are
duplicates in indices_to_delete, then the initial estimate

tsout->size = tsv->size - indices_count;

is wrong because indices_count is an overestimate of how many lexemes
will be removed. And because the calculation "dataout = STRPTR(tsout)"
depends on tsout->size, we can't just wait till later to get it right.

We could possibly initialize tsout->size = tsv->size (the maximum
possible value), thereby ensuring that the WordEntry array doesn't
overlap the dataout area; compute the correct tsout->size in the loop;
and then memmove the data area into place to collapse out wasted space.
But I think it might be simpler and better-performant just to de-dup the
indices_to_delete array after qsort'ing it; that would certainly win
for the case of indices_count == 1.

The other problems I noted with failure to delete items seem to stem
from the fact that tsvector_delete_arr relies on tsvector_bsearch to
find items, but the input tsvector is not sorted (never mind de'duped)
by array_to_tsvector. This seems like simple brain fade in
array_to_tsvector, as AFAICS that's a required property of tsvectors.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2016-08-05 18:41:48 Re: pg_size_pretty, SHOW, and spaces
Previous Message Fujii Masao 2016-08-05 18:25:39 Re: pg_replication_origin_xact_reset() and its argument variables