Re: Useless removal of duplicate GIN index entries in pg_trgm

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Useless removal of duplicate GIN index entries in pg_trgm
Date:	2012-08-27 19:38:11
Message-ID:	7688.1346096291@sss.pgh.pa.us
Lists:	pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> After pg_trgm extracts the trigrams as GIN index keys, generate_trgm()
> removes duplicate index keys, to avoid generating redundant index entries.
> Also, ginExtractEntries(), which is the caller of pg_trgm, does the same thing.
> Why do we need to remove GIN index entries twice? I think that we can
> get rid of the removal-of-duplicate code block from generate_trgm()
> because it's useless. Comments?

I see eight different callers of generate_trgm(). It might be that
gin_extract_value_trgm() doesn't really need this behavior, but that
doesn't mean the other seven don't want it.

Also, seeing that generate_trgm() is able to use relatively cheap
trigram-specific comparison operators for this, it's not impossible
that getting rid of duplicates internal to it is a net savings even
for the gin_extract_value case, because it'd reduce the number of
much-more-heavyweight comparisons done by ginExtractEntries...
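
To illustrate the cost argument above, here is a minimal sketch of deduplicating fixed-size trigrams with a plain byte comparison. It is not pg_trgm's actual code: the names demo_trgm, demo_trgm_cmp and demo_trgm_unique are hypothetical, and a trigram is assumed to be three raw bytes. The point is only that a 3-byte memcmp() is cheap next to a general datatype comparison function, so shrinking the key array early means fewer expensive comparisons later.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a trigram: three raw bytes, no padding. */
typedef struct
{
    unsigned char b[3];
} demo_trgm;

/* Cheap, trigram-specific comparator: a 3-byte memcmp. */
static int
demo_trgm_cmp(const void *a, const void *b)
{
    return memcmp(a, b, sizeof(demo_trgm));
}

/*
 * Sort the array in place, then collapse runs of equal trigrams.
 * Returns the number of distinct trigrams left at the front of arr.
 */
static size_t
demo_trgm_unique(demo_trgm *arr, size_t n)
{
    size_t      i;
    size_t      last = 0;   /* index of the last distinct entry kept */

    if (n < 2)
        return n;

    qsort(arr, n, sizeof(demo_trgm), demo_trgm_cmp);

    for (i = 1; i < n; i++)
    {
        if (demo_trgm_cmp(&arr[last], &arr[i]) != 0)
            arr[++last] = arr[i];
    }
    return last + 1;
}
```

Any caller that later sorts or deduplicates the same keys with a heavier comparator then starts from the shorter array. Whether that is a net win in practice would have to be measured.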

regards, tom lane

In response to

Useless removal of duplicate GIN index entries in pg_trgm at 2012-08-27 16:46:18 from Fujii Masao