Re: tsearch refactorings

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: tsearch refactorings
Date: 2007-09-05 17:13:45
Message-ID: 46DEE3C9.8060805@sigaev.ru
Lists: pgsql-patches

Heikki Linnakangas wrote:
> Teodor Sigaev wrote:
>>> Ok. Probably easiest to do that by changing the palloc to palloc0 in
>>> parse_tsquery.
>> and change sizeof to sizeof(QueryItem)
>
> Do you mean the sizeofs in the memcpys in parse_tsquery? You can't

Oops, I meant the pallocs in the push* functions. Using palloc0 in parse_tsquery
is another way.

>
> BTW, can you explain what the CRC-32 of a value is used for? It looks
> like it's used to speed up some operations, by comparing the CRCs before
> comparing the values, but I didn't quite figure out how it works. How
It's mostly used in GiST indexes - recalculating crc32 every time for each index
tuple to be checked is rather expensive.
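
One way to picture the "compare CRCs before comparing values" idea is the
minimal sketch below. It is not the tsearch code: the Operand struct and the
make_operand and operand_equal helpers are hypothetical names, and the CRC is a
plain bitwise IEEE CRC-32, which may differ from the variant tsearch uses. The
point is only that the CRC is computed once and stored, so later comparisons
can reject most mismatches without touching the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical operand: the text plus a CRC computed once, up front. */
typedef struct
{
    uint32_t    crc;
    size_t      len;
    const char *text;
} Operand;

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t
crc32_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t    crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Compute the CRC when the operand is built, not on every comparison. */
static Operand
make_operand(const char *text, size_t len)
{
    Operand     op;

    assert(text != NULL);
    op.crc = crc32_bytes(text, len);
    op.len = len;
    op.text = text;
    return op;
}

/* Cheap checks first; the byte comparison runs only if CRC and length match. */
static int
operand_equal(const Operand *a, const Operand *b)
{
    if (a->crc != b->crc)
        return 0;
    if (a->len != b->len)
        return 0;
    return memcmp(a->text, b->text, a->len) == 0;
}
```

A matching CRC does not prove equality, so the final memcmp is still needed;
the stored CRC only makes the common "not equal" case cheap.
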

> much of a performance difference does it make? Would hash_any do a
> better/cheaper job?
crc32 was chosen after testing a lot of hash functions. Perl's hash was the
fastest, but crc32 produces far fewer collisions. Interestingly, for ASCII a lot
of functions produce a rather small number of collisions, but for the upper
part of the table (0x7f-0xff) crc32 was the best. CRC32 has collisions evenly
distributed over characters; the others do not.

> In any case, I think we need to calculate the CRC/hash in tsqueryrecv,
> instead of trusting the client.
Agreed.
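
A minimal sketch of that idea, under stated assumptions: the wire layout, the
ReceivedOperand struct and the recv_operand helper are all hypothetical and are
not the actual tsqueryrecv code, and the CRC is a plain bitwise IEEE CRC-32.
Whatever checksum the client claims is ignored and the value is recomputed from
the received bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of an operand received from a client. */
typedef struct
{
    uint32_t             crc;   /* always computed locally */
    size_t               len;
    const unsigned char *data;
} ReceivedOperand;

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t
crc32_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t    crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Build the operand from client-supplied bytes.  client_crc stands for a
 * checksum the client might send; it is deliberately not used.
 */
static ReceivedOperand
recv_operand(const unsigned char *data, size_t len, uint32_t client_crc)
{
    ReceivedOperand op;

    assert(data != NULL || len == 0);
    (void) client_crc;          /* never trusted */
    op.crc = crc32_bytes(data, len);
    op.len = len;
    op.data = data;
    return op;
}
```

Because the stored CRC is later used as a shortcut in comparisons, a wrong
client-supplied value could make equal operands look different; recomputing it
on receipt removes that dependency on the client.
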
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
