lexeme ordering in tsvector

From: Sushant Sinha <sushant354(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: lexeme ordering in tsvector
Date: 2009-11-30 18:05:22
Message-ID: 1259604322.3191.7.camel@dragflick
Lists: pgsql-hackers

It seems like the ordering of lexemes in tsvector has changed from 8.3
to 8.4.

For example, in 8.3.1:

postgres=# select to_tsvector('english', 'quit everytime');
to_tsvector
-----------------------
'quit':1 'everytim':2

The lexemes are arranged by length and then by string comparison.

In PostgreSQL 8.4.1:

select to_tsvector('english', 'quit everytime');
to_tsvector
-----------------------
'everytim':2 'quit':1

Here the lexemes are arranged by string comparison (strncmp-style) first and
then by length.

I looked at the function tsCompareString in tsvector_op.c: it does the
memcmp first and only then compares the lengths.
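
To make the two orderings concrete, here is a minimal sketch in C of both
comparison rules. It is not the code from tsvector_op.c; the function names
cmp_length_first and cmp_bytes_first are hypothetical and only illustrate
the behaviour described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: order by length first, then by byte comparison.
 * This mirrors the ordering observed in 8.3.
 */
static int
cmp_length_first(const char *a, size_t lena, const char *b, size_t lenb)
{
    if (lena != lenb)
        return (lena < lenb) ? -1 : 1;
    return memcmp(a, b, lena);
}

/*
 * Hypothetical helper: compare the common prefix byte by byte first,
 * then break ties by length. This mirrors the ordering observed in 8.4.
 */
static int
cmp_bytes_first(const char *a, size_t lena, const char *b, size_t lenb)
{
    size_t      n = (lena < lenb) ? lena : lenb;
    int         c = memcmp(a, b, n);

    if (c != 0)
        return c;
    if (lena == lenb)
        return 0;
    return (lena < lenb) ? -1 : 1;
}
```

With these definitions, cmp_length_first puts 'quit' (4 bytes) before
'everytim' (8 bytes), while cmp_bytes_first puts 'everytim' first because
'e' sorts before 'q'.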

Was this change in ordering deliberate?

Wouldn't length comparison be cheaper than memcmp?

-Sushant.
