Re: tsearch refactorings

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Cc: "Patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: tsearch refactorings
Date: 2007-09-07 09:53:16
Message-ID: 46E11F8C.5020903@enterprisedb.com
Lists: pgsql-patches

Tom Lane wrote:
> Any portion of a binary value that is considered textual should be
> converted to and from client encoding --- cf textsend/textrecv.
> This should be pretty trivial to fix, just call a different support
> routine.

You do need to adjust length and position fields in the structs as well.
I fixed (rewrote, almost) the send/recv functions, and added a comment
above them describing the on-wire format. The CRC is now recalculated in
tsquery as well per previous discussion.
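To illustrate why the length and position fields need adjusting: once a string is re-encoded its byte length can change, so every offset that follows it shifts too. The sketch below is not the code from the patch. DemoEntry, latin1_to_utf8 and convert_lexemes are hypothetical names, and a toy Latin-1 to UTF-8 conversion stands in for the real client/server encoding conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: byte offset and byte length of one lexeme. */
typedef struct
{
    uint32_t pos;
    uint32_t len;
} DemoEntry;

/*
 * Toy conversion (an assumption for this sketch): Latin-1 to UTF-8.
 * dst must have room for 2 * srclen bytes. Returns bytes written.
 */
size_t
latin1_to_utf8(const unsigned char *src, size_t srclen, unsigned char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < srclen; i++)
    {
        if (src[i] < 0x80)
            dst[out++] = src[i];
        else
        {
            /* Code points 0x80..0xFF take two bytes in UTF-8. */
            dst[out++] = (unsigned char) (0xC0 | (src[i] >> 6));
            dst[out++] = (unsigned char) (0x80 | (src[i] & 0x3F));
        }
    }
    return out;
}

/*
 * Re-encode each lexeme from src into dst (a separate buffer) and
 * rewrite its entry so pos/len describe the converted bytes.
 * Returns the total number of bytes written to dst.
 */
size_t
convert_lexemes(DemoEntry *entries, size_t n,
                const unsigned char *src, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++)
    {
        size_t newlen = latin1_to_utf8(src + entries[i].pos,
                                       entries[i].len, dst + out);

        entries[i].pos = (uint32_t) out;
        entries[i].len = (uint32_t) newlen;
        out += newlen;
    }
    return out;
}
```

With the two lexemes "caf\xE9" and "bar" stored back to back, the first grows from 4 to 5 bytes, so the second one's position moves from 4 to 5 even though its own bytes are unchanged.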

Patch attached. This is on top of the previous patches I sent. It
includes some additional changes that I had already started with. Most
notably:

- changed the alignment requirement of lexemes in TSVector slightly.
Lexeme strings were always padded to a 2-byte aligned length to make sure
that if there's a position array (uint16[]) it has the right alignment.
The patch changes that so that the padding is not done when there are no
positions. That makes the storage of tsvectors without positions
slightly more compact.

- added some #include "miscadmin.h" lines that I missed earlier when I
added calls to check_stack_depth().
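The alignment change in the first item can be sketched roughly as follows. This is not the patch itself. The function names are made up for illustration, and the sketch assumes uint16_t is two bytes, as it is on the usual platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to the next multiple of sizeof(uint16_t). */
size_t
short_align(size_t len)
{
    size_t aligned = (len + sizeof(uint16_t) - 1) & ~(sizeof(uint16_t) - 1);

    assert(aligned >= len);     /* guard against wraparound */
    return aligned;
}

/* Old behaviour: every lexeme is padded, positions or not. */
size_t
lexeme_storage_old(size_t len)
{
    return short_align(len);
}

/*
 * New behaviour: pad only when a uint16 position array follows the
 * lexeme, since that array is the only thing that needs the alignment.
 */
size_t
lexeme_storage_new(size_t len, int has_positions)
{
    return has_positions ? short_align(len) : len;
}
```

For a 3-byte lexeme without positions, the old rule stores 4 bytes and the new rule stores 3. Lexemes of even length, and lexemes with positions, take the same space as before.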

BTW, the encoding of the XML datatype looks pretty funky. xml_recv first
reads the xml string with pq_getmsgtext, which applies a client->server
conversion. Then the xml declaration is parsed, extracting the encoding
attribute. Then the string is converted again from that encoding (or
UTF-8 if none was specified) to server encoding. I don't understand how
it's supposed to work, but ISTM there's one conversion too many.
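As a generic illustration of why an extra conversion pass is a problem (this does not reproduce xml_recv): if bytes that are already in the target encoding are converted a second time as though they were still in the source encoding, non-ASCII characters get garbled. demo_recode and demo_receive are hypothetical, and a toy Latin-1 to UTF-8 conversion stands in for the real conversion routines.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX 64

/* Toy conversion: treat src as Latin-1 and emit UTF-8. */
size_t
demo_recode(const unsigned char *src, size_t srclen, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < srclen; i++)
    {
        if (src[i] < 0x80)
            dst[out++] = src[i];
        else
        {
            dst[out++] = (unsigned char) (0xC0 | (src[i] >> 6));
            dst[out++] = (unsigned char) (0x80 | (src[i] & 0x3F));
        }
    }
    return out;
}

/*
 * Run the conversion `passes` times (0, 1 or 2) over the wire bytes.
 * out must have room for 4 * len bytes. Returns the resulting length.
 */
size_t
demo_receive(const unsigned char *wire, size_t len, int passes,
             unsigned char *out)
{
    unsigned char a[DEMO_MAX];
    unsigned char b[DEMO_MAX];
    unsigned char *cur = a;
    unsigned char *next = b;

    assert(passes >= 0 && passes <= 2 && len * 4 <= DEMO_MAX);

    for (size_t i = 0; i < len; i++)
        cur[i] = wire[i];

    for (int p = 0; p < passes; p++)
    {
        unsigned char *tmp;

        len = demo_recode(cur, len, next);
        tmp = cur;
        cur = next;
        next = tmp;
    }

    for (size_t i = 0; i < len; i++)
        out[i] = cur[i];
    return len;
}
```

One pass turns the Latin-1 byte 0xE9 into the UTF-8 sequence C3 A9. A second pass over that result yields C3 83 C2 A9, which is no longer the original character.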

> BTW, Teodor, are you intending to review/apply Heikki's tsearch fixes,
> or do you want someone else to do it?

I am getting confused with the patches and versions I have lying around
here... I think I'll have to wait for review of the patches I've posted
so far before I continue hacking.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment: tsearch-incr-2.patch (text/x-diff, 34.2 KB)
