Re: tsearch refactorings

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: tsearch refactorings
Date: 2007-09-05 15:55:22
Message-ID: 46DED16A.9000505@sigaev.ru
Lists: pgsql-patches

Heikki, I see some strange changes in your patch, not related to tsearch at all:
contrib/pageinspect/pageinspect.sql.in
contrib/pageinspect/rawpage.c

> The usage of the QueryItem struct was very confusing. It was used for
> both operators and operands. For operators, "val" was a single character
> casted to a int4, marking the operator type. For operands, val was the
> CRC-32 of the value. Other fields were used only either for operands or
> for operators. The biggest change in the patch is that I broke the
> QueryItem struct into QueryOperator and QueryOperand. Type was really
...
> - Removed ParseQueryNode struct used internally by makepol and friends.
> push*-functions now construct QueryItems directly.

Unused bytes in QueryItem need to be set to zero; that is a common requirement
for types in pgsql. After allocating space for the tsquery in parse_tsquery, you copy
just sizeof(QueryOperator) bytes and leave the remaining sizeof(QueryItem) -
sizeof(QueryOperator) bytes untouched. QueryOperand is the biggest component of
the QueryItem union. I haven't checked other places.
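
A minimal sketch of the zeroing described above. The struct layouts and the
store_operator helper are simplified, hypothetical stand-ins for illustration,
not the actual definitions from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified operator item (smaller member of the union). */
typedef struct
{
    int8_t   type;      /* item kind */
    int8_t   oper;      /* operator code, e.g. '&', '|', '!' */
    uint32_t left;      /* offset to the left operand */
} QueryOperator;

/* Hypothetical, simplified operand item (largest member of the union). */
typedef struct
{
    int8_t   type;
    uint8_t  weight;
    int32_t  valcrc;
    uint32_t length;
    uint32_t distance;
} QueryOperand;

typedef union
{
    int8_t        type;
    QueryOperator qoperator;
    QueryOperand  operand;
} QueryItem;

/*
 * Copy an operator into a QueryItem slot. The whole slot is zeroed first,
 * so the sizeof(QueryItem) - sizeof(QueryOperator) trailing bytes never
 * hold garbage. The caller is assumed to have zero-initialized *src as
 * well, so that padding inside QueryOperator is also clean.
 */
void
store_operator(QueryItem *dst, const QueryOperator *src)
{
    assert(sizeof(QueryOperator) <= sizeof(QueryItem));
    memset(dst, 0, sizeof(QueryItem));
    memcpy(dst, src, sizeof(QueryOperator));
}
```

An alternative under the same assumptions is to allocate the whole tsquery
with a zeroing allocator, so that every item slot starts out clean.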

> that? And parse_query always produces trees that are in prefix notation,
> so the left-field is really redundant, but using tsqueryrecv, you could
> inject queries that are not in prefix notation; is there anything in the
> code that depends on that?
It's used by TS_execute for optimization reasons. With pure postfix notation you
have to go through every node. For example:
FALSE FALSE & FALSE &
You have to go to the end of the query to produce the correct result.
In fact, TSQuery is prefix notation with a pointer to the other operand or, in
other words, just a flat view of a tree where the right operand of an operation is
always placed immediately after the operation.
That notation allows evaluating only one of the operands when possible:
& FALSE & FALSE FALSE
1 2 3 4 5 --Nodes
After evaluating the second node you can return FALSE for the whole expression and
skip evaluating nodes 3-5. For the query
& TRUE & FALSE & FALSE
nodes 1, 2, 3 and 4 need to be evaluated. In most cases checking a QI_VAL node is
much more expensive than checking a QI_OPR node.
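
The short-circuit walk described above can be sketched roughly as follows.
This is an illustration only, not the actual TS_execute code: the Item struct,
the eval_prefix function and the checks counter are hypothetical, and operand
values are precomputed booleans standing in for a real (expensive) lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { ITEM_VAL, ITEM_OPR } ItemType;

/* Hypothetical flat query item. */
typedef struct
{
    ItemType type;
    char     oper;      /* operators: '&' or '|' */
    size_t   left;      /* operators: offset from this item to its left operand */
    bool     value;     /* operands: stand-in for an expensive value check */
} Item;

/*
 * Evaluate the subtree whose root is *item. The right operand is stored
 * immediately after its operator; the left operand is reached through the
 * "left" offset. *checks counts how many operands were actually checked.
 */
bool
eval_prefix(const Item *item, int *checks)
{
    if (item->type == ITEM_VAL)
    {
        (*checks)++;
        return item->value;
    }

    assert(item->oper == '&' || item->oper == '|');

    /* Evaluate the adjacent operand first. */
    bool right = eval_prefix(item + 1, checks);

    /* Short-circuit: the other operand cannot change the result. */
    if (item->oper == '&' && !right)
        return false;
    if (item->oper == '|' && right)
        return true;

    /* Otherwise jump to the other operand via the stored offset. */
    return eval_prefix(item + item->left, checks);
}
```

For the five-item array corresponding to "& FALSE & FALSE FALSE" (both
operators with left = 2), this sketch checks a single operand and returns
false without touching items 3-5.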

>
> - There's many internal intermediate representations of a query:
> TSQuery, a QTNode-tree, NODE-tree (in tsquery_cleanup.c), prefix
> notation stack of QueryItems (in parser), infix-tree. Could we remove
> some of these?
I have no strong objections. QTNode and NODE are tree-like structures, but
TSQuery is a prefix notation for storage in flat memory. NODE is used only to
clean up stop-word placeholders, so it's a binary tree, while QTNode represents
an n-ary tree (with any number of children).

Thank you for your interest in tsearch. After rechecking the problem pointed
out above, I'll commit your patch.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
