Re: english parser in text search: support for multiple words in the same position

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: sushant354(at)gmail(dot)com
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: english parser in text search: support for multiple words in the same position
Date: 2010-09-19 21:29:57
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Sushant Sinha <sushant354(at)gmail(dot)com> writes:
> For the headline generation to work properly, email/file/url/host need
> to become skip tokens. Updating the patch with that change.

I looked at this patch a bit. I'm fairly unhappy that it seems to be
inventing a brand new mechanism to do something the ts parser can
already do. Why didn't you code the url-part mechanism using the
existing support for compound words? The changes made to parsetext()
seem particularly scary: it's not clear at all that that's not breaking
unrelated behaviors. In fact, the changes in the regression test
results suggest strongly to me that it *is* breaking things. Why are
there so many diffs in examples that include no URLs at all?

An issue that's nearly as bad is the 100% lack of documentation,
which makes the patch difficult to review because it's hard to tell
what it intends to accomplish or whether it's met the intent.
The patch is not committable without documentation anyway, but right
now I'm not sure it's even usefully reviewable.

In line with the lack of documentation, I would say that the choice of
the name "parttoken" for the new token type is not helpful. Part of
what? And none of the other token type names include the word "token",
so that's not a good decision either. Possibly "url_part" would be a
suitable name.

regards, tom lane

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-09-19 21:36:57 Re: pgxs docdir question
Previous Message Devrim GÜNDÜZ 2010-09-19 21:25:00 pgxs docdir question