|From:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Cc:||Robert Haas <robertmhaas(at)gmail(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, pgsql-hackers(at)postgresql(dot)org|
|Subject:||Re: english parser in text search: support for multiple words in the same position|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Sushant Sinha <sushant354(at)gmail(dot)com> writes:
> For the headline generation to work properly, email/file/url/host need
> to become skip tokens. Updating the patch with that change.
I looked at this patch a bit. I'm fairly unhappy that it seems to be
inventing a brand new mechanism to do something the ts parser can
already do. Why didn't you code the url-part mechanism using the
existing support for compound words? The changes made to parsetext()
seem particularly scary: it's not clear at all that that's not breaking
unrelated behaviors. In fact, the changes in the regression test
results suggest strongly to me that it *is* breaking things. Why are
there so many diffs in examples that include no URLs at all?
An issue that's nearly as bad is the 100% lack of documentation,
which makes the patch difficult to review because it's hard to tell
what it intends to accomplish or whether it's met the intent.
The patch is not committable without documentation anyway, but right
now I'm not sure it's even usefully reviewable.
In line with the lack of documentation, I would say that the choice of
the name "parttoken" for the new token type is not helpful. Part of
what? And none of the other token type names include the word "token",
so that's not a good decision either. Possibly "url_part" would be a
regards, tom lane
|Next Message||Tom Lane||2010-09-19 21:36:57||Re: pgxs docdir question|
|Previous Message||Devrim GÜNDÜZ||2010-09-19 21:25:00||pgxs docdir question|