Re: Re: [GENERAL] Text search parser's treatment of URLs and emails

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Thom Brown <thom(at)linux(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [GENERAL] Text search parser's treatment of URLs and emails
Date: 2010-10-12 23:31:30
Message-ID: 10038.1286926290@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> [ sent to hackers where it belongs ]
> Thom Brown wrote:
>> It could be me being picky, but I don't regard parameters or page
>> fragments as part of the URL path.

> Wow, that is a tough one. On the one hand, it seems nice to be able to
> split stuff out more, but on the other hand we would be making url_path
> less useful because people would need to piece things together to get
> the old behavior. In fact to piece things together we would need to add
> '?' and '#' optionally, which seems kind of hard. Perhaps we should
> keep url_path unchanged and add file_path that has your suggestion.

This seems much of a piece with the existing proposal to allow
individual "words" of a URL to be reported separately:
https://commitfest.postgresql.org/action/patch_view?id=378

As I said in that thread, this could be done in a backwards-compatible
way using the tsearch parser's existing ability to report multiple
overlapping tokens out of the same piece of text. But I'd like to see
one unified proposal and patch for this and Sushant's patch, not
independent hacks changing the behavior in the same area.

regards, tom lane
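For illustration only, here is a small Python sketch of the backwards-compatible approach described above. The existing tokens are left untouched, and one extra, overlapping token is emitted for the same text. The default parser already does this kind of thing: a hyphenated word is reported both as the whole word and as each component, and a URL yields overlapping `url`, `host` and `url_path` tokens. The token names `protocol`, `url`, `host` and `url_path` follow the default parser's token types, but the splitting logic below is a simplified assumption and is not the parser's C code. `file_path` is the hypothetical new token from the `file_path` suggestion quoted above, and `url_tokens` is a hypothetical helper.

```python
from typing import List, Tuple


def url_tokens(text: str) -> List[Tuple[str, str]]:
    """Return overlapping (token_type, token) pairs for a URL-like string.

    Hypothetical sketch: 'protocol', 'url', 'host' and 'url_path' loosely
    mimic the default parser's token types; 'file_path' is a proposed extra
    token that stops at the first '?' or '#'. Not the real parser logic.
    """
    tokens: List[Tuple[str, str]] = []
    rest = text
    if "://" in rest:
        scheme, rest = rest.split("://", 1)
        tokens.append(("protocol", scheme + "://"))
    slash = rest.find("/")
    if slash == -1:
        # No path component: report just the host.
        tokens.append(("host", rest))
        return tokens
    host, path = rest[:slash], rest[slash:]
    tokens.append(("url", rest))       # whole URL minus the protocol
    tokens.append(("host", host))      # overlaps the start of "url"
    tokens.append(("url_path", path))  # unchanged: keeps ?query and #fragment
    # Hypothetical extra token: the path cut at the first '?' or '#'.
    cut = min((i for i in (path.find("?"), path.find("#")) if i != -1),
              default=len(path))
    file_path = path[:cut]
    if file_path != path:
        tokens.append(("file_path", file_path))
    return tokens


for kind, tok in url_tokens("http://example.com/stuff/index.html?x=1#top"):
    print(f"{kind:10} {tok}")
```

In this sketch `url_path` keeps its current value, so a text search configuration that already maps it would see no change. Only a configuration that adds a mapping for the new token would index the shorter path.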
