Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Marek Lewczuk <marek(at)lewczuk(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5075: Text Search parser does not identify xml tag when attribute name's contains underscore
Date: 2009-11-15 13:56:05
Message-ID: 1258293365.14314.28.camel@vanquo.pezone.net
Lists: pgsql-bugs

On Wed, 2009-09-23 at 20:31 -0300, Euler Taveira de Oliveira wrote:
> Marek Lewczuk wrote:
> > Please execute the following example:
> > select * from ts_debug('english', '<img width="182" height="120"
> > align="right" style="margin: 0px 0px 5px 5px;" test_aa="26461"/>')
> >
> > In the result you will see that <img/> is not identified as an XML tag but
> > is instead split into words, blank spaces, etc. The reason is that the
> > last attribute, "test_aa", contains an underscore in its name; when the
> > underscore is removed, the img tag is properly identified as an XML tag.
> >
> > The XML specification allows underscores in tag and attribute names.
> >
> The problem is we already allow it in tag names but not in attribute names. So
> the proper fix is to allow underscore when the state is TPS_InTag; according
> to XML spec [1], the underscore is a valid character in attribute names.
>
> A possible downside is that we don't have underscores in HTML attribute names.
> In this case, should it fail? I don't think so but...
>
> The problem exists in 8.3, 8.4 and HEAD. It is a trivial fix, so I don't see
> a problem with back-patching it.

Fix committed to 8.3, 8.4, 8.5.
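
For anyone who wants to check a given server, a comparison along the following lines may help. It is only a sketch: the shortened img element and the underscore-free attribute name "testaa" are made up for illustration, and the exact rows returned depend on the server version. On a server with the fix, both queries would be expected to report the whole element as one token of type "tag"; on an unpatched server, the first would be expected to break the element into several tokens, as described in the report.

```sql
-- Attribute name with an underscore (the case from the bug report).
SELECT alias, token
FROM ts_debug('english', '<img width="182" test_aa="26461"/>');

-- Same element with the underscore removed from the attribute name
-- (hypothetical attribute "testaa", used only for comparison).
SELECT alias, token
FROM ts_debug('english', '<img width="182" testaa="26461"/>');
```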
