From: | Dan Book <grinnz(at)gmail(dot)com> |
---|---|
To: | a(dot)zakirov(at)postgrespro(dot)ru |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled |
Date: | 2018-07-12 15:33:52 |
Message-ID: | CABMkAVUjc7Bh4WWTnF_US95_t8L6hpPFV8yJQJ51YQmWjG=Spg@mail.gmail.com |
Lists: | pgsql-bugs |
On Thu, Jul 12, 2018 at 5:22 AM Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
wrote:
> Hello,
>
> On Thu, Jul 12, 2018 at 07:59:40AM +0000, PG Bug reporting form wrote:
> > I have text that is not HTML and contains things that look like HTML tags.
> > The headlines are HTML escaped when output. It is very odd to have this text
> > missing from the resulting headlines and no way to control the behavior.
>
> <b> and </b> are recognized as "tag" tokens. By default they are
> ignored. You need to modify an existing configuration or create a new one:
>
> =# CREATE TEXT SEARCH CONFIGURATION english_tag (COPY = english);
> =# alter text search configuration english_tag
> add mapping for tag with simple;
>
> Then tags aren't skipped:
>
> =# select * from ts_debug('english_tag', 'query <b>test</b>');
>    alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
> -----------+-----------------+-------+----------------+--------------+---------
>  asciiword | Word, all ASCII | query | {english_stem} | english_stem | {queri}
>  blank     | Space symbols   |       | {}             | (null)       | (null)
>  tag       | XML tag         | <b>   | {simple}       | simple       | {<b>}
>  asciiword | Word, all ASCII | test  | {english_stem} | english_stem | {test}
>  tag       | XML tag         | </b>  | {simple}       | simple       | {</b>}
>
> But even in this case ts_headline will skip tags, because that
> behaviour is hardcoded [1].
>
> I don't think it would be good to change the behaviour in existing
> versions of PostgreSQL. There is a workaround, though, if it suits your
> case: you can create your own text search parser extension
> (example: [2]) and change
>
> #define HLIDREPLACE(x) ( (x)==TAG_T )
>
> to
>
> #define HLIDREPLACE(x) ( false )
>
Thanks for the response. It's good to know this is possible, but defining a
custom parser is not ideal.
-Dan
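
For anyone who wants to reproduce the behaviour discussed in this thread, here is a minimal sketch. It assumes a database with the stock `english` configuration and recreates the `english_tag` configuration from the quoted message. The sample string (using `<i>` so the sample tags are not confused with the default `<b>` highlight markers) and the one-word query are illustrative assumptions, not taken from the original report.

```sql
-- Recreate the configuration from the quoted message
-- (skip these two statements if english_tag already exists).
CREATE TEXT SEARCH CONFIGURATION english_tag (COPY = english);
ALTER TEXT SEARCH CONFIGURATION english_tag
    ADD MAPPING FOR tag WITH simple;

-- Headline with the stock configuration, where tag tokens have no mapping.
SELECT ts_headline('english',
                   'query <i>test</i>',
                   to_tsquery('english', 'query'));

-- Headline with tag tokens mapped to the simple dictionary.
-- Compare the result with the one above to see whether the mapping
-- makes any difference to how the <i> and </i> text is handled.
SELECT ts_headline('english_tag',
                   'query <i>test</i>',
                   to_tsquery('english_tag', 'query'));
```

If the explanation in the quoted message holds, the `<i>` and `</i>` text should be missing from both results, since the tag handling in ts_headline is described there as hardcoded in the default parser rather than controlled by the configuration's mappings.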