From: | Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> |
---|---|
To: | grinnz(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #15277: ts_headline strips things that look like HTML tags and it cannot be disabled |
Date: | 2018-07-12 09:22:06 |
Message-ID: | 20180712092205.GA16177@zakirov.localdomain |
Lists: | pgsql-bugs |
Hello,
On Thu, Jul 12, 2018 at 07:59:40AM +0000, PG Bug reporting form wrote:
> I have text that is not HTML and contains things that look like HTML tags.
> The headlines are HTML escaped when output. It is very odd to have this text
> missing from the resulting headlines and no way to control the behavior.
<b> and </b> are recognized as "tag" tokens, and by default tokens of
that type are ignored. You need to modify the existing configuration or
create a new one:
=# CREATE TEXT SEARCH CONFIGURATION english_tag (COPY = english);
=# ALTER TEXT SEARCH CONFIGURATION english_tag
ADD MAPPING FOR tag WITH simple;
Then tags aren't skipped:
=# select * from ts_debug('english_tag', 'query <b>test</b>');
   alias   |   description   | token |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+-------+----------------+--------------+---------
 asciiword | Word, all ASCII | query | {english_stem} | english_stem | {queri}
 blank     | Space symbols   |       | {}             | (null)       | (null)
 tag       | XML tag         | <b>   | {simple}       | simple       | {<b>}
 asciiword | Word, all ASCII | test  | {english_stem} | english_stem | {test}
 tag       | XML tag         | </b>  | {simple}       | simple       | {</b>}
But even in this case ts_headline will still skip tags, because that
behaviour is hardcoded [1].
I don't think it would be good to change this behaviour in existing
versions of PostgreSQL. There is a workaround, though, if it suits you:
you can create your own text search parser extension (see the example
[2]) and change
#define HLIDREPLACE(x) ( (x)==TAG_T )
to
#define HLIDREPLACE(x) ( false )
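
To illustrate what that macro change does, here is a minimal standalone sketch, not the actual PostgreSQL code. It assumes, as far as I can tell from the source, that a token flagged for replacement is emitted as a single space in the headline. DemoToken, demo_headline and the two HLIDREPLACE_* names are made up for this example; only the TAG_T comparison comes from wparser_def.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Token type ids as used by the default parser (tag is 13). */
#define ASCIIWORD 1
#define SPACE     12
#define TAG_T     13

/* Stock definition and the patched one, under hypothetical names. */
#define HLIDREPLACE_STOCK(x)   ( (x)==TAG_T )
#define HLIDREPLACE_PATCHED(x) ( false )

/* Hypothetical token record for this demo only. */
typedef struct
{
    int         type;
    const char *text;
} DemoToken;

/*
 * Toy headline builder: copies each token's text into out, except that
 * tokens flagged by the macro are written as one space instead.
 * Stops early if out would overflow. Returns out.
 */
static char *
demo_headline(const DemoToken *tok, size_t n, bool patched,
              char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return out;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++)
    {
        bool        replace = patched ? HLIDREPLACE_PATCHED(tok[i].type)
                                      : HLIDREPLACE_STOCK(tok[i].type);
        const char *piece = replace ? " " : tok[i].text;
        size_t      len = strlen(piece);

        if (used + len >= outsz)
            break;
        memcpy(out + used, piece, len);
        used += len;
        out[used] = '\0';
    }
    return out;
}
```

Feeding it the five tokens from the ts_debug output above, the stock macro turns both tags into spaces, while the patched macro leaves the text unchanged. Note that this only shows the replacement step; the real ts_headline also selects fragments and adds StartSel/StopSel markers.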
1 - https://github.com/postgres/postgres/blob/master/src/backend/tsearch/wparser_def.c#L1923
2 - https://github.com/postgrespro/pg_tsparser
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company