Html parsing and inline elements

From:	Marcelo Zabani <mzabani(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Html parsing and inline elements
Date:	2016-04-13 13:44:57
Message-ID:	CACgY3QZ0_TX4LBC8=RRCRGM2Mgos6S8jj8AhxYMP6P5EM2M4yQ@mail.gmail.com
Lists:	pgsql-hackers

Hi everyone,

I was wondering whether HTML parsing should split tokens that are not
separated by spaces in the original text but are separated by an inline
element. Here is an example:

SELECT to_tsvector('english', 'Helloneighbor, you are nice')
Results: "'ce':7 'hello':1 'n':5 'neighbor':2"

"Hello" and "neighbor" should really be separated, because the element
between them is a block element, but "nice" should be a single word there,
since there is no visual separation when rendered (the tags inside it are
inline elements).
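
To make the behavior I have in mind concrete, here is a minimal sketch in
Python of a pre-processing step that removes tags before the text is
indexed. It is only an illustration, not how the PostgreSQL parser works:
the names InlineAwareStripper, strip_html and INLINE_TAGS are made up, the
list of inline elements is an assumed and incomplete subset, and the markup
in the sample string is hypothetical.

```python
from html.parser import HTMLParser

# Assumed, incomplete set of inline elements. Any other tag is treated
# as block-level and therefore as a word boundary.
INLINE_TAGS = {"a", "abbr", "b", "code", "em", "i", "small",
               "span", "strong", "sub", "sup", "u"}


class InlineAwareStripper(HTMLParser):
    """Drop tags, emitting a space only where a non-inline tag appears."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _boundary(self, tag):
        # Inline tags vanish silently; all other tags separate words.
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        self._boundary(tag)

    def handle_endtag(self, tag):
        self._boundary(tag)

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the text content with whitespace only at block boundaries."""
    parser = InlineAwareStripper()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    # Collapse the whitespace runs introduced by block boundaries.
    return " ".join("".join(parser.parts).split())


# Hypothetical markup: one block element and one inline element.
print(strip_html("Hello<div>neighbor</div>, you are n<b>i</b>ce"))
# prints: Hello neighbor , you are nice
```

The resulting plain text could then be passed to to_tsvector, so that "nice"
stays one token while "Hello" and "neighbor" are split.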

Sorry if this has been asked before, but I couldn't find it anywhere.

Thanks in advance,
Marcelo.

Responses

Re: Html parsing and inline elements at 2016-04-13 14:09:49 from Tom Lane