Re: tsvector string representation and parsing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Johannes Graën <johannes(at)selfnet(dot)de>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: tsvector string representation and parsing
Date: 2022-02-23 22:30:11
Message-ID: 1259438.1645655411@sss.pgh.pa.us
Lists: pgsql-general

Johannes Graën <johannes(at)selfnet(dot)de> writes:
> This is a minimal example that goes wrong (but shouldn't, IMHO):

>> SELECT format('%L:1', '\:')::tsvector

format(%L) is designed to produce a SQL literal, which does not
have the same requirements as a tsvector element ... yeah, they're
close, but not close enough. In this particular example,
what you get is

=# SELECT format('%L:1', '\:');
  format
----------
 E'\\:':1
(1 row)

because format() adds an E prefix for the avoidance of doubt about
what to do with the backslashes. tsvector doesn't like that.
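
To make the difference concrete, here is a small sketch comparing an argument without a backslash to one with a backslash. It assumes standard_conforming_strings is on (the default in current releases); the comments describe the expected behavior rather than verbatim psql output.

```sql
-- No backslash in the argument: format() emits a plain quoted literal,
-- which happens to be valid tsvector input as well.
SELECT format('%L:1', ':');             -- ':':1
SELECT format('%L:1', ':')::tsvector;   -- accepted; the lexeme is :

-- Backslash in the argument: format() switches to the E'...' form.
-- That is valid SQL literal syntax, but not tsvector syntax, so casting
-- this string to tsvector does not produce the intended lexeme.
SELECT format('%L:1', '\:');            -- E'\\:':1
```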

I don't think we have any prefab function that does what you're
looking for here, and TBH I'm not sure I see the point of it.
Pretty much any tsvector you'd be dealing with in practice is
going to have come from one of the to_tsvector family of
functions, and those tend to drop punctuation.
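
If hand-built tsvector text is needed anyway, a helper is short to write. The following is only a sketch: quote_tsvector_lexeme is a hypothetical name, not a function that ships with PostgreSQL. It applies the tsvector quoting rule (double every backslash and every single quote, then wrap the result in single quotes). The E'...' constants are used so the function body does not depend on the standard_conforming_strings setting.

```sql
-- Hypothetical helper, not part of PostgreSQL: quote one lexeme so it can
-- be embedded in the text representation of a tsvector.
CREATE FUNCTION quote_tsvector_lexeme(lexeme text) RETURNS text
LANGUAGE sql IMMUTABLE STRICT
AS $func$
  SELECT ''''                                    -- opening quote
      || replace(
             replace(lexeme, E'\\', E'\\\\'),    -- \ becomes \\
             '''', '''''')                       -- ' becomes ''
      || ''''                                    -- closing quote
$func$;

-- Usage (the '\:' argument assumes standard_conforming_strings = on):
-- build the lexeme \: at position 1 and cast the result to tsvector.
SELECT (quote_tsvector_lexeme('\:') || ':1')::tsvector;
```

Doubling backslashes before quotes keeps the two replacements independent, since neither step introduces characters that the other step rewrites.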

> My understanding is that everything inside
> single quotes is taken as a lexeme. From the documentation:

>> To represent lexemes containing whitespace or punctuation, surround them with quotes

See also the next bit about having to double quotes and backslashes
within those quotes. So what you'd actually need is

=# select $$'\\:':1$$::tsvector;
 tsvector
----------
 '\\:':1
(1 row)

If you write just one backslash, it has the effect of quoting
the next character, which in this case doesn't need quoting.
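
A side-by-side sketch of the quoting rules inside a quoted lexeme, using dollar quoting so that SQL string quoting does not get in the way. The comments describe how each value is expected to be displayed.

```sql
-- One backslash: it only quotes the colon, so the stored lexeme is :
SELECT $$'\:':1$$::tsvector;      -- displayed as ':':1

-- Two backslashes: a literal backslash followed by a colon, lexeme \:
SELECT $$'\\:':1$$::tsvector;     -- displayed as '\\:':1

-- A single quote inside the lexeme is written doubled, lexeme it's
SELECT $$'it''s':1$$::tsvector;   -- displayed as 'it''s':1
```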

regards, tom lane
