Re: Tsvector editing functions

From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tsvector editing functions
Date: 2016-01-27 16:39:29
Lists: pgsql-hackers


> On 22 Jan 2016, at 19:03, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> OK, although I do recommend using more sensible variable names, e.g. why not use 'lexemes' instead of 'lexarr'? Similarly for the other functions.

Changed. With the old names I tried to follow the conventions of the surrounding code, but it is probably a good idea to switch to more meaningful names in new code.

>> delete(tsin tsvector, tsv_filter tsvector) - Delete lexemes and/or positions of tsv_filter from tsin. When a lexeme in tsv_filter has no positions, the function deletes every occurrence of that lexeme in tsin. When a tsv_filter lexeme has positions, the function deletes them from the positions of the matching lexeme in tsin. If the resulting position set is empty after such removal, the function deletes that lexeme from the resulting tsvector.
> I can't really imagine a situation in which I'd need this, but if you do have a use case for it ... although in the initial paragraph you say "... but if somebody wants to delete for example ..." which suggests you may not have such a use case.
> Based on bad experience with extending an API based on vague ideas, I recommend only adding functions for which there is an existing need. It's easy to add a function later, much more difficult to remove it or change the signature.

I tried to create a more or less self-contained API, e.g. to have the ability to negate the effect of concatenation. But I've also asked people around what they think about extending the API, and everybody is convinced that it is better to stick to a smaller API. So let's drop it. At least those functions exist in the mailing list archive in case somebody googles for this kind of behaviour.
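
For anyone who does find this thread while searching for that behaviour, the semantics quoted above for delete(tsin, tsv_filter) can be modelled roughly as follows. This is only a toy sketch in Python, not the C code from the patch: a tsvector is represented as a dict mapping each lexeme to its list of positions, the name tsvector_delete is made up for illustration, and the handling of a position-less lexeme in tsin matched by a filter lexeme with positions is an assumption.

```python
# Toy model (not the patch's C code): a tsvector is a dict mapping
# lexeme -> sorted list of positions. An empty list means "no positions".

def tsvector_delete(tsin, tsv_filter):
    """Model of the proposed delete(tsin tsvector, tsv_filter tsvector)."""
    result = {}
    for lexeme, positions in tsin.items():
        if lexeme not in tsv_filter:
            # Lexeme is not mentioned in the filter: keep it unchanged.
            result[lexeme] = list(positions)
            continue
        filter_positions = tsv_filter[lexeme]
        if not filter_positions:
            # Filter lexeme has no positions: drop the lexeme entirely.
            continue
        # Filter lexeme has positions: remove only those positions.
        remaining = [p for p in positions if p not in filter_positions]
        if remaining:
            result[lexeme] = remaining
        # Assumption: if nothing remains, the lexeme itself is dropped,
        # including the case where tsin had no positions to begin with.
    return result


tsin = {"cat": [1, 4], "dog": [2], "fish": [3]}
print(tsvector_delete(tsin, {"dog": []}))      # filter without positions
print(tsvector_delete(tsin, {"cat": [4]}))     # removes one position
print(tsvector_delete(tsin, {"cat": [1, 4]}))  # removes all positions
```

The three calls exercise the three cases from the description: a lexeme removed outright, a single position removed, and a lexeme removed because its position set became empty.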

>> Also, if we want some level of completeness in the API, and taking into account that the concat() function shifts the positions of its second argument, I thought it could be useful to also add a function that shifts all positions by a specific value. This helps to undo a concatenation: delete one of the concatenated tsvectors and then shift the positions in the resulting tsvector. So I also wrote one more small function:
>> shift(tsin tsvector, offset int16) - Shift all positions in tsin by the given offset
> That seems rather too low-level. Shouldn't it be really built into delete() directly somehow?

I think that would be an ambiguous task for delete. But if we are dropping support for delete(tsvector, tsvector), I don't see the point in keeping those functions.
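
The "undo concatenation" idea discussed above can be sketched in the same toy style. Again this is a Python model under stated assumptions rather than the patch itself: tsvectors are dicts of lexeme to positions, concat() models the existing behaviour of shifting the second argument by the largest position of the first, and shift() and remove_positions() are illustrative stand-ins for the dropped functions. The position limit of real tsvectors is ignored.

```python
# Toy model (not the patch's C code): a tsvector is a dict mapping
# lexeme -> sorted list of positions.

def concat(a, b):
    """Model of tsvector concatenation: positions of b are shifted by
    the largest position found in a."""
    offset = max((p for ps in a.values() for p in ps), default=0)
    result = {lex: list(ps) for lex, ps in a.items()}
    for lex, ps in b.items():
        result[lex] = sorted(result.get(lex, []) + [p + offset for p in ps])
    return result


def shift(tsin, offset):
    """Model of the proposed shift(): move every position by offset."""
    return {lex: [p + offset for p in ps] for lex, ps in tsin.items()}


def remove_positions(tsin, tsv_filter):
    """Simplified stand-in for delete(tsvector, tsvector)."""
    result = {}
    for lex, ps in tsin.items():
        if lex not in tsv_filter:
            result[lex] = list(ps)
            continue
        kept = [p for p in ps if p not in tsv_filter[lex]]
        if kept:
            result[lex] = kept
    return result


a = {"fat": [1], "cat": [2]}
b = {"sat": [1], "mat": [2]}

ab = concat(a, b)               # b's positions become 3 and 4
rest = remove_positions(ab, a)  # what is left still sits at 3 and 4
restored = shift(rest, -2)      # caller must know the offset (2) itself
print(restored == b)
```

One possible reading of the ambiguity mentioned above: after the delete step nothing in the remaining tsvector records how far its positions were moved, so the offset has to be supplied by the caller rather than inferred by delete().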

>>> 7) Some of the functions use indexterm that does not match the function
>>> name. I see two such cases - to_tsvector and setweight. Is there a
>>> reason for that?
>> Because the SGML compiler wants unique indexterms. Both functions that
>> you mentioned use argument overloading and have non-unique names.
> As Michael pointed out, that should probably be handled by using <primary> and <secondary> tags.


> On 19 Jan 2016, at 00:21, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> It's a bit funny that you reintroduce the "unrecognized weight: %d"
> (instead of %c) in tsvector_setweight_by_filter.

Ah, I was thinking about moving it to a separate diff and messed it up. Fixed, and attaching a diff with the same fix for the old tsvector_setweight.
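
To illustrate why %c reads better than %d for this message, here is a small sketch using Python's printf-style formatting. The real message is produced by C code in the patch; the weight value 'E' is just an assumed example of an invalid weight character.

```python
# Assumed example: a caller passes the invalid weight character 'E'.
# In C the weight is a char, so its value is the character code.
weight = ord("E")

# %d prints the numeric character code, which is hard to relate to the input.
as_number = "unrecognized weight: %d" % weight

# %c prints the character the user actually supplied.
as_char = "unrecognized weight: %c" % weight

print(as_number)
print(as_char)
```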

Attachment              Content-Type              Size
tsvector_ops-v2.1.diff  application/octet-stream  36.3 KB
tsvector_ops-v2.2.diff  application/octet-stream  448 bytes
unknown_filename        text/plain                96 bytes
