[PATCH] Support negative indexes in split_part

From: Nikhil Benesch <nikhil(dot)benesch(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [PATCH] Support negative indexes in split_part
Date: 2020-11-02 02:39:11
Message-ID: cbb7f861-6162-3a51-9823-97bc3aa0b638@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I posted the idea of allowing negative indexes in split_part to pgsql-general
last week, and it seemed there was some interest:

http://postgr.es/m/CAPWqQZR%2B-5pAZNSSrnmYczRaX-huemc%3DoO8URvDZvUA-M%3DMOBA%40mail.gmail.com

Attached is a patch, based on master, that implements the approach as described
in that discussion.

The motivation is that the existing idioms for splitting a string and selecting
the nth-to-last element are rather complicated and/or inefficient:

1. (string_to_array('foo bar baz', ' '))[cardinality(string_to_array('foo bar baz', ' ')) - 1]
2. reverse(split_part(reverse('foo bar baz'), ' ', 1))
3. (regexp_match('foo baz bar', '\S*$'))[1]

With the patch, split_part(haystack, needle, -1) selects the last field of the
string, split_part(haystack, needle, -2) selects the second-to-last field, and
so on. Per Tom Lane, there is precedent for this design, where negative indices
meaning "count from the end", namely the left and right string functions.

The patch includes updates to the docs and regression tests. If the feature is
deemed desirable, I believe the patch is "commit quality" (though, fair warning,
this is my first contribution to Postgres, so I might have the wrong notion
of what a committable patch looks like).

Note that the implementation is deliberately a bit inefficient to keep things
simple. When presented with a negative index, the implementation does an extra
pass over the string to count the total number of fields, in order to convert
the negative index to a positive index. Then it proceeds as it normally would.

One can imagine adding support for backwards B-M-H, but I'm not sure that could
be made to work with all multibyte encodings. We could at least avoid the extra
pass over the string by allocating a circular buffer of size |n| when n is
negative, but that wasn't clearly worthwhile. (I did implement the optimization
for the special case of -1, since its implementation was trivial.)

Cheers,
Nikhil

Attachment Content-Type Size
0001-Support-negative-indexes-in-split_part.patch text/plain 7.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-02 02:56:03 Re: [PATCH] Support negative indexes in split_part
Previous Message Kyotaro Horiguchi 2020-11-02 02:19:09 Re: Explicit NULL dereference (src/backend/utils/adt/ruleutils.c)