Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
Cc: bvimalvictor(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments
Date: 2019-08-21 13:45:13
Message-ID: 29036.1566395113@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> writes:
> I don't think it's a bug. I think it's just that the SUBSTR in
> PostgreSQL is more SQL compliant than any other databases that you
> mentioned. As per the SQL standard, if the start position is zero or
> negative number, it should be adjusted to the start of the string and
> not to the end of the string and that way I feel the behaviour of
> SUBSTR in PostgreSQL is more SQL compliant than other databases.

Yes, the standard provides no wiggle room here. The behavior of
substring() with integer parameters, as specified in SQL:2008
6.29 <string value function> general rule 3, is

3) If <character substring function> is specified, then:

a) If the character encoding form of <character value expression> is
UTF8, UTF16, or UTF32, then, in the remainder of this General Rule, the
term “character” shall be taken to mean “unit specified by <char length
units>”.

b) Let C be the value of the <character value expression>, let LC be the
length in characters of C, and let S be the value of the <start
position>.

c) If <string length> is specified, then let L be the value of <string
length> and let E be S+L. Otherwise, let E be the larger of LC + 1 and
S.

d) If either C, S, or L is the null value, then the result of the
<character substring function> is the null value.

e) If E is less than S, then an exception condition is raised: data
exception — substring error.

f) Case:

i) If S is greater than LC or if E is less than 1 (one), then the
result of the <character substring function> is a zero-length string.

ii) Otherwise,

1) Let S1 be the larger of S and 1 (one). Let E1 be the smaller of E
and LC+1. Let L1 be E1–S1.

2) The result of the <character substring function> is a character
string containing the L1 characters of C starting at character
number S1 in the same order that the characters appear in C.

I believe our implementation does this exactly.

Even if it were true that Oracle's behavior is more common than the
spec's definition, it's quite unlikely that we could be talked into
abandoning spec-compliant behavior to match Oracle.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2019-08-21 13:49:54 Re: BUG #15964: vacuumdb.c:187:10: error: use of undeclared identifier 'FD_SETSIZE'
Previous Message Michael Paquier 2019-08-21 13:44:34 Re: BUG #15954: Unable to alter partitioned table to set logged