Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: bvimalvictor(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15971: Behaviour of SUBSTR function depending on its arguments
Date: 2019-08-21 14:07:09
Message-ID: CAE9k0Pksw==KG3JkBmwSSdmmHotJEUywryqs0L+15kcY_aO6_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Aug 21, 2019 at 7:15 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> writes:
> > I don't think it's a bug. I think it's just that the SUBSTR in
> > PostgreSQL is more SQL compliant than any other databases that you
> > mentioned. As per the SQL standard, if the start position is zero or
> > negative number, it should be adjusted to the start of the string and
> > not to the end of the string and that way I feel the behaviour of
> > SUBSTR in PostgreSQL is more SQL compliant than other databases.
>
> Yes, the standard provides no wiggle room here. The behavior of
> substring() with integer parameters, as specified in SQL:2008
> 6.29 <string value function> general rule 3, is
>
> 3) If <character substring function> is specified, then:
>
> a) If the character encoding form of <character value expression> is
> UTF8, UTF16, or UTF32, then, in the remainder of this General Rule, the
> term “character” shall be taken to mean “unit specified by <char length
> units>”.
>
> b) Let C be the value of the <character value expression>, let LC be the
> length in characters of C, and let S be the value of the <start
> position>.
>
> c) If <string length> is specified, then let L be the value of <string
> length> and let E be S+L. Otherwise, let E be the larger of LC + 1 and
> S.
>
> d) If either C, S, or L is the null value, then the result of the
> <character substring function> is the null value.
>
> e) If E is less than S, then an exception condition is raised: data
> exception — substring error.
>
> f) Case:
>
> i) If S is greater than LC or if E is less than 1 (one), then the
> result of the <character substring function> is a zero-length string.
>
> ii) Otherwise,
>
> 1) Let S1 be the larger of S and 1 (one). Let E1 be the smaller of E
> and LC+1. Let L1 be E1–S1.
>
> 2) The result of the <character substring function> is a character
> string containing the L1 characters of C starting at character
> number S1 in the same order that the characters appear in C.
>
> I believe our implementation does this exactly.
>
> Even if it were true that Oracle's behavior is more common than the
> spec's definition, it's quite unlikely that we could be talked into
> abandoning spec-compliant behavior to match Oracle.
>

Absolutely. We shouldn't do that. We are more accurate than Oracle as
per the SQL standard.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2019-08-21 14:16:36 Re: BUG #15964: vacuumdb.c:187:10: error: use of undeclared identifier 'FD_SETSIZE'
Previous Message Michael Paquier 2019-08-21 14:02:46 Re: BUG #15964: vacuumdb.c:187:10: error: use of undeclared identifier 'FD_SETSIZE'