Re: json api WIP patch

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Taras Mitran <taras(at)ivc(dot)com>, Joe Van Dyk <joe(at)tanga(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(dot)wheeler(at)pgexperts(dot)com>
Subject: Re: json api WIP patch
Date: 2013-01-15 21:23:27
Message-ID: CAAZKuFb6Ho7MgiBuDt4trCf_gqh6bChJO46WnUfhbegOJUn66g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 01/15/2013 02:47 PM, Merlin Moncure wrote:
>>
>> On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david(at)fetter(dot)org> wrote:
>>>
>>> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>>>>
>>>> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>>>>>
>>>>> While testing this I noticed that integer based 'get' routines are
>>>>> zero based -- was this intentional? Virtually all other aspects of
>>>>> SQL are 1 based:
>>>>>
>>>>> postgres=# select json_get('[1,2,3]', 1);
>>>>> json_get
>>>>> ----------
>>>>> 2
>>>>> (1 row)
>>>>>
>>>>> postgres=# select json_get('[1,2,3]', 0);
>>>>> json_get
>>>>> ----------
>>>>> 1
>>>>> (1 row)
>>>>
>>>> Yes. it's intentional. SQL arrays might be 1-based by default, but
>>>> JavaScript arrays are not. JsonPath and similar gadgets treat the
>>>> arrays as zero-based. I suspect the Json-using community would not
>>>> thank us for being overly SQL-centric on this - and I say that as
>>>> someone who has always thought zero based arrays were a major design
>>>> mistake, responsible for countless off-by-one errors.
>>>
>>> Perhaps we could compromise by making arrays 0.5-based.
>>
>> Well, I'm not prepared to argue with Andrew in this one. It was
>> surprising behavior to me, but that's sample size one.
>
> I doubt I'm very representative either. People like David Wheeler, Taras
> Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than
> me. I'm quite prepared to change it if that's the consensus.

Hello.

I'm inclined to go with the same gut feeling you had (zero-based-indexing).

Here is the background for my reasoning:

The downside of zero-based-indexing is that people who want to use
multiple sequential container types will inevitably have to deal with
detailed and not easily type-checked integer coordinates that mean
different things in each domain that will, no doubt, lead to a number
of off-by-one errors. Nevertheless, this cost is already paid because
one of the first things many people will do in programs generating SQL
queries is try to zero-index a SQL array, swear a bit after figuring
things out (because a NULL will be generated, not an error), and then
adjust all the offsets. So, this is not a new problem. On many
occasions I'm sure this has caused off-by-one bugs, or the NULLs
slipped through testing and delivered funny results, yet the world
moves on.

On the other hand, the downside of going down the road of 1-based
indexing and attempting to attain relative sameness to SQL arrays, it
would also feel like one would be obliged to implement SQL array
infelicities like 'out of bounds' being SQL NULL rather than an error,
related to other spectres like non-rectangular nested arrays. SQL
array semantics are complex and The Committee can change them or --
slightly more likely -- add interactions, so it seems like a general
expectation that Postgres container types that happen to have any
reasonable ordinal addressing will implement some level of same-ness
with SQL arrays is a very messy one. As such, if it becomes customary
to implement one-based indexing of containers, I think such customs
are best carefully circumscribed so that attempts to be 'like' SQL
arrays are only as superficial as that.

What made me come down on the side of zero-based indexing in spite of
the weaknesses are these two reasons:

* The number of people who use JSON and zero-based-indexing is very
large, and furthermore, within that set the number that know that
SQL even defines array support -- much less that Postgres implements
it -- is much smaller. Thus, one is targeting cohesion with a fairly
alien concept that is not understood by the audience.

* Maintaining PL integrated code that uses both 1-based indexing in PG
functions and 0-based indexing in embedded languages that are likely
to be combined with JSON -- doesn't sound very palatable, and the
use of such PLs (e.g. plv8) seems pretty likely, too. That can
probably be a rich source of bugs and frustration.

If one wants SQL array semantics, it seems like the right way to get
them is coercion to a SQL array value. Then one will receive SQL
array semantics exactly.

--
fdr

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-01-15 21:26:18 Re: unlogged tables vs. GIST
Previous Message Stephen Frost 2013-01-15 21:22:48 Re: [PATCH] COPY .. COMPRESSED