Re: SQLJSON

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Christopher BROWN <brown(at)reflexe(dot)fr>
Cc: Markus KARG <markus(at)headcrashing(dot)eu>, List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: SQLJSON
Date: 2015-06-29 21:21:26
Message-ID: CADK3HHLr7N=SOmOeHJf2Ak_s5=e1HvHAOR-JX9vv-ckPwMg6cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

On 29 June 2015 at 17:19, Christopher BROWN <brown(at)reflexe(dot)fr> wrote:

> Hello,
>
> In summary, is this what is planned ?
>
> - Bundle the JSONP API with the driver (specific version)
> - Bundle the JSONP RI with the driver (specific version, with modified
> package names)
> - Override getObject() to return a "JSON DOM" (buffering data once,
> then reconstructing it again as an object model)
>
> Is that correct? Could you remind me what does getObject() currently
> return for JSONB (I don't have a test case to hand)?
>

That is the current proposal, yes. I would expect getObject to return a
string at this point. But I haven't tested it either

Dave

>
> Thanks,
> Christopher
>
>
> On 29 June 2015 at 22:58, Dave Cramer <pg(at)fastcrypt(dot)com> wrote:
>
>> On 29 June 2015 at 16:22, Markus KARG <markus(at)headcrashing(dot)eu> wrote:
>>
>>> Dave,
>>>
>>>
>>>
>>> I understand that people expect that getObject does something reasonable
>>> - but the case why this does not happen is because PostgreSQL does not
>>> provide something reasonable AFAIK, or is there JSON support in the
>>> protocol (I might have missed that, actually, then please just tell me)?
>>>
>>>
>> Well things in PostgreSQL move at rather a glacial pace as far as
>> features go. The emphasis is on reliability. If you think this mailing list
>> is contentious you might want to have a look at hackers. To answer your
>> question though there is no special backend support for JSON, or any other
>> streaming. Yes this is a performance drawback, but it is what it is.
>>
>>>
>>>
>>> But I need to tell you that you have to deal with the risk that if JDBC
>>> 5 comes up with the definition "getObject has to return FancyJsonType"
>>> you'll face a backwards compatibility problem when you decided for pretty
>>> anything to return but not "FancyJsonType", whether the type you've choosen
>>> looked obvious to the common user, to almost any user or to nobody at all.
>>> Anyways, you're the boss, and I trust in you to do the right choice. :-)
>>>
>>
>> Doing nothing is surely the wrong choice. The question is what to do. I
>> don't pretend to know the right answer here.However in the absence of the
>> spec, I'm sure whatever we do will be a compromise.
>>
>>>
>>>
>>> Regarding your metaphor, please in turn understand thar some guys in
>>> this form, like Vladimir and me, are professional combustion engine
>>> engineers in our main job. Our whole life is dedicated to the difference
>>> between a vanilla BMW 318 into an M3. I hope there is place for such
>>> engineers in your team, too, and it is _not_really_ your attitude that you
>>> are happy once the weels don't fall away once you start the motor. We do
>>> not do it for thank or cheer, we simply do it because it is the attitude of
>>> our main jobs. We cannot forget our knowledge about compression and
>>> consumption just because 99% of the users don't care for. And part of this
>>> knowledge is thinking of efficiency and safety and cost of production and
>>> unassembly for recycling, and usefulness and maintainability and
>>> testability of the car, too, even if no user _ever_ would think about when
>>> sitting is his ride. If our demands on the professionality and performance
>>> of the end product is not what you like to achieve and you like to simply
>>> replace the pipe with a chromed one, then please let us know. Otherwise our
>>> PRs will one day produce a stack overflow on GitHub, which nobody wants. ;-)
>>>
>>
>> As I said in my earlier email I am grateful for your contributions and I
>> would love them to continue, but we need to keep the majority of the users
>> in mind. We also need to make sure in our quest for performance that we
>> don't break anything.
>>
>> Regards,
>>
>> Dave
>>
>>>
>>>
>>> Have fun
>>>
>>> -Markus
>>>
>>>
>>>
>>>
>>>
>>> *From:* pgsql-jdbc-owner(at)postgresql(dot)org [mailto:
>>> pgsql-jdbc-owner(at)postgresql(dot)org] *On Behalf Of *Dave Cramer
>>> *Sent:* Montag, 29. Juni 2015 21:23
>>> *To:* Álvaro Hernández Tortosa
>>> *Cc:* List
>>> *Subject:* Re: [JDBC] SQLJSON
>>>
>>>
>>>
>>> Markus,
>>>
>>>
>>>
>>> I really value your recent input to the driver and would like it to
>>> continue but we need to keep things in perspective. The mere fact that you
>>> are engaging in discussion on this list puts you in the 99.99999th (keep
>>> going with 9's if you like) percentile of users. Most and I mean the rest
>>> of the people in that population just expect the driver to do something
>>> reasonable. They don't provide JSON parsers or even know there is a
>>> difference. When they do getObject and that object is a json value they
>>> expect something magical to happen. We return an object they can use.
>>>
>>>
>>>
>>> The other less than exciting fact about the driver is that it is not the
>>> central feature of PostgreSQL. Allow me a metaphor if you will. I like cars
>>> so I'll chose to use a driveshaft. It connects the engine to the wheels.
>>> There is nothing new about driveshafts, although recently they have been
>>> made out of carbon fibre, but essentially they are tubes which connect the
>>> crankshaft to the wheels. To continue the metaphor, when the driver applies
>>> the gas pedal they expect the car to move. If the driveshaft breaks then
>>> this doesn't happen. Our job is to make sure the driveshaft continues to
>>> connect the engine to the wheels. If we can make it out of carbon fibre so
>>> much the better, but it is a particularly thankless job.
>>>
>>>
>>>
>>>
>>>
>>> Kind Regards,
>>>
>>>
>>> Dave Cramer
>>>
>>> dave.cramer(at)credativ(dot)ca
>>> http://www.credativ.ca
>>>
>>>
>>>
>>> On 29 June 2015 at 15:01, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
>>> wrote:
>>>
>>>
>>> Markus:
>>>
>>> On 29/06/15 18:26, Markus KARG wrote:
>>>
>>> Unfortunately, yes, it is weird to ask, as what you do actually is
>>> working already once you accept the technical reality that an application
>>> either has to use JSON SQL functions OR provide an implementation of the
>>> JSONP API to be able to process JsonObject.
>>>
>>>
>>> It was rhetorical. Precisely I'm saying let's include an
>>> implementation so that you could process a JsonValue. Of course.
>>>
>>>
>>> I actually cannot see what is so specific to PostgreSQL that pgjdbc has
>>> to support a use case that no other driver supports.
>>>
>>>
>>> Maybe that PostgreSQL has JSON, really good one, while others don't?
>>> Or let's ruin all the work done into JSON just because you fear of an
>>> absolutely unrealistic class path? Really? Please....
>>>
>>>
>>>
>>>
>>> The argument against it is that it provides classpath clashes once
>>> pgjdbc is used in a Java EE environment, and that our solution might be
>>> incompatible with JDBC 5, and that our solution imposes work and complexity
>>> on the pgjdbc contributors just for the sake to make you happy, without
>>> providing you any measurable benefit.
>>>
>>>
>>> Markus, when you say "to make you happy".... please take back your
>>> words and stick to rational arguments. If you cannot, please abandon this
>>> otherwise respectful and enriching debate.
>>>
>>> I'm speaking for all the users, all of them that want to use 9.4's
>>> best thing (with logical decoding permission, of course).
>>>
>>> Yet you only want to stop everything speaking of a class path that
>>> very likely no one will ever experience, or some fears about JDBC5 which
>>> might be light years ahead. We will worry about JDBC5 then, but now we need
>>> to help our users, not help them (with your help) to go to MongoDB.
>>>
>>>
>>>
>>>
>>> Álvaro
>>>
>>>
>>>
>>> --
>>>
>>> Álvaro Hernández Tortosa
>>>
>>>
>>>
>>>
>>>
>>> -----------
>>>
>>> 8Kdata
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -Markus
>>>
>>>
>>>
>>> *From:* pgsql-jdbc-owner(at)postgresql(dot)org [
>>> mailto:pgsql-jdbc-owner(at)postgresql(dot)org <pgsql-jdbc-owner(at)postgresql(dot)org>]
>>> *On Behalf Of *Álvaro Hernández Tortosa
>>> *Sent:* Sonntag, 28. Juni 2015 21:06
>>> *To:* pgsql-jdbc(at)postgresql(dot)org
>>> *Subject:* Re: [JDBC] SQLJSON
>>>
>>>
>>>
>>>
>>>
>>> On 28/06/15 17:09, Markus KARG wrote:
>>>
>>> If your application does not want to deal with JSON data then don't
>>> request it from the database, but use
>>> http://www.postgresql.org/docs/9.4/static/functions-json.html in your
>>> SELECT statement. Performing a JSON extraction function natively on the
>>> server and simply use the existing data types in your application not only
>>> reliefs us from dealing with JSON in the driver, but also will work
>>> magnitudes faster particularly on congested networks. That does work by
>>> default already in the existing driver. What you ask for does not provide
>>> any benefit to the user from my view, or I still do not understand you
>>> exact scenario.
>>>
>>>
>>> Markus, here you are trying to tell users what to do. I prefer to
>>> give them freedom, works best.
>>>
>>> All I'm saying is:
>>>
>>> - Give users an API to get a javax.json.JsonValue out of a PostgreSQL
>>> column which is of type json, jsonb or the result of an expression which
>>> evaluates to any of those.
>>>
>>> - Embed a JSON parser in the driver (like the RI implementation, which
>>> adds only 64Kb) so that users don't need to load any other code, unless
>>> they want to override the default JSON parser.
>>>
>>> From there, I don't care whether the user uses JSON functions at the
>>> server or the JSON API. What I want is that it works by default and that
>>> you can query JSON. Is this that weird to ask?
>>>
>>> I'm sorry, I fail to understand your solution to the problem. Would
>>> you please mind elaborating on the drawbacks of my proposal? I stated in
>>> previous emails the drawbacks I see on not doing it, but I still fail to
>>> see an argument against this.
>>>
>>> Thanks,
>>>
>>> Álvaro
>>>
>>>
>>> --
>>>
>>> Álvaro Hernández Tortosa
>>>
>>>
>>>
>>>
>>>
>>> -----------
>>>
>>> 8Kdata
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* pgsql-jdbc-owner(at)postgresql(dot)org [
>>> mailto:pgsql-jdbc-owner(at)postgresql(dot)org <pgsql-jdbc-owner(at)postgresql(dot)org>]
>>> *On Behalf Of *Álvaro Hernández Tortosa
>>> *Sent:* Sonntag, 28. Juni 2015 11:57
>>> *To:* pgsql-jdbc(at)postgresql(dot)org
>>> *Subject:* Re: [JDBC] SQLJSON
>>>
>>>
>>>
>>>
>>>
>>> On 28/06/15 11:49, Markus KARG wrote:
>>>
>>> You miss one essential point against bundling: An application author who
>>> wants to process JsonObject must have that class on his classpath, even if
>>> he does not use JDBC at all. This essential is HIS problem as HE wants to
>>> process it. So where does he get that class from other than putting any JSR
>>> 253 implementation in his classpath? Again, it is simply not pgjdbc's
>>> problem as in that scenario no JDBC is used at all.
>>>
>>> I don't agree, it's not a must. I may want to extract JSON data from
>>> the database and then manipulate it as non-JSON data. It all depends on
>>> your domain Objects.
>>>
>>> I already stated how bad for the user is not to have a driver that
>>> works by default. It may be as hard as adding a 64Kb to the driver. I don't
>>> understand how is this a problem, vs. the problem it creates for the user.
>>>
>>> Regards,
>>>
>>> Álvaro
>>>
>>>
>>>
>>> --
>>>
>>> Álvaro Hernández Tortosa
>>>
>>>
>>>
>>>
>>>
>>> -----------
>>>
>>> 8Kdata
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Álvaro Hernández Tortosa [mailto:aht(at)8Kdata(dot)com <aht(at)8Kdata(dot)com>]
>>>
>>> *Sent:* Sonntag, 28. Juni 2015 11:41
>>> *To:* Christopher BROWN
>>> *Cc:* Sehrope Sarkuni; Markus KARG; Dave Cramer; List
>>> *Subject:* Re: [JDBC] SQLJSON
>>>
>>>
>>>
>>>
>>>
>>> On 28/06/15 09:34, Christopher BROWN wrote:
>>>
>>> Hello,
>>>
>>>
>>>
>>> Quick contribution, I'm not answering in-line because there are already
>>> too many in-line answers and it's getting unreadable.
>>>
>>> - In my own applications, I use Jackson, but it's many up of
>>> different ".jar" files and has therefore no definitive form (you can
>>> concoct lots of combinations). It's also quite heavy in terms of
>>> footprint, and embedding it makes no sense, because you'd have to keep
>>> updating the driver to keep up to date with Jackson. Finally, it doesn't
>>> actually implement JSR353 (although it would be possible to create a thin
>>> wrapper), out-of-the-box (via a compatibility API) it can read JSR-353 but
>>> it basically rebuilds a Jackson representation out of a "standard"
>>> representation. I might choose Jackson, but I wouldn't want to impose it
>>> or require that it be bundled with the driver (indeed, that would cause me
>>> classloader issues as I often update to the latest version of Jackson).
>>>
>>> Although I mentioned Jackson as a good candidate for a default
>>> implementation, you are right it does not implement JSR353 directly
>>> (although wrappers already exist, like
>>> https://github.com/pgelinas/jackson-javax-json). But it does not need
>>> to be the default choice. I did a quick test and wrapping Jackson with
>>> jaackson-javax-json and the set of dependencies to make it work would add
>>> 1102Kb to the Jar file. Not much IMHO, although bigger than current driver
>>> size. I would not be scared however to see a 2Mb jdbc jar file.
>>>
>>> However, the Reference Implementation (https://jsonp.java.net/) is
>>> probably good enough and only adds 64Kb to the Jar file. The JSR353 is just
>>> another 32Kb, so in total 96Kb would be added if using the RI rather than
>>> Jackson. I don't consider this offensive.
>>>
>>> In summary: why not bundle then the RI? Works out of the box and
>>> does not conflict with Jackson. Want to use Jackson? Go for it. The rest of
>>> the world would have something running out-of-the-box.
>>>
>>>
>>>
>>> - You can compile the driver against the JSONP API without embedding
>>> either the interfaces or an implementation. It's therefore an optional
>>> feature for those that require it, and it's not rocket science to add the
>>> necessary APIs to the classpath.
>>> - I disagree that bundling interfaces + implementation is "making it
>>> easy". For some users, perhaps, but for others, you're going to cause
>>> headaches due to creating classloader conflicts (when it's already bundled
>>> in their application).
>>>
>>> Technically, it's possible. But from a user perspective, not
>>> bundling an implementation means:
>>>
>>> - Understanding why the code I wrote fails with a "Provider
>>> org.glassfish.json.JsonProviderImpl not found".
>>> - Google that. Find that you need to add the dependency and very likely
>>> create a SPI file (META-INF/services/javax.json.spi.JsonProvider).
>>> - Blame PostgreSQL for not doing that by default and shipping a
>>> half-baked driver that is not finished (won't be my opinion, but might be
>>> user's opinion).
>>> - Google again to see what JSR353-compliant implementations are out
>>> there.
>>> - Blame PostgreSQL again for not making this choice for you.
>>> - Struggling to find a compatible implementation. Find Jackson but
>>> realize requires third-party driver. Question the "quality" of that wrapper
>>> and consider whether that would be "supported" with PostgreSQL driver.
>>> - Luckily enough a search points you to a StackOverflow link that
>>> suggests to use either this Jackson wrapper or the Reference Implementation
>>> (there are not many JSR353 implementations, after all).
>>> - Choose without knowing which JSON parser is best.
>>> - Bundle the dependency, check now it works. Clean your sweat.
>>> - Wonder why not to choose MongoDB next time, it works out of the box.
>>>
>>> Not funny.
>>>
>>> The alternative is: bundle a default (the RI) and let advanced users
>>> not happy with the default implementation to make another choice (basically
>>> create the META-INF/services/javax.json.spi.JsonProvider and that will
>>> override the bundled RI). You have the best of both worlds.
>>>
>>> Pareto (https://en.wikipedia.org/wiki/Pareto_principle): let's make
>>> it work for 80% and let that 20% to edit a file and "suffer" a 64Kb size
>>> increase in the driver, rather than ask everybody to go through the above
>>> process.
>>>
>>>
>>>
>>>
>>> - If as Dave Cramer says, the PG protocol doesn't currently support
>>> streaming, it still makes sense to add streaming support that reads from
>>> already fully-read resources... because in that way, if the protocol
>>> improves in the future, client code using the streaming API will benefit
>>> (with no changes to client code) in the future.
>>>
>>> JSR353 already has a Streaming API. I don't understand what do we
>>> need to do here, it's already done.
>>>
>>> Best regards,
>>>
>>> Álvaro
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Álvaro Hernández Tortosa
>>>
>>>
>>>
>>>
>>>
>>> -----------
>>>
>>> 8Kdata
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Christopher
>>>
>>>
>>>
>>>
>>>
>>> On 28 June 2015 at 01:53, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
>>> wrote:
>>>
>>>
>>>
>>> On 28/06/15 00:55, Sehrope Sarkuni wrote:
>>>
>>> On Sat, Jun 27, 2015 at 6:25 PM, Álvaro Hernández Tortosa <
>>> aht(at)8kdata(dot)com> wrote:
>>>
>>> Hi Sehrope!
>>>
>>>
>>>
>>> Hi Álvaro! :D
>>>
>>>
>>>
>>> To me, this is the least important question. If based on JSR353's
>>> SPI, it's trivial to swap the default, included one, for another one. Just
>>> picking a sensible default (Jackson, for instance) is probably good enough.
>>>
>>>
>>>
>>> I think I've used Jackson almost every time I've had to deal with JSON
>>> in Java. The mapping API is pretty cool in that it lets you directly create
>>> an target object type. If we got the route of adding methods to PGResultSet
>>> then we could have something like: <T> T getJsonAsType(String, Class<T>
>>> clazz)
>>>
>>>
>>> That might be a nice addition. But I believe that goes beyond
>>> driver's responsibility: I think it ends when it returns you the JSON type
>>> you queried (JsonObject in my previous email, but I'm correcting now
>>> myself: JsonValue)
>>>
>>>
>>>
>>>
>>>
>>> I'm not wedded to Jackson though. Honestly if JS353 is the standard then
>>> that's what we should be using. We'd still need to figure out how to handle
>>> older JVMs or maybe just selectively disable the feature (JDK8+?).
>>>
>>>
>>> JSR353 is targeted for JavaSE 6 :)
>>>
>>>
>>>
>>>
>>>
>>>
>>> #2 is driven a lot by #1 as depending on the parser implementation there
>>> may be different object types returned. JSON is a bit tricky as "valid
>>> JSON" can mean null, a scalar, an object, or an array. Most people thing of
>>> it as just an object but "foobar" is valid JSON as well. This leads us to
>>> #3...
>>>
>>>
>>> The object type to return has to be IMHO JsonObject:
>>> http://docs.oracle.com/javaee/7/api/javax/json/JsonObject.html
>>>
>>>
>>>
>>> Not always though. All these are valid JSON too:
>>>
>>>
>>>
>>> => SELECT '1'::json AS num, '"test"'::json AS string, '[1,2,3]'::json AS
>>> arr, '{"foo":"bar"}'::json AS obj;
>>>
>>> num | string | arr | obj
>>>
>>> -----+--------+---------+---------------
>>>
>>> 1 | "test" | [1,2,3] | {"foo":"bar"}
>>>
>>> (1 row)
>>>
>>>
>>>
>>> We'll need separate getters/setters for the scalar and array types as
>>> well. I agree that most people will just be using the object type though
>>> (and maybe the array).
>>>
>>>
>>> You are right here. Please s/JsonObject/JsonValue/g JsonValue is a
>>> container for any of the above including objects and arrays. So it would be
>>> enough just with JsonValue getJsonValue(....)
>>>
>>>
>>>
>>>
>>>
>>> #3 doesn't have a straight answer as there is no getJSON(...) methods in
>>> the JDBC spec. It'd probably have to be returned via getObject(...).
>>>
>>> An alternative is to provide PGResultSet and PGPreparedStatement classes
>>> similar to PGConnection that provides PG extensions. They could have the
>>> get/set methods (ex: getJsonScalar(...) or setJsonObject(Map<String,Object>
>>> ...)) to retrieve JSON values as specific object types (i.e. scalar,
>>> object, array). It'd be a bit more type safe as presumably most people
>>> using json/jsonb types know the top level type of what they're storing.
>>>
>>>
>>> Probably adding methods to PG classes would be better than getObject
>>> and force explicit casts. Regarding the methods, if they simply return
>>> JsonObject, you already have a full API there to parse and extract and
>>> process. So anything that returns a JsonObject from a column (identifier or
>>> #) would be enough for me.
>>>
>>>
>>>
>>> For most cases I think it'd be fine. I think the custom mapping I
>>> mentioned above would cover the rest. Anything beyond that would be a full
>>> on transformation and would be very application specific.
>>>
>>>
>>> Yepp
>>>
>>>
>>>
>>>
>>>
>>> For #4 I see two possible wins. First off on the usability side, there's
>>> some convenience to natively interfacing with json/jsonb types. It'll only
>>> have value though if those types are the same ones that users are using in
>>> the rest of their code. If they're just using them as Map<String,Object>
>>> everywhere then it'd still be a pain for a user to convert to our "native"
>>> PG JSON types to use via JDBC. Having a dedicated API that allows for
>>> interaction using native Java types would make this more convenient.
>>>
>>> The other win I can see for #4 is on performance. Right now JSON is
>>> converted to a String. That means everybody using it has to convert it
>>> twice. First raw bytes to String, then String to object. A dedicated API
>>> could cut one of those out of the way. Given how the wire protocol is
>>> implemented in the driver, it wouldn't be a direct reading from the input
>>> stream (it'll be buffered in a byte array), but at least it won't be copied
>>> twice.
>>>
>>>
>>> As far as I know, most users are using JsonObject, so returning that
>>> is a perfect match for pgjdbc. I don't expect however big performance wins
>>> as JSON is sent as a String over the wire...
>>>
>>>
>>>
>>> The performance gain isn't on the wire, it's from not having to convert
>>> bytes => String => JsonObject. It'd be bytes => JsonObject or bytes =>
>>> CustomObject. Less work and less GC. The bigger the JSON string, the bigger
>>> the savings too.
>>>
>>>
>>> You are right in that JSR353 allows you to create a parser directly
>>> out of an InputStream, so you would avoid converting to String. That's a
>>> win. The rest of the conversions are inevitable (having the latter one you
>>> pointed out laying in user's realm, beyond driver's responsibility).
>>>
>>> Regards,
>>>
>>> Álvaro
>>>
>>>
>>>
>>> --
>>>
>>> Álvaro Hernández Tortosa
>>>
>>>
>>>
>>>
>>>
>>> -----------
>>>
>>> 8Kdata
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>

In response to

  • Re: SQLJSON at 2015-06-29 21:19:17 from Christopher BROWN

Responses

  • Re: SQLJSON at 2015-06-29 21:32:18 from Steven Schlansker
  • Re: SQLJSON at 2015-06-29 22:01:02 from Christopher BROWN

Browse pgsql-jdbc by date

  From Date Subject
Next Message Steven Schlansker 2015-06-29 21:24:37 Re: SQLJSON
Previous Message Christopher BROWN 2015-06-29 21:19:17 Re: SQLJSON