Re: about google summer of code 2016

From: Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <obartunov(at)gmail(dot)com>, Mehboob Alam <hello(at)thinkx(dot)com>
Subject: Re: about google summer of code 2016
Date: 2016-03-23 00:19:18
Message-ID: 56F1E106.6060900@8kdata.com
Lists: pgsql-hackers

On 22/02/16 23:23, Álvaro Hernández Tortosa wrote:
>
>
> On 22/02/16 05:10, Tom Lane wrote:
>> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
>>> On 19/02/16 10:10, Álvaro Hernández Tortosa wrote:
>>>> Oleg and I discussed recently that a really good addition to a GSoC
>>>> item would be to study whether it's convenient to have a binary
>>>> serialization format for jsonb over the wire.
>>> Seems a bit risky for a GSoC project. We don't know if a different
>>> serialization format will be a win, or whether we want to do it in the
>>> end, until the benchmarking is done. It's also not clear what we're
>>> trying to achieve with the serialization format: smaller on-the-wire
>>> size, faster serialization in the server, faster parsing in the client,
>>> or what?
>> Another variable is that your answers might depend on what format you
>> assume the client is trying to convert from/to. (It's presumably not
>> text JSON, but then what is it?)
>
> As I mentioned before, there are many well-known JSON
> serialization formats, like:
>
> - http://ubjson.org/
> - http://cbor.io/
> - http://msgpack.org/
> - BSON (ok, let's skip that one hehehe)
> - http://wiki.fasterxml.com/SmileFormatSpec
>
>>
>> Having said that, I'm not sure that risk is a blocking factor here.
>> History says that a large fraction of our GSoC projects don't result
>> in a commit to core PG. As long as we're clear that "success" in this
>> project isn't measured by getting a feature committed, it doesn't seem
>> riskier than any other one. Maybe it's even less risky, because there's
>> less of the success condition that's not under the GSoC student's
>> control.
>

I wanted to bring an update here. It looks like someone did the
expected benchmark "for us" :)

https://eng.uber.com/trip-data-squeeze/ (thanks Alam for the link)

While this is Uber's own test, I think the conclusions are quite
significant: an encoding like MessagePack + zlib requires only 14% of
the size and encodes+decodes in 76% of the time of JSON. There are of
course other contenders that trade better encoding times for slightly
slower decoding and bigger size. But there are very interesting numbers
in this benchmark. MessagePack, CBOR and UJSON (all + zlib) look like
really good options.
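
For anyone who wants to check numbers like these against their own data,
here is a minimal sketch of such a measurement. It is not Uber's harness:
the sample records, the function names (make_records, benchmark) and the
CODECS table are made up for illustration, and it only compares plain
JSON with JSON + zlib from the Python standard library. It reports size
and encode+decode time as a percentage of plain JSON, the same way the
figures above are expressed.

```python
import json
import time
import zlib


def make_records(n=1000):
    """Build hypothetical sample records (stand-ins for real trip data)."""
    return [
        {
            "id": i,
            "lat": 37.0 + i * 1e-4,
            "lng": -122.0 - i * 1e-4,
            "status": "completed",
            "tags": ["a", "b", "c"],
        }
        for i in range(n)
    ]


# Each codec is an (encode, decode) pair: object -> bytes, bytes -> object.
CODECS = {
    "json": (
        lambda obj: json.dumps(obj).encode("utf-8"),
        lambda buf: json.loads(buf.decode("utf-8")),
    ),
    "json+zlib": (
        lambda obj: zlib.compress(json.dumps(obj).encode("utf-8")),
        lambda buf: json.loads(zlib.decompress(buf).decode("utf-8")),
    ),
}


def benchmark(records, codecs=CODECS, repeat=5):
    """Measure encoded size and best encode+decode time per codec.

    The codecs table must contain a "json" entry, which is the baseline
    for the percentage columns.
    """
    results = {}
    for name, (encode, decode) in codecs.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            buf = encode(records)
            decoded = decode(buf)
            best = min(best, time.perf_counter() - start)
        # The round trip must be lossless for the comparison to be fair.
        assert decoded == records
        results[name] = {"size": len(buf), "seconds": best}

    base = results["json"]
    for entry in results.values():
        entry["size_pct"] = 100.0 * entry["size"] / base["size"]
        entry["time_pct"] = 100.0 * entry["seconds"] / base["seconds"]
    return results


if __name__ == "__main__":
    for name, entry in benchmark(make_records()).items():
        print(f"{name:10s} size={entry['size_pct']:6.1f}%  "
              f"time={entry['time_pct']:6.1f}%")
```

Comparing MessagePack or CBOR the same way would mean adding one more
(encode, decode) pair to the table using a third-party library, which is
left out here so the sketch runs with the standard library alone. The
synthetic records are very repetitive, so the compression ratio they show
says little about real jsonb payloads.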

So now that we have this data, I would like to put these questions
to the community:

- Is this enough, or do we need to perform our own, different benchmarks?

- If this is enough, and given that we weren't selected for GSoC, is
there interest in the community to work on this nonetheless?

- Regarding GSoC: it looks to me that we failed to submit in time. Is
this what happened, or were we not selected? If the former (and no
criticism here, just stating a fact), what can we do next year to keep
this from happening again? Is anyone "appointed" to take care of it?

Álvaro

--
Álvaro Hernández Tortosa

-----------
8Kdata
