Re: nested hstore patch

From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: nested hstore patch
Date: 2013-12-20 23:16:30
Message-ID: AEC7F7D9-4B3C-4074-AF3C-5260600560E2@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Nov 12, 2013, at 10:35 AM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

> Hi!
>
> Attatched patch adds nesting feature, types (string, boll and numeric values), arrays and scalar to hstore type.

My apologies for not getting to this sooner, work has been a bit nutty. The truth is that I reviewed this patch quite a bit a month back, mostly so I could write documentation, the results of which are included in this patch. And I'm super excited for what's to come in the next iteration, as I hear that Teodor and Andrew are hard at work adding jsonb as a binary-compatible JSON data type.

Meanwhile, for this version, a quick overview of what has changed since 9.2.

Contents & Purpose
==================

Improved Data Type Support
--------------------------

* Added data type support for values. Previously they could only be strings or NULL, but with this patch they can also be numbers or booleans.

* Added array support. Values can be arrays of other values. The format for arrays is a bracketed, comma-delimited list.

* Added nesting support. hstore values can themselves be hstores. Nested hstores are wrapped in braces, but the root-level hstore is not (for compatibility with the format of previous versions of hstore).

* An hstore value is no longer required to be an hstore object. It can now be any scalar value.

These three items make the basic format feature-complete with JSON. Here's an example where the values are scalars:

=% SELECT 'foo'::hstore, '"hi \"bob\""'::hstore, '1.0'::hstore, 'true'::hstore, NULL::hstore;
hstore | hstore | hstore | hstore | hstore
--------+--------------+--------+--------+--------
"foo" | "hi \"bob\"" | 1.0 | t |

And here are a couple of arrays with strings, numbers, booleans, and NULLs:

SELECT '[k,v]'::hstore, '[1.0, "hi there", false, null]'::hstore;
hstore | hstore
------------+----------------------------
["k", "v"] | [1.0, "hi there", f, NULL]

Here's a complicated example formatted with `hstore.pretty_print` enabled.

=% SET hstore.pretty_print=true;
=% SELECT '{
"type" => "Feature",
"bbox" => [-180.0, -90.0, 180.0, 90.0],
"geometry" => {
"type" => "Polygon",
"coordinates" => [[
[-180.0, 10.0], [20.0, 90.0], [180.0, -5.0], [-30.0, -90.0]
]]
}
}'::hstore;
hstore
--------------------------
"bbox"=> +
[ +
-180.0, +
-90.0, +
180.0, +
90.0 +
], +
"type"=>"Feature", +
"geometry"=> +
{ +
"type"=>"Polygon", +
"coordinates"=> +
[ +
[ +
[ +
-180.0, +
10.0 +
], +
[ +
20.0, +
90.0 +
], +
[ +
180.0, +
-5.0 +
], +
[ +
-30.0, +
-90.0 +
] +
] +
] +
}

So, exact feature parity with the JSON data type.

* hstore.pretty_print is a new GUC, specifically to allow an HSTORE value to be pretty-printed. There is also a function to pretty-print, so we might be able to just do away with the GUC.

Interface
---------

* New operators:
+ `hstore -> int`: Get string value at array index (starting at 0)
+ `hstore ^> text`: Get numeric value for key
+ `hstore ^> int`: Get numeric value at array index
+ `hstore ?> text`: Get boolean value for key
+ `hstore ?> int`: Get boolean value at array index
+ `hstore #> text[]`: Get string value for key path
+ `hstore #^> text[]`: Get numeric value for key path
+ `hstore #?> text[]`: Get boolean value for key path
+ `hstore %> text`: Get hstore value for key
+ `hstore %> int`: Get hstore value at array index
+ `hstore #%> text[]`: Get hstore value for key path
+ `hstore ? int`: Does hstore contain array index
+ `hstore #? text[]`: Does hstore contain key path
+ `hstore - int`: Delete index from left operand
+ `hstore #- text[]`: Delete key path from left operand

* New functions:
+ `hstore(text)`: Make a text scalar hstore
+ `hstore(numeric)`: Make a numeric scalar hstore
+ `hstore(boolean)`: Make a boolean scalar hstore
+ `hstore(text, hstore)`: Make a nested hstore
+ `hstore(text, numeric)`: Make an hstore with a key and numeric value
+ `hstore(text, boolean)`: Make an hstore with a key and boolean value
+ `array_to_hstore(anyarray): Make an array hstore from an SQL array
+ `hvals(hstore)` Get values as a set of hstore values
+ `json_to_hstore(json)` Convert JSON to hstore
+ `each_hstore(hstore)` Get set of hstore key/value pairs
+ `hstore_typeof(hstore)` Return text name for the hstore type (hash, array, text, numeric, etc.)
+ `replace(hstore,text[],hstore)`: Replace value at specified path
+ `concat_path(hstore,text[],hstore)`: Concatenate hstore value at specified path
+ `hstore_print(hstore, params)`: Format hstore as text

The hstore_print() function has a number of optional boolean parameters to affect how the resulting text is formatted. They all default to false:

- pretty_print
- array_curly_braces: use {} instead of [] for arrays
- root_hash_decorated: Use {} for the root hash
- json: Format as JSON
- loose: Try to parse numbers and booleans from text values

Other Changes
-------------

* New casts: JSON and HSTORE can be cast to each other. I don't think they're implicit, though the forthcoming jsonb data type might support explicit casting to and from hstore, since internally they will be identical.

* The internal representation has been changed, but should be backward (and pg_upgrade) compatible, just as Andrew Gierth's change from 8.4 to 9.0 was. One can do an in-place update to rewrite all records at once. Of course, nested and/or non-hash hstore values dumped from 9.4 will not be able to be loaded into 9.3.

* GIN indexing is now supported. This is actually pretty amazing. For an hstore value, even hash keys are considered values, as far as the index is concerned. This makes it efficient to find hstore values that contain a key. I wrote an example in this blog post:

http://theory.so/pg/2013/10/25/indexing-nested-hstore/

Submission review
=================
* Is the patch in a patch format which has context? Yes.
* Does it apply cleanly to the current git master? It did for me, though I think Peter has found an issue or two since.
* Does it include reasonable tests, necessary doc patches, etc? Yes.

Usability review
================

* Does the patch actually implement what it says it does? Yes.
* Do we want that? OH yes.
* Do we already have it? No.
* Does it follow SQL spec, or the community-agreed behavior? Yes, though want jsonb, too.
* Does it include pg_dump support? Yes
* Are there dangers? Could break backward compatibility, though I don't think it does.
* Have all the bases been covered? I think so

Feature test
============

* Does the feature work as advertised? Yes.
* Are there corner cases the author has failed to consider? All I noticed were promptly fixed.
* Are there any assertion failures or crashes? No

Performance review
==================

* Does the patch slow down simple tests? No
* If it claims to improve performance, does it? Yes, with GIN index support. Loading hstore values is slower than loading JSON, but everything else is faster than JSON.
* Does it slow down other things? No.

Coding review
=============

* Does it follow the project guidelines? Yes.
* Are there portability issues? Unknown
* Will it work on Windows/BSD etc? Tested on OS X only.
* Are the comments sufficient and accurate? Yes.
* Does it do what it says, correctly? As best I can tell, yes.
* Does it produce compiler warnings? No.
* Can you make it crash? No, but I did find a bug or two that was promptly fixed.

Architecture review
===================

* Is everything done in a way that fits together coherently with other features/modules? yes.
* Are there interdependencies that can cause problems? No

Conclusion
==========

I love where nested hstore is going, especially since it will be used for jsonb, too. The nesting, data type, and GIN index support is really great, and the new constructors provide a nice SQL API that make it easy to use. I think that the next version of this patch will be full of win for the project.

This was considered a WIP patch, since the jsonb support is still forthcoming, so it's appropriate to leave it marked “Returned with feedback”. As Andrew is doing much of that work, the code itself will get a much closer examination from him. But for the hstore feature itself, I think the current interface and features are ready to go.

Best,

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-12-20 23:27:44 Re: nested hstore patch
Previous Message Gavin Flower 2013-12-20 23:09:03 Re: make_interval ??