Re: hstore - Implementation and performance issues around its operators

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Stefan Keller <sfkeller(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: hstore - Implementation and performance issues around its operators
Date: 2011-06-21 20:50:50
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> 1. Obviously the '@>' has to be used in order to let use the GiST index.
>> Why is the '->' operator not supported by GiST ('->' is actually
>> mentioned in all examples of the doc.)?

There's no way to make the planner recognize something like
(col->'foo' = 'bar')
as being an indexable condition on "col". (If 'foo' is constant then you
can of course create a functional (btree) index on (col->'foo').)

Whereas (col @> hstore('foo','bar')) can use a GiST or GIN index directly
as long as 'foo' and 'bar' are pseudoconstants (i.e. constants, columns of
other tables, or stable functions of those).

>> 2. Currently the hstore elements are stored in order as they are
>> coming from the insert statement / constructor.

No, the elements of an hstore are stored in order of (keylength,key)
with the key comparison done bytewise (not locale-dependent).

>> Why are the elements not ordered i.e. why is the hstore not cached in
>> all hstore functions (like hstore_fetchval etc.)?

I don't understand what you think would be cached.

>> 3. In the source code 'hstore_io.c' one finds the following enigmatic
>> note: "... very large hstore values can't be output. this could be
>> fixed, but many other data types probably have the same issue."
>> What is the max. length of a hstore (i.e. the max. length of the sum
>> of all elements in text representation)?

An hstore is internally limited to the 1GB limit that applies to all
varlenas and individual memory allocations. However, the output text
representation can be longer than the internal one, and the output is
_also_ limited to 1GB.

The most obvious example of another type with the same issue is bytea,
where a value >512MB will fail to output in hex mode, and a value >256MB
might fail to output in escape mode depending on the content.

>> 4. Last, I don't fully understand the following note in the hstore
>> doc. (
>> ):
>> Notice that the old names are reversed from the convention
>> formerly followed by the core geometric data types!

>> Why names? Why not rather 'operators' or 'functions'?

It's referring to the name of the operator (e.g. "@" or "~").

>> What does this "reversed from the convention" mean concretely?

In old postgres versions the operators @ and ~ were used by various types
to mean "contains" and "contained by", but many contrib types had them
reversed in meaning from the core geometry types.

For example, if a and b are boxes, (a @ b) means "a is contained in b",
but if they are cubes from contrib/cube, or hstores, then (a @ b) means
"a contains b".

The inconsistency here was obviously intolerable and was resolved by
introducing <@ and @> as the recommended names for "contained in" and
"contains" operators respectively, with the < > giving a visual hint as
to which was the "larger" side.


Browse pgsql-hackers by date

  From Date Subject
Next Message Steve Singer 2011-06-21 21:20:01 Re: Online base backup from the hot-standby
Previous Message Merlin Moncure 2011-06-21 19:55:51 Re: Libpq enhancement