| From: | Tomas Vondra <tv(at)fuzzy(dot)cz> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: jsonb and nested hstore | 
| Date: | 2014-03-15 14:01:39 | 
| Message-ID: | 53245D43.8030708@fuzzy.cz | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 15.3.2014 06:40, Peter Geoghegan wrote:
> On Fri, Mar 14, 2014 at 6:44 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> Well, depends on how you define useful. With the sample dataset
>> 'delicious' (see Peter's post) I can do this:
>>
>>    SELECT doc FROM delicious
>>           WHERE doc @> '{"title_detail" : {"value" : "TheaterMania"}}';
>>
>> with arbitrary paths, and I may create a GIN index to support such 
>> queries. And yes, it's much faster than GiST for example (by a
>> factor of 1000).
> 
> If you know ahead of time the entire nested value you can. So, if
> you attach some other data to the "TheaterMania" document, you had
> better know that too if you hope to write a query like this. You also
> have to index the entire table, where presumably with a little
> thought you could get away with a much smaller index. That strikes me
> as not very useful.
Sure, I need to know some basic rules / do assumptions about the
structure of the json document. In other words, schemaless databases are
difficult to query.
For example when storing mail message headers (i.e. the example I've
used before), I do know that the json document is rather well structured
- it's not nested at all, and all the values are either scalar values
(mostly strings), or arrays of scalars.
So it looks like this
  {
    "from" : "john(dot)doe(at)example(dot)com",
    "to" : ["jane(dot)doe(at)example(dot)com", "jack(dot)doe(at)example(dot)com"],
    ...
  }
So the schema is rather well defined (not the exact keys, but the
structure certainly is).
Let's say I want to allow arbitrary searches on headers - I can't
support that with expression indexes, because there's like a zillion of
possible headers and I'd have to create an expression index on each of
them separately.
But I can support that with a single GIN index ...
>> Yes, the GIN index is quite large (~560MB for a ~1.2GB table).
> 
> With the default opclass, without an expressional index, 100% of the
> data from the table appears in the index. Why do you think that's
> quite large?
That wasn't meant as a complaint. I have no problem with the index size
(If we can make it smaller in the future, great! But I can live with the
current index sizes too.)
regards
Tomas
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Fujii Masao | 2014-03-15 15:59:01 | Re: Archive recovery won't be completed on some situation. | 
| Previous Message | Mohsen SM | 2014-03-15 13:32:44 | bpchar functinos |