Re: Values larger than 1/3 of a buffer page cannot be indexed (hstore)

From: Stefan Keller <sfkeller(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>, oleg(at)sai(dot)msu(dot)su, Ott Michel <m1ott(at)hsr(dot)ch>
Subject: Re: Values larger than 1/3 of a buffer page cannot be indexed (hstore)
Date: 2011-05-22 13:21:45
Message-ID: BANLkTinteuW83HcVDnoUK4p_0jOOX6G3XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Tom, hi all

Thanks, Tom, for your tipps. You answered 2011/5/1:
> (...), and there's no point in having the
> index column contents be the entire tags value (which is what's leading
> to the failure). Consider
>
> create index planet_osm_point_amenity on planet_osm_point ((tags->amenity));

To get a more general purpose index I tried also:

CREATE INDEX planet_osm_point_tags ON planet_osm_point USING gist(tags);
-- ERROR: invalid hstore value found
-- SQL state: XX000

And I'm really interested in being able to use GIST. GIST is also
recommended here:
http://www.bostongis.com/PrinterFriendly.aspx?content_name=loading_osm_postgis

But I still get error 'invalid hstore value found' on my machine -
whatever I do!

I tried hard to find out the reason and also to remedy the cause of
this failure.
For example I did an update like this: UPDATE planet_osm_point SET
tags = hstore(hstore_to_array(tags));
...with no success.

I'm running "PostgreSQL 9.1alpha1, compiled by Visual C++ build 1500,
32-bit" on Windows XP SP3.
I have a dump of the table/database at hand to anyone who is
interested in this possible bug.

Yours, Stefan

2011/5/1 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Stefan Keller <sfkeller(at)gmail(dot)com> writes:
>> I'm doing an equality search with success with the '->' operator on
>> the same field 'tags' like in this query (1):
>
>> -- Count all restaurants in database ("amenity = restaurant''):
>> select count(*) from osm_all_v
>> where hstore(tags)->'amenity'='restaurant'
>
>> This query 1 is reasonably fast and could be accelerated using this
>> functional index:
>> CREATE INDEX planet_osm_point_tags_restaurant
>>   ON planet_osm_point
>>   USING btree (tags)
>>   WHERE (tags -> 'amenity'::text) = 'restaurant'::text;
>
> This index seems a bit carelessly defined.  There's no need to confine
> its usefulness to exactly that query, and there's no point in having the
> index column contents be the entire tags value (which is what's leading
> to the failure).  Consider
>
> create index planet_osm_point_amenity on planet_osm_point ((tags->amenity));
>
> which will work for the above query and any other that's looking for a
> specific value of tags->amenity.
>
>> SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*)
>> FROM planet_osm_point p, (
>>   SELECT key, count(*)
>>   FROM (
>>     SELECT (each(tags)).key FROM planet_osm_point
>>   ) AS stat
>> GROUP BY key HAVING count(*) > 1
>>   AND key NOT LIKE 'note:%'
>>   ...
>>   AND key NOT IN
>> ('ele','ref','url','website','email','maxspeed','converted_by', ... )
>> ) keys
>> WHERE hstore(p.tags)->keys.key >''
>> GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1
>> ORDER by 1,3 desc
>
> It's pretty much useless to think about indexes for queries like this.
> If it's going to scan the whole table anyway, as this surely is, then
> an index is not going to make it faster.
>
>                        regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Johnston 2011-05-22 13:37:15 Re: Syntax Error for "boolean('value')" Type Casting
Previous Message jun yang 2011-05-22 12:49:31 Re: how to start a procedure after postgresql started.