Обсуждение: Values larger than 1/3 of a buffer page cannot be indexed (hstore)

Поиск
Список
Период
Сортировка

Values larger than 1/3 of a buffer page cannot be indexed (hstore)

От
Stefan Keller
Дата:
Hi,

2011/3/13 Viktor Nagy <viktor.nagy@toolpart.hu>
> when trying to insert a long-long value, I get the following error:
>
> ERROR: Index row size 3120 exceeds maximum 2712 for index "ir_translation_ltns"
> HINT:  Values larger than 1/3 of a buffer page cannot be indexed.
> Consider a function index of an MD5 hash of the value, or use full text indexing.

I get the same error but I'm using a hstore attribute (called 'tags').

Unfortunately, the trick with the MD5 function index does not work neither:

CREATE INDEX planet_osm_point_tags
  ON planet_osm_point ((md5(tags)));
ERROR: Funktion md5(hstore) does not exist
SQL state: 42883

Any ideas on how to index my hstore attribute?

Yours, S.

Re: Values larger than 1/3 of a buffer page cannot be indexed (hstore)

От
Tom Lane
Дата:
Stefan Keller <sfkeller@gmail.com> writes:
> Any ideas on how to index my hstore attribute?

Use a GIST or GIN index.  The only thing that a btree index on hstore
can do for you is to support equality comparisons on the whole hstore
value, which is pretty unlikely to be what you're after.

            regards, tom lane

Re: Values larger than 1/3 of a buffer page cannot be indexed (hstore)

От
Stefan Keller
Дата:
Hi Tom,

Thanks for the hint!  I'm actually doing a GROUP BY with an inquality
search (HAVING...) and still get long lasting queries, see query (2)
below.

I'm doing an equality search with success with the '->' operator on
the same field 'tags' like in this query (1):

-- Count all restaurants in database ("amenity = restaurant''):
select count(*) from osm_all_v
where hstore(tags)->'amenity'='restaurant'

This query 1 is reasonably fast and could be accelerated using this
functional index:
CREATE INDEX planet_osm_point_tags_restaurant
  ON planet_osm_point
  USING btree (tags)
  WHERE (tags -> 'amenity'::text) = 'restaurant'::text;


But look at query 2:

This GROUP BY query gives following results (after a VACUUM FULL ANALYZE):
* 26265ms without any index on field 'tags'
* 26000ms with GIN index on field 'tags'
* 26078ms with HASH index on field 'tags'

To sum up:
* GIN and HASH indexes don't make a difference.
* BTree index fails (as reported) on field 'tags' with ERROR 'index
row size 3120 exceeds maximum 2712' SQL state: 54000).
* GIST index fails too on field 'tags' but with ERROR 'invalid hstore
value found', SQL state: XX000.

Q:
=> The latter indicates an "invalid value" - but does not say what's invalid.
=> I have no clue how to boost query 2...? See below the query and the
explain output.

Yours, Stefan


QUERY 2
=======
-- Return all key-values of semantic type 'enum' without types
numeric, date/time etc.
-- (actually hstore handles all tag/values as of type text):
SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*)
FROM planet_osm_point p, (
  SELECT key, count(*)
  FROM (
    SELECT (each(tags)).key FROM planet_osm_point
  ) AS stat
GROUP BY key HAVING count(*) > 1
  AND key NOT LIKE 'note:%'
  ...
  AND key NOT IN
('ele','ref','url','website','email','maxspeed','converted_by', ... )
) keys
WHERE hstore(p.tags)->keys.key >''
GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1
ORDER by 1,3 desc
...
"amenity";"bus_stop";24414
"amenity";"restaurant";5423
"amenity";"bench";5041
"amenity";"parking";4232
"amenity";"fire_hydrant";2363
"amenity";"post_box";1838
"amenity";"fuel";1628
"amenity";"place_of_worship";1615
...

EXPLAIN output:
"Sort  (cost=6043108.58..6054613.93 rows=4602142 width=118) (actual
time=331415.987..331417.914 rows=964 loops=1)"
"  Sort Key: (min(keys.key)), (count(*))"
"  Sort Method:  quicksort  Memory: 85kB"
"  ->  GroupAggregate  (cost=4173505.61..4401220.42 rows=4602142
width=118) (actual time=330101.581..331409.834 rows=964 loops=1)"
"        Filter: (count(*) > 1)"
"        ->  Sort  (cost=4173505.61..4198338.94 rows=9933329
width=118) (actual time=330101.487..330870.237 rows=196227 loops=1)"
"              Sort Key: ((p.tags -> keys.key))"
"              Sort Method:  external merge  Disk: 21960kB"
"              ->  Nested Loop  (cost=22534.58..574521.77 rows=9933329
width=118) (actual time=6747.314..327192.811 rows=196227 loops=1)"
"                    Join Filter: ((p.tags -> keys.key) > ''::text)"
"                    ->  Seq Scan on planet_osm_point p
(cost=0.00..5649.22 rows=182822 width=86) (actual time=0.015..516.191
rows=182822 loops=1)"
"                    ->  Materialize  (cost=22534.58..22539.88
rows=163 width=32) (actual time=0.039..0.890 rows=420 loops=182822)"
"                          ->  Subquery Scan on keys
(cost=22534.58..22539.07 rows=163 width=32) (actual
time=6740.620..6743.358 rows=420 loops=1)"
"                                ->  HashAggregate
(cost=22534.58..22537.44 rows=163 width=32) (actual
time=6740.613..6741.651 rows=420 loops=1)"
"                                      Filter: (count(*) > 1)"
"                                      ->  Subquery Scan on stat
(cost=0.00..21417.62 rows=148929 width=32) (actual
time=0.093..6230.640 rows=196458 loops=1)"
"                                            Filter: ((stat.key !~~
'name%'::text) AND (stat.key !~~ 'addr:%'::text) AND (stat.key !~~*
'FIXME'::text) AND (stat.key !~~* 'openGeoDB:%'::text) AND (stat.key
!~~ 'note:%'::text) AND (stat.key !~~ '%_ref'::text) AND (stat.key !~~
'%description%'::text) AND (stat.key !~~ 'contact:%'::text) AND
(stat.key !~~ 'operator:%'::text) AND (stat.key !~~ 'uic_%'::text) AND
(stat.key !~~ 'TMC%'::text) AND (stat.key !~~ 'uic_%'::text) AND
(stat.key !~~ 'direction%'::text) AND (stat.key !~~ 'is_in%'::text)
AND (stat.key !~~ 'wikipedia%'::text) AND (stat.key <> ALL

('{ele,ref,url,website,email,maxspeed,converted_by,layer,level,phone,mobility:station_id,information,opening_hours,date,time,collection_times,operator,colour,fee,nat_name,alt_name,population,seats,postal_code,capacity,line,lines,maxheight,reg_name}'::text[])))"
"                                            ->  Seq Scan on
planet_osm_point  (cost=0.00..6106.28 rows=182822 width=86) (actual
time=0.025..2363.603 rows=575700 loops=1)"
"Total runtime: 331426.018 ms"




2011/5/1 Tom Lane <tgl@sss.pgh.pa.us>:
> Stefan Keller <sfkeller@gmail.com> writes:
>> Any ideas on how to index my hstore attribute?
>
> Use a GIST or GIN index.  The only thing that a btree index on hstore
> can do for you is to support equality comparisons on the whole hstore
> value, which is pretty unlikely to be what you're after.
>
>                        regards, tom lane
>

Re: Values larger than 1/3 of a buffer page cannot be indexed (hstore)

От
Tom Lane
Дата:
Stefan Keller <sfkeller@gmail.com> writes:
> I'm doing an equality search with success with the '->' operator on
> the same field 'tags' like in this query (1):

> -- Count all restaurants in database ("amenity = restaurant''):
> select count(*) from osm_all_v
> where hstore(tags)->'amenity'='restaurant'

> This query 1 is reasonably fast and could be accelerated using this
> functional index:
> CREATE INDEX planet_osm_point_tags_restaurant
>   ON planet_osm_point
>   USING btree (tags)
>   WHERE (tags -> 'amenity'::text) = 'restaurant'::text;

This index seems a bit carelessly defined.  There's no need to confine
its usefulness to exactly that query, and there's no point in having the
index column contents be the entire tags value (which is what's leading
to the failure).  Consider

create index planet_osm_point_amenity on planet_osm_point ((tags->amenity));

which will work for the above query and any other that's looking for a
specific value of tags->amenity.

> SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*)
> FROM planet_osm_point p, (
>   SELECT key, count(*)
>   FROM (
>     SELECT (each(tags)).key FROM planet_osm_point
>   ) AS stat
> GROUP BY key HAVING count(*) > 1
>   AND key NOT LIKE 'note:%'
>   ...
>   AND key NOT IN
> ('ele','ref','url','website','email','maxspeed','converted_by', ... )
> ) keys
> WHERE hstore(p.tags)->keys.key >''
> GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1
> ORDER by 1,3 desc

It's pretty much useless to think about indexes for queries like this.
If it's going to scan the whole table anyway, as this surely is, then
an index is not going to make it faster.

            regards, tom lane

Re: Values larger than 1/3 of a buffer page cannot be indexed (hstore)

От
Stefan Keller
Дата:
Hi Tom, hi all

Thanks, Tom, for your tipps. You answered 2011/5/1:
> (...), and there's no point in having the
> index column contents be the entire tags value (which is what's leading
> to the failure).  Consider
>
> create index planet_osm_point_amenity on planet_osm_point ((tags->amenity));

To get a more general purpose index I tried also:

CREATE INDEX planet_osm_point_tags ON planet_osm_point USING gist(tags);
-- ERROR: invalid hstore value found
-- SQL state: XX000

And I'm really interested in being able to use GIST. GIST is also
recommended here:
http://www.bostongis.com/PrinterFriendly.aspx?content_name=loading_osm_postgis

But I still get error 'invalid hstore value found' on my machine -
whatever I do!

I tried hard to find out the reason and also to remedy the cause of
this failure.
For example I did an update like this: UPDATE planet_osm_point SET
tags = hstore(hstore_to_array(tags));
...with no success.

I'm running "PostgreSQL 9.1alpha1, compiled by Visual C++ build 1500,
32-bit" on Windows XP SP3.
I have a dump of the table/database at hand to anyone who is
interested in this possible bug.

Yours, Stefan

2011/5/1 Tom Lane <tgl@sss.pgh.pa.us>:
> Stefan Keller <sfkeller@gmail.com> writes:
>> I'm doing an equality search with success with the '->' operator on
>> the same field 'tags' like in this query (1):
>
>> -- Count all restaurants in database ("amenity = restaurant''):
>> select count(*) from osm_all_v
>> where hstore(tags)->'amenity'='restaurant'
>
>> This query 1 is reasonably fast and could be accelerated using this
>> functional index:
>> CREATE INDEX planet_osm_point_tags_restaurant
>>   ON planet_osm_point
>>   USING btree (tags)
>>   WHERE (tags -> 'amenity'::text) = 'restaurant'::text;
>
> This index seems a bit carelessly defined.  There's no need to confine
> its usefulness to exactly that query, and there's no point in having the
> index column contents be the entire tags value (which is what's leading
> to the failure).  Consider
>
> create index planet_osm_point_amenity on planet_osm_point ((tags->amenity));
>
> which will work for the above query and any other that's looking for a
> specific value of tags->amenity.
>
>> SELECT MIN(keys.key), hstore(p.tags)->keys.key, count(*)
>> FROM planet_osm_point p, (
>>   SELECT key, count(*)
>>   FROM (
>>     SELECT (each(tags)).key FROM planet_osm_point
>>   ) AS stat
>> GROUP BY key HAVING count(*) > 1
>>   AND key NOT LIKE 'note:%'
>>   ...
>>   AND key NOT IN
>> ('ele','ref','url','website','email','maxspeed','converted_by', ... )
>> ) keys
>> WHERE hstore(p.tags)->keys.key >''
>> GROUP BY hstore(p.tags)->keys.key HAVING count(*) > 1
>> ORDER by 1,3 desc
>
> It's pretty much useless to think about indexes for queries like this.
> If it's going to scan the whole table anyway, as this surely is, then
> an index is not going to make it faster.
>
>                        regards, tom lane
>

Re: Values larger than 1/3 of a buffer page cannot be indexed (hstore)

От
Tom Lane
Дата:
Stefan Keller <sfkeller@gmail.com> writes:
> Hi Tom, hi all
> Thanks, Tom, for your tipps. You answered 2011/5/1:
>> (...), and there's no point in having the
>> index column contents be the entire tags value (which is what's leading
>> to the failure).  Consider
>>
>> create index planet_osm_point_amenity on planet_osm_point ((tags->amenity));

> To get a more general purpose index I tried also:

> CREATE INDEX planet_osm_point_tags ON planet_osm_point USING gist(tags);
> -- ERROR: invalid hstore value found
> -- SQL state: XX000

> And I'm really interested in being able to use GIST. GIST is also
> recommended here:
> http://www.bostongis.com/PrinterFriendly.aspx?content_name=loading_osm_postgis

> But I still get error 'invalid hstore value found' on my machine -
> whatever I do!

Do you have a self-contained test case for this?  The symptom seems
similar to a bug we found back around 9.0 release --- maybe there's
another instance of the same mistake someplace.

            regards, tom lane