
Partitioning with a large number of partitions

From: Aleksej Trofimov <aleksej(dot)trofimov(at)ruptela(dot)lt>
To: pgsql-novice(at)postgresql(dot)org
Subject: Partitioning with a large number of partitions
Date: 2011-10-12 14:09:14
Message-ID: 4E959F8A.30009@ruptela.lt
Lists: pgsql-novice
Hello, I'm a novice with Postgres, so please be patient =)

At the moment we are using PostgreSQL 9.1.0. Our database is about 270GB,
and we currently have a problem with our "hot" table, which holds about
120GB of data plus 20GB of indexes. By "hot" I mean that we run a lot of
insert/update/select operations on this table (roughly 30/30/120 per
second). This hot table is called coordinates. In our logic we have
several definitions: Coordinate - the basic packet of data; Object - the
basic unit, which sends coordinates; Client - the client which owns
objects. One client may have many objects, and each object may have many
coordinates. All coordinates are stored in the coordinates table.
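
To make this concrete, here is a simplified sketch of the relevant tables
(our real tables have many more columns; column names beyond id,
client_id, object_id and datetime are illustrative):

CREATE SCHEMA coordinates_object;

-- Simplified sketch; the real tables have many more columns
CREATE TABLE object (
    id        bigint PRIMARY KEY,
    client_id bigint NOT NULL        -- the client that owns this object
);

CREATE TABLE coordinates_object.coordinates (
    object_id bigint    NOT NULL,    -- the object that sent this packet
    datetime  timestamp NOT NULL     -- when the coordinate was recorded
    -- ... payload columns ...
);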
So at the moment we have an issue with slow insert/select queries, which
is why we decided to partition this table. We have tested two types of
partitioning: by date (weekly) and by object (sketched below).
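
On 9.1 we build the per-object partitions with table inheritance and
CHECK constraints, roughly one child per object, along these lines
(simplified; child and index names match the plans further down):

-- One child table per object (PostgreSQL 9.1 inheritance-style partitioning)
CREATE TABLE coordinates_object.coordinates_722 (
    CHECK (object_id = 722)   -- lets constraint exclusion prune this child
) INHERITS (coordinates_object.coordinates);

CREATE INDEX coordinates_722_datetime_index
    ON coordinates_object.coordinates_722 (datetime);

-- Inserts are routed to the matching child (by a trigger on the parent
-- dispatching on NEW.object_id, or in the application).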
These are our results:

SELECT QUERY (1 object):

---------------------------------------------
Partition type    |   M   |   W   |   D
---------------------------------------------
Non-partitioned   | 5830  | 460   | 2913
Date (weekly)     | 1000  | 440   | 106
Object            | 0.02  | 0.03  | 0.009
---------------------------------------------

M - query over a month period
W - query over a week period
D - query over a day period
All times in ms.

So according to our results we get much better performance with
partitioning by object, BUT we have a lot of objects (about 5000, and we
plan to connect 2-5 times that many next year). These plans lead to the
first question:

1) How well will Postgres cope with such a large number of partitions?
Will performance degrade in the future? We have already noticed some
performance issues when the conditional statement lists many object ids,
e.g.: SELECT * FROM coordinates_object.coordinates WHERE object_id
IN (HERE ARE SEVERAL OBJECTS);
EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM coordinates_object.coordinates
WHERE datetime BETWEEN '2011-07-01' AND '2011-07-02'
  AND object_id IN (722::bigint, 728::bigint, 727::bigint, 248::bigint);

Returns:

"Result  (cost=0.00..1887.23 rows=1455 width=655) (actual 
time=49.266..695.483 rows=908 loops=1)"
"  ->  Append  (cost=0.00..1887.23 rows=1455 width=655) (actual 
time=49.264..694.995 rows=908 loops=1)"
"        ->  Seq Scan on coordinates  (cost=0.00..0.00 rows=1 width=314) 
(actual time=0.001..0.001 rows=0 loops=1)"
"              Filter: ((datetime >= '2011-07-01 00:00:00'::timestamp 
without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp 
without time zone) AND (object_id = ANY ('{722,728,727,248}'::bigint[])))"
"        ->  Bitmap Heap Scan on coordinates_722 coordinates  
(cost=10.90..451.49 rows=258 width=654) (actual time=49.261..195.291 
rows=166 loops=1)"
"              Recheck Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"              ->  Bitmap Index Scan on coordinates_722_datetime_index  
(cost=0.00..10.83 rows=258 width=0) (actual time=49.237..49.237 rows=166 
loops=1)"
"                    Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"        ->  Bitmap Heap Scan on coordinates_728 coordinates  
(cost=13.00..568.65 rows=463 width=656) (actual time=108.833..203.139 
rows=331 loops=1)"
"              Recheck Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"              ->  Bitmap Index Scan on coordinates_728_datetime_index  
(cost=0.00..12.88 rows=463 width=0) (actual time=61.914..61.914 rows=332 
loops=1)"
"                    Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"        ->  Bitmap Heap Scan on coordinates_727 coordinates  
(cost=19.56..824.53 rows=713 width=651) (actual time=147.295..251.406 
rows=370 loops=1)"
"              Recheck Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"              ->  Bitmap Index Scan on coordinates_727_datetime_index  
(cost=0.00..19.38 rows=713 width=0) (actual time=125.261..125.261 
rows=370 loops=1)"
"                    Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"        ->  Index Scan using coordinates_248_datetime_index on 
coordinates_248 coordinates  (cost=0.00..42.56 rows=20 width=777) 
(actual time=0.017..44.991 rows=41 loops=1)"
"              Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"Total runtime: 695.963 ms"

2) As you can see from the EXPLAIN output, Postgres turns the IN list
into an array filter, Filter: (object_id = ANY
('{722,728,727,248}'::bigint[])). Why?

3) The last question is about some "unpredictable" behavior with this query:

EXPLAIN ANALYZE
SELECT * FROM coordinates_object.coordinates
WHERE datetime BETWEEN '2011-07-01' AND '2011-07-02'
  AND object_id IN (SELECT id FROM object WHERE client_id = 318);

With this query, Postgres does a Bitmap Heap Scan on the index of every
partition (remember, we have 5000 partitions). Such a query makes our
server think a lot =).
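
One workaround we are considering (assuming the problem is that the
planner cannot see the id list at plan time, so constraint exclusion
cannot prune) is to resolve the ids on the client first and then send
them as constants, which gives the pruned plan shown above:

-- Step 1: resolve the client's objects
SELECT id FROM object WHERE client_id = 318;

-- Step 2: suppose it returned 722, 728, 727, 248; query with constants,
-- so constraint exclusion can prune down to those four partitions
SELECT *
FROM coordinates_object.coordinates
WHERE datetime BETWEEN '2011-07-01' AND '2011-07-02'
  AND object_id IN (722::bigint, 728::bigint, 727::bigint, 248::bigint);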

So, if possible, please judge our decision and suggest some other
solutions...




