
Partitioning with a large number of partitions

From: Aleksej Trofimov <aleksej(dot)trofimov(at)ruptela(dot)lt>
To: pgsql-novice(at)postgresql(dot)org
Subject: Partitioning with a large number of partitions
Date: 2011-10-12 14:09:14
Message-ID: 4E959F8A.30009@ruptela.lt
Lists: pgsql-novice
Hello, I'm a novice with Postgres, so please be patient =)

At the moment we are using PostgreSQL 9.1.0. Our database is about 270GB,
and we currently have a problem with our "hot" table, which holds about
120GB of data plus 20GB of indexes. By "hot" I mean that we run a lot of
insert/update/select operations on this table (roughly 30/30/120 per
second). This hot table is called coordinates. In our logic we have
several definitions: Coordinate - the basic packet of data; Object - the
basic unit, which sends coordinates; Client - the client which owns
objects. One client may have many objects, and each object may have many
coordinates. All coordinates are stored in the coordinates table.
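
To make this concrete, here is a simplified sketch of the relevant tables
(our real tables have many more columns; column names beyond id,
client_id, object_id and datetime are illustrative):

CREATE SCHEMA coordinates_object;

-- Simplified sketch; the real tables have many more columns
CREATE TABLE object (
    id        bigint PRIMARY KEY,
    client_id bigint NOT NULL        -- the client that owns this object
);

CREATE TABLE coordinates_object.coordinates (
    object_id bigint    NOT NULL,    -- the object that sent this packet
    datetime  timestamp NOT NULL     -- when the coordinate was recorded
    -- ... payload columns ...
);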
So at the moment we have an issue with slow insert/select queries, which
is why we decided to partition this table. We have tested two types of
partitioning: by date (weekly) and by object (sketched below).
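
On 9.1 we build the per-object partitions with table inheritance and
CHECK constraints, roughly one child per object, along these lines
(simplified; child and index names match the plans further down):

-- One child table per object (PostgreSQL 9.1 inheritance-style partitioning)
CREATE TABLE coordinates_object.coordinates_722 (
    CHECK (object_id = 722)   -- lets constraint exclusion prune this child
) INHERITS (coordinates_object.coordinates);

CREATE INDEX coordinates_722_datetime_index
    ON coordinates_object.coordinates_722 (datetime);

-- Inserts are routed to the matching child (by a trigger on the parent
-- dispatching on NEW.object_id, or in the application).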
These are our results:

SELECT QUERY (1 object):

---------------------------------------------
Partition type    |   M   |   W   |   D
---------------------------------------------
Non-partitioned   | 5830  | 460   | 2913
Date (weekly)     | 1000  | 440   | 106
Object            | 0.02  | 0.03  | 0.009
---------------------------------------------

M - query over a month period
W - query over a week period
D - query over a day period
All times in ms.

So according to our results we get much better performance with
partitioning by object, BUT we have a lot of objects (about 5000, and we
plan to connect 2-5 times that many next year). These plans lead to the
first question:

1) How well will Postgres cope with such a large number of partitions?
Will performance degrade in the future? We have already noticed some
performance issues when the conditional statement lists many object ids,
e.g.: SELECT * FROM coordinates_object.coordinates WHERE object_id
IN (HERE ARE SEVERAL OBJECTS);
EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT * FROM coordinates_object.coordinates
WHERE datetime BETWEEN '2011-07-01' AND '2011-07-02'
  AND object_id IN (722::bigint, 728::bigint, 727::bigint, 248::bigint);

Returns:

"Result  (cost=0.00..1887.23 rows=1455 width=655) (actual 
time=49.266..695.483 rows=908 loops=1)"
"  ->  Append  (cost=0.00..1887.23 rows=1455 width=655) (actual 
time=49.264..694.995 rows=908 loops=1)"
"        ->  Seq Scan on coordinates  (cost=0.00..0.00 rows=1 width=314) 
(actual time=0.001..0.001 rows=0 loops=1)"
"              Filter: ((datetime >= '2011-07-01 00:00:00'::timestamp 
without time zone) AND (datetime <= '2011-07-02 00:00:00'::timestamp 
without time zone) AND (object_id = ANY ('{722,728,727,248}'::bigint[])))"
"        ->  Bitmap Heap Scan on coordinates_722 coordinates  
(cost=10.90..451.49 rows=258 width=654) (actual time=49.261..195.291 
rows=166 loops=1)"
"              Recheck Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"              ->  Bitmap Index Scan on coordinates_722_datetime_index  
(cost=0.00..10.83 rows=258 width=0) (actual time=49.237..49.237 rows=166 
loops=1)"
"                    Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"        ->  Bitmap Heap Scan on coordinates_728 coordinates  
(cost=13.00..568.65 rows=463 width=656) (actual time=108.833..203.139 
rows=331 loops=1)"
"              Recheck Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"              ->  Bitmap Index Scan on coordinates_728_datetime_index  
(cost=0.00..12.88 rows=463 width=0) (actual time=61.914..61.914 rows=332 
loops=1)"
"                    Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"        ->  Bitmap Heap Scan on coordinates_727 coordinates  
(cost=19.56..824.53 rows=713 width=651) (actual time=147.295..251.406 
rows=370 loops=1)"
"              Recheck Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"              ->  Bitmap Index Scan on coordinates_727_datetime_index  
(cost=0.00..19.38 rows=713 width=0) (actual time=125.261..125.261 
rows=370 loops=1)"
"                    Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"        ->  Index Scan using coordinates_248_datetime_index on 
coordinates_248 coordinates  (cost=0.00..42.56 rows=20 width=777) 
(actual time=0.017..44.991 rows=41 loops=1)"
"              Index Cond: ((datetime >= '2011-07-01 
00:00:00'::timestamp without time zone) AND (datetime <= '2011-07-02 
00:00:00'::timestamp without time zone))"
"              Filter: (object_id = ANY ('{722,728,727,248}'::bigint[]))"
"Total runtime: 695.963 ms"

2) As you can see from the EXPLAIN output, Postgres turns the IN list
into an array filter, Filter: (object_id = ANY
('{722,728,727,248}'::bigint[])). Why?

3) The last question is about some "unpredictable" behavior with this query:

EXPLAIN ANALYZE
SELECT * FROM coordinates_object.coordinates
WHERE datetime BETWEEN '2011-07-01' AND '2011-07-02'
  AND object_id IN (SELECT id FROM object WHERE client_id = 318);

With this query, Postgres does a Bitmap Heap Scan on the index of every
partition (remember, we have 5000 partitions). Such a query makes our
server think a lot =).
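
One workaround we are considering (assuming the problem is that the
planner cannot see the id list at plan time, so constraint exclusion
cannot prune) is to resolve the ids on the client first and then send
them as constants, which gives the pruned plan shown above:

-- Step 1: resolve the client's objects
SELECT id FROM object WHERE client_id = 318;

-- Step 2: suppose it returned 722, 728, 727, 248; query with constants,
-- so constraint exclusion can prune down to those four partitions
SELECT *
FROM coordinates_object.coordinates
WHERE datetime BETWEEN '2011-07-01' AND '2011-07-02'
  AND object_id IN (722::bigint, 728::bigint, 727::bigint, 248::bigint);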

So, if possible, please judge our decision and suggest some other
solutions...




