horizontal partition

From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: horizontal partition
Date: 2005-02-02 00:16:17
Message-ID: ctp64h$l4d$1@floppy.pyrenet.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,
I have a big table with ~ 10 Milion rows, and is a very
pain administer it, so after years I convinced my self
to partition it and replace the table usage ( only for reading )
with a view.

Now my user_logs table is splitted in 4:

user_logs
user_logs_2002
user_logs_2003
user_logs_2004

and the view v_user_logs is builded on top of these tables:

CREATE OR REPLACE VIEW v_user_logs AS
SELECT * FROM user_logs
UNION ALL
SELECT * FROM user_logs_2002
UNION ALL
SELECT * FROM user_logs_2003
UNION ALL
SELECT * FROM user_logs_2004
;

the view is performing really well:

empdb=# explain analyze select * from v_user_logs where id_user = sp_id_user('kalman');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan v_user_logs (cost=0.00..895.45 rows=645 width=88) (actual time=17.039..2345.388 rows=175 loops=1)
-> Append (cost=0.00..892.23 rows=645 width=67) (actual time=17.030..2344.195 rows=175 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..120.70 rows=60 width=67) (actual time=17.028..17.036 rows=1 loops=1)
-> Index Scan using idx_user_user_logs on user_logs (cost=0.00..120.40 rows=60 width=67) (actual time=17.012..17.018 rows=1 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 2" (cost=0.00..475.44 rows=316 width=67) (actual time=49.406..1220.400 rows=79 loops=1)
-> Index Scan using idx_user_user_logs_2004 on user_logs_2004 (cost=0.00..473.86 rows=316 width=67) (actual time=49.388..1219.386 rows=79 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 3" (cost=0.00..204.33 rows=188 width=67) (actual time=59.375..1068.806 rows=95 loops=1)
-> Index Scan using idx_user_user_logs_2003 on user_logs_2003 (cost=0.00..203.39 rows=188 width=67) (actual time=59.356..1067.934 rows=95 loops=1)
Index Cond: (id_user = 4185)
-> Subquery Scan "*SELECT* 4" (cost=0.00..91.75 rows=81 width=67) (actual time=37.623..37.623 rows=0 loops=1)
-> Index Scan using idx_user_user_logs_2002 on user_logs_2002 (cost=0.00..91.35 rows=81 width=67) (actual time=37.618..37.618 rows=0 loops=1)
Index Cond: (id_user = 4185)
Total runtime: 2345.917 ms
(15 rows)

the problem is now if this view is used in others views like this:

CREATE OR REPLACE VIEW v_ua_user_login_logout_tmp AS
SELECT
u.login,
ul.*
FROM user_login u,
v_user_logs ul
WHERE
u.id_user = ul.id_user
;

empdb=# explain analyze select * from v_ua_user_login_logout_tmp where login = 'kalman';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=4.01..228669.81 rows=173 width=100) (actual time=1544.784..116490.363 rows=175 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Subquery Scan ul (cost=0.00..193326.71 rows=7067647 width=88) (actual time=5.677..108190.096 rows=7067831 loops=1)
-> Append (cost=0.00..157988.47 rows=7067647 width=67) (actual time=5.669..77109.995 rows=7067831 loops=1)
-> Subquery Scan "*SELECT* 1" (cost=0.00..8158.48 rows=362548 width=67) (actual time=5.666..3379.178 rows=362862 loops=1)
-> Seq Scan on user_logs (cost=0.00..6345.74 rows=362548 width=67) (actual time=5.645..1395.673 rows=362862 loops=1)
-> Subquery Scan "*SELECT* 2" (cost=0.00..93663.88 rows=4191588 width=67) (actual time=9.149..35094.798 rows=4191580 loops=1)
-> Seq Scan on user_logs_2004 (cost=0.00..72705.94 rows=4191588 width=67) (actual time=9.117..16531.486 rows=4191580 loops=1)
-> Subquery Scan "*SELECT* 3" (cost=0.00..44875.33 rows=2008233 width=67) (actual time=0.562..24017.680 rows=2008190 loops=1)
-> Seq Scan on user_logs_2003 (cost=0.00..34834.17 rows=2008233 width=67) (actual time=0.542..13224.265 rows=2008190 loops=1)
-> Subquery Scan "*SELECT* 4" (cost=0.00..11290.78 rows=505278 width=67) (actual time=7.100..3636.163 rows=505199 loops=1)
-> Seq Scan on user_logs_2002 (cost=0.00..8764.39 rows=505278 width=67) (actual time=6.446..1474.709 rows=505199 loops=1)
-> Hash (cost=4.00..4.00 rows=1 width=16) (actual time=0.083..0.083 rows=0 loops=1)
-> Index Scan using user_login_login_key on user_login u (cost=0.00..4.00 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1)
Index Cond: ((login)::text = 'kalman'::text)
Total runtime: 116491.056 ms
(16 rows)

as you can see the index scan is not used anymore.
Do you see any problem on this approach ?

Regards
Gaetano Mendola

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message William Yu 2005-02-02 04:25:09 Re: High end server and storage for a PostgreSQL OLTP system
Previous Message Cosimo Streppone 2005-02-01 21:11:30 Re: High end server and storage for a PostgreSQL OLTP system