Re: Data Warehouse Reevaluation - MySQL vs Postgres --

From: "Iain" <iain(at)mst(dot)co(dot)jp>
To: "Joe Conway" <mail(at)joeconway(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Date: 2004-09-16 05:08:34
Message-ID: 00b301c49bab$38275070$7201a8c0@mst1x5r347kymb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Joe,

You went to quite a bit of effort, thanks I have the picture now.

Using inheritence seems to be a useful refinement on top of the earlier
outlined aproach using the UNION ALL view with appropriate predicates on the
condition used to do the partitioning. Having the individual partitions
derived from a parent table makes a lot of sense.

regards
Iain

----- Original Message -----
From: "Joe Conway" <mail(at)joeconway(dot)com>
To: "Iain" <iain(at)mst(dot)co(dot)jp>
Cc: <pgsql-performance(at)postgresql(dot)org>
Sent: Thursday, September 16, 2004 1:07 PM
Subject: Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

> Iain wrote:
> >>That's exactly what we're doing, but using inherited tables instead of a
> >>union view. With inheritance, there is no need to rebuild the view each
> >>time a table is added or removed. Basically, in our application, tables
> >>are partitioned by either month or week, depending on the type of data
> >>involved, and queries are normally date qualified.
> >
> > That sounds interesting. I have to admit that I havn't touched
iheritance in
> > pg at all yet so I find it hard to imagine how this would work. If you
have
> > a chance, would you mind elaborating on it just a little?
>
> OK, see below:
> =====================
>
> create table foo(f1 int, f2 date, f3 float8);
>
> create table foo_2004_01() inherits (foo);
> create table foo_2004_02() inherits (foo);
> create table foo_2004_03() inherits (foo);
>
> create index foo_2004_01_idx1 on foo_2004_01(f2);
> create index foo_2004_02_idx1 on foo_2004_02(f2);
> create index foo_2004_03_idx1 on foo_2004_03(f2);
>
> insert into foo_2004_02 values(1,'2004-feb-15',3.14);
>
>
> -- needed just for illustration since these are toy tables
> set enable_seqscan to false;
> explain analyze select * from foo where f2 = '2004-feb-15';
>
> QUERY PLAN
> --------------------------------------------------------------------------
----------------------------------------------------------------------
> Result (cost=100000000.00..100000061.32 rows=16 width=16) (actual
> time=0.224..0.310 rows=1 loops=1)
> -> Append (cost=100000000.00..100000061.32 rows=16 width=16)
> (actual time=0.214..0.294 rows=1 loops=1)
> -> Seq Scan on foo (cost=100000000.00..100000022.50 rows=5
> width=16) (actual time=0.004..0.004 rows=0 loops=1)
> Filter: (f2 = '2004-02-15'::date)
> -> Index Scan using foo_2004_01_idx1 on foo_2004_01 foo
> (cost=0.00..17.07 rows=5 width=16) (actual time=0.101..0.101 rows=0
loops=1)
> Index Cond: (f2 = '2004-02-15'::date)
> -> Index Scan using foo_2004_02_idx1 on foo_2004_02 foo
> (cost=0.00..4.68 rows=1 width=16) (actual time=0.095..0.101 rows=1
loops=1)
> Index Cond: (f2 = '2004-02-15'::date)
> -> Index Scan using foo_2004_03_idx1 on foo_2004_03 foo
> (cost=0.00..17.07 rows=5 width=16) (actual time=0.066..0.066 rows=0
loops=1)
> Index Cond: (f2 = '2004-02-15'::date)
> Total runtime: 0.582 ms
> (11 rows)
>
> create table foo_2004_04() inherits (foo);
> create index foo_2004_04_idx1 on foo_2004_04(f2);
>
> explain analyze select * from foo where f2 = '2004-feb-15';
>
> QUERY PLAN
> --------------------------------------------------------------------------
----------------------------------------------------------------------
> Result (cost=100000000.00..100000078.38 rows=21 width=16) (actual
> time=0.052..0.176 rows=1 loops=1)
> -> Append (cost=100000000.00..100000078.38 rows=21 width=16)
> (actual time=0.041..0.159 rows=1 loops=1)
> -> Seq Scan on foo (cost=100000000.00..100000022.50 rows=5
> width=16) (actual time=0.004..0.004 rows=0 loops=1)
> Filter: (f2 = '2004-02-15'::date)
> -> Index Scan using foo_2004_01_idx1 on foo_2004_01 foo
> (cost=0.00..17.07 rows=5 width=16) (actual time=0.012..0.012 rows=0
loops=1)
> Index Cond: (f2 = '2004-02-15'::date)
> -> Index Scan using foo_2004_02_idx1 on foo_2004_02 foo
> (cost=0.00..4.68 rows=1 width=16) (actual time=0.016..0.022 rows=1
loops=1)
> Index Cond: (f2 = '2004-02-15'::date)
> -> Index Scan using foo_2004_03_idx1 on foo_2004_03 foo
> (cost=0.00..17.07 rows=5 width=16) (actual time=0.008..0.008 rows=0
loops=1)
> Index Cond: (f2 = '2004-02-15'::date)
> -> Index Scan using foo_2004_04_idx1 on foo_2004_04 foo
> (cost=0.00..17.07 rows=5 width=16) (actual time=0.095..0.095 rows=0
loops=1)
> Index Cond: (f2 = '2004-02-15'::date)
> Total runtime: 0.443 ms
> (13 rows)
>
> For loading data, we COPY into foo, and have a trigger that redirects
> the rows to the appropriate partition.
>
> Notice that the partitions which do not contain any data of interest are
> still probed for data, but since they have none it is very quick. In a
> real life example I got the following results just this afternoon:
>
> - aggregate row count = 471,849,665
> - total number inherited tables = 216
> (many are future dated and therefore contain no data)
> - select one month's worth of data for one piece of equipment by serial
> number (49,257 rows) = 526.015 ms
>
> Not too bad -- quick enough for my needs. BTW, this is using NFS mounted
> storage (NetApp NAS).
>
> Joe

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Joe Conway 2004-09-16 05:17:45 Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Previous Message Christopher Kings-Lynne 2004-09-16 04:29:04 Re: Data Warehouse Reevaluation - MySQL vs Postgres --