Re: moving data between tables causes the db to overwhelm the system

From: Robert Schnabel <schnabelr(at)missouri(dot)edu>
To: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
Cc: Kevin Kempter <kevink(at)consistentstate(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: moving data between tables causes the db to overwhelm the system
Date: 2009-09-01 12:33:10
Message-ID: 4A9D1486.9010603@missouri.edu
Lists: pgsql-performance


> If you want to partition your huge data set by "time", and the data
> isn't already ordered by "time" on disk, you could do this :
>
> SET work_mem TO something very large like 10GB since you got 32GB RAM,
> check your shared buffers etc first;
> CREATE TABLE tmp AS SELECT * FROM bigTable ORDER BY "time"; <- huge
> sort, will take some time
>
> SET maintenance_work_mem TO something very large;
> CREATE INDEX tmp_time ON tmp( "time" );
>
> CREATE TABLE partition1 AS SELECT * FROM tmp WHERE "time" BETWEEN
> beginning AND end;
> (repeat...)
>
> Since tmp is clustered on "time" you'll get a nice fast bitmap-scan,
> and you won't need to seq-scan N times (or randomly index-scan) bigTable.
>
I went through the same exercise a couple of months ago with a table that
had ~1.7 billion rows. I used a similar approach to the one described
above, except that I skipped the tmp table and put the ORDER BY on each
SELECT from bigTable that fed an INSERT into a partition (I only had a
few of those, so it was manageable). My data was structured such that
this was easier than doing the huge sort. In any event, it worked great
and my smaller partitions are much, much faster.
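
For illustration, a rough sketch of that variant (table, column, and
range names here are placeholders, not my actual schema):

  -- Partition assumed to be pre-created with bigTable's structure.
  CREATE TABLE partition_2009_q1 (LIKE bigTable INCLUDING DEFAULTS);

  -- Pull one time range at a time, ordering on the way in so each
  -- partition ends up physically clustered on "time" with no tmp table.
  INSERT INTO partition_2009_q1
  SELECT * FROM bigTable
  WHERE "time" >= '2009-01-01' AND "time" < '2009-04-01'
  ORDER BY "time";

  -- Repeat for each remaining range, then index each partition.
  CREATE INDEX partition_2009_q1_time ON partition_2009_q1 ("time");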

Bob
