Re: best practice for moving millions of rows to child table when setting up partitioning?

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: best practice for moving millions of rows to child table when setting up partitioning?
Date: 2011-04-27 20:17:16
Message-ID: ip9tkc$jek$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 04/27/2011 10:48 AM, Mark Stosberg wrote:
>
> Hello,
>
> I'm working on moving a table with over 30 million to rows to be
> partitioned. The table seeing several inserts per second. It's
> essentially an activity log that only sees insert activity and is
> lightly used for reporting, such that queries against it can safely be
> disabled during a transition.
>
> I'm looking for recommendations for a way to do this that will be least
> disruptive to the flow of inserts statements that will continue to
> stream in.
>
> Here's the plan which seems best to me at the moment. Is it is
> reasonable?

I revised my plan based on feedback and mentioned resources here, and
also due to my own concerns about the resource and performance impact of
using the INSERT .. SELECT pattern on millions of rows at a time.

Here's my revised plan for the transition, which avoids using large
INSERT..SELECT statements, and only requires a minimal amount of the
transition to work to happen once the application has started to insert
data into the child tables. Seem reasonable?

New Plan
---------

1. Create the child tables targeted to contain data from the parent as
standalone tables (including archive tables and the current month).

2. Take a full database backup from this morning and extract the COPY
statement for the parent table. Manually split it up by date to create
multiple copy statements, one for each partition.

3. Run the COPY statements to load the data into each child table. So
far, everything has happened outside of application access.

4. Find the max ID that has been inserted in the current child table.

5. INSERT .. SELECT the missing rows from the last backup from the parent
table to the current child table to be. Again, note the max ID.

6. Now, during a maintenance window:
- alter the child tables to inherit the parent
- Set up the trigger which starts redirecting inserts
from the parent table to the child table.
- INSERT .. SELECT the file the final few missing rows from
the parent to the current child
- TRUNCATE then CLUSTER the parent table.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message ramon cruel 2011-04-27 20:29:10 Please, i want exit here
Previous Message Tom Lane 2011-04-27 20:15:49 Re: Starting psql without a database?