Re: best practice for moving millions of rows to child table when setting up partitioning?

From: Bob Lunney <bob_lunney(at)yahoo(dot)com>
To: pgsql-admin(at)postgresql(dot)org, Mark Stosberg <mark(at)summersault(dot)com>
Subject: Re: best practice for moving millions of rows to child table when setting up partitioning?
Date: 2011-04-27 19:37:48
Message-ID: 659395.83416.qm@web39708.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Mark,

Comments inline below for items 2 and 3. What you are planning should work like a charm.

Bob Lunney

--- On Wed, 4/27/11, Mark Stosberg <mark(at)summersault(dot)com> wrote:

> From: Mark Stosberg <mark(at)summersault(dot)com>
> Subject: [ADMIN] best practice for moving millions of rows to child table when setting up partitioning?
> To: pgsql-admin(at)postgresql(dot)org
> Date: Wednesday, April 27, 2011, 10:48 AM
>
> Hello,
>
> I'm working on moving a table with over 30 million to rows
> to be
> partitioned. The table seeing several inserts per second.
> It's
> essentially an activity log that only sees insert activity
> and is
> lightly used for reporting, such that queries against it
> can safely be
> disabled during a transition.
>
> I'm looking for recommendations for a way to do this that
> will be least
> disruptive to the flow of inserts statements that will
> continue to
> stream in.
>
> Here's the plan which seems best to me at the moment. Is it
> is
> reasonable?
>
> 1. Handling creating the empty/future partitions is easy. I
> have the
> code for this done already, and will make several
> partitions in advance
> of needing them.
>
> 2. To create the partitions that should have data moved
> from the parent,
> I'm thinking of creating them, and then before they are
> "live",
> using INSERT ... SELECT to fill them with data from the
> parent table.
> I'll run the INSERT first, and then add their indexes.
>

Use "create table as select ..." (CTAS) instead of creating the table, then inserting. Since the table is created and populated atomically there is no need to log the inserts in WAL, and the operation is much faster.

> 3. I will then install the trigger to redirect the inserts
> to the child
> table.
>

If possible, its better to have the code simply do inserts directly into the child table - after all, if the partitioning is based on date, both the code and database know the date, so the code knows to which child table it should write at any given moment.

> 4. There will still be a relatively small number of new
> rows from the
> parent table to be deal with that came in after the INSERT
> from #2 was
> started, so a final INSERT .. SELECT statement will be made
> to copy the
> remaining rows.
>
> 5. Finally, I'll drop the indexes on the parent table and
> truncate it.
>
> Thanks for advice here. If there's a tutorial out there
> about this that
> I've missed, I'm happy to review it instead having it
> rehashed here.
>
> Thanks for the help!
>
>     Mark
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Fred Parkinson 2011-04-27 19:44:51 Re: Starting psql without a database?
Previous Message Mark Stosberg 2011-04-27 19:35:14 Re: best practice for moving millions of rows to child table when setting up partitioning?