Re: Re: best practice for moving millions of rows to child table when setting up partitioning?

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Re: best practice for moving millions of rows to child table when setting up partitioning?
Date: 2011-05-02 04:01:59
Message-ID: 4DBE2CB7.7020208@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 04/27/2011 03:35 PM, Mark Stosberg wrote:
> In particular, I wanted to check whether the UPDATE statement would
> alter all the rows automatically, or if the underlying trigger would
> cause all the rows processed a row at a time.
>
> It appears from my test that the result of the UPDATE was going to
> appear all at once. I'm worried about the resource implications of
> inserting mullions of rows all at once.
>

You can put a LIMIT on an UPDATE, same as any other type of query. No
reason that style of migration must happen all at once, you can just
target a smaller number of rows at a time and slowly siphon rows over to
the children by iteration.

I don't see anything inherently wrong with the approach you're
proposing. CREATE INDEX CONCURRENTLY does something similar to the
logic you've outlined--mark where data was inserted at, move over all
older data, then just copy over the new rows at the end. The main
downsides of that is complexity and the need for downtime to get an
atomic swap to using the child tables. You can't add them to the parent
until the original is gone, if you've inserted duplicate data into them.

I'm not sure what all that complexity buys you, compared to just adding
all the children, putting a limit on the UPDATE, and looping over that
with some delay after each iteration until it's finished if you want to
further control the rate. There's usually no reason you have to be in a
rush to moving data over. Relying on the database's transactional
scheme to avoid making any mistakes here--making it so a goof will
ROLLBACK--and avoiding any need for downtime are normally higher
priorities in a partition migration than making the move happen as fast
as possible.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Greg Smith 2011-05-02 04:09:36 Re: archive_timeout behavior (8.4.6)
Previous Message Tom Lane 2011-05-01 16:17:10 Re: Option shared_buffers in PostgreSQL