Re: table partioning performance

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Steven Flatt" <steven(dot)flatt(at)gmail(dot)com>
Cc: "Colin Taylor" <colin(dot)taylor(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: table partioning performance
Date: 2007-01-11 12:15:50
Message-ID: 1168517751.3951.538.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wed, 2007-01-10 at 16:00 -0500, Steven Flatt wrote:
> On 1/9/07, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> If you are doing date range partitioning it should be fairly
> simple to
> load data into the latest table directly. That was the way I
> originally
> intended for it to be used. The rules approach isn't something
> I'd
> recommend as a bulk loading option and its a lot more complex
> anyway.
> The problem we have with blindly loading all data into the latest
> table is that some data (< 5%, possibly even much less) is actually
> delivered "late" and belongs in earlier partitions. So we still
> needed the ability to send data to an arbitrary partition.

Yes, understand the problem.

COPY is always going to be faster than INSERTs anyhow and COPY doesn't
allow views, nor utilise rules. You can set up a client-side program to
pre-qualify the data and feed it to multiple simultaneous COPY commands,
as the best current way to handle this.

--
Next section aimed at pgsql-hackers, relates directly to above:

My longer term solution looks like this:

1. load all data into newly created partition (optimised in a newly
submitted patch for 8.3), then add the table as a new partition

2. use a newly created, permanent "errortable" into which rows that
don't match constraints or have other formatting problems would be put.
Following the COPY you would then run an INSERT SELECT to load the
remaining rows from the errortable into their appropriate tables. The
INSERT statement could target the parent table, so that rules to
distribute the rows would be applied appropriately. When all of those
have happened, drop the errortable. This would allow the database to
apply its constraints accurately without aborting the load when a
constraint error occurs.

In the use case you outline this would provide a fast path for 95% of
the data load, plus a straightforward mechanism for the remaining 5%.

We discussed this on hackers earlier, though we had difficulty with
handling unique constraint errors, so the idea was shelved. The
errortable part of the concept was sound however.
http://archives.postgresql.org/pgsql-hackers/2005-11/msg01100.php
James William Pye had a similar proposal
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00120.php

The current TODO says
"Allow COPY to report error lines and continue
This requires the use of a savepoint before each COPY line is processed,
with ROLLBACK on COPY failure."

If we agreed that the TODO actually has two parts to it, each of which
is separately implementable:
1. load errors to a table (all errors apart from uniqueness violation)
2. do something sensible with unique violation ERRORs

IMHO part (1) can be implemented without Savepoints, which testing has
shown (see James' results) would not be an acceptable solution for bulk
data loading. So (1) can be implemented fairly easily, whereas (2)
remains an issue that we have no acceptable solution for, as yet.

Can we agree to splitting the TODO into two parts? That way we stand a
chance of getting at least some functionality in this important area.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2007-01-11 12:15:56 Re: [COMMITTERS] pgsql: Stamp major release 8.3.0,
Previous Message Simon Riggs 2007-01-11 11:32:12 Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

Browse pgsql-performance by date

  From Date Subject
Next Message Bernd Helmle 2007-01-11 12:51:12 Re: Partitioning
Previous Message Richard Huxton 2007-01-11 11:27:20 Re: Does it matters the column order in indexes and constraints