Re: COPY command details

From: Tiger Quimpo <bopolissimus(dot)lists(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: COPY command details
Date: 2007-03-30 06:11:39
Message-ID: 1175235100.969.52.camel@tiger-work.ramcarnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, 2007-03-29 at 22:15 -0700, Benjamin Arai wrote:
> I have one system which I have used partitioning. For this particular
> case I have tons of data over about (50 years). What I did is wrote
> small loader that breaks data in tables based on date, so I have tables
> like abc_2000, abc_2001 etc. The loading script is only a couple
> hundred lines of code. The only part that was a little bit of work was
> to allow for easy access to the data for the data for the devs. I did
> this by writing a few PL functions to automatically union the tables
> and produce results. So the function like getData(startData,enddate)
> would run a union query for the respective date ranges.

>From reading on the list, I think the current recommendations
are:

have a parent table and then create the partitions as
descendants (INHERITS) of the parent table.

Use constraints to ensure that nothing gets inserted
into the parents, and that only the correct data gets
inserted into the descendants (i.e., the constraints
enforce that 2002 data won't insert into the 2001 partition,
the insert would fail if you tried that).

Turn constraint_exclusion on in postgresql.conf (helps
the planner ignore partitions that don't qualify), so
it doesn't need to scan partitions where there will be no
matches anyway.

Use rules on the parent to redirect inserts/updates/deletes
to the right partition (or trigger, i'm having some problems
with both rules and triggers, so I may have the loader
insert straight to the right partition instead, although
I'm still trying to get the rules right.

The advantage of this (parent table plus descendant partitions)
is that you can query from the parent, and the descendants will
be automatically queried, you'll get all matching rows from any
descendants that have them. No need for those unions or
pl/pgsql functions to do the unions for you.

That said, I need to make my current code work with smaller
test data sets so I can isolate the problems I'm having with
rules, etc. E.g., my original plan was to detect automatically
(in a before trigger) if a partition for the to-be-loaded data
set already exists, and if it doesn't, create the partitions
and the rules. I then load the data into the parent table and
expect the newly created rules to redirect the data into the
new partitions. Either my rules are wrogn, or there's something
about transaction semantics that's making that fail since the
rows are inserting into the parent table (no constraint there
yet to disallow inserts into the parent) instead of the right
partition.

To The List:
Are schema changes (such as adding rules and creating child
partitions) part of the same transaction or do they happen
(magically) outside the current transaction, in a new
transaction (e.g., so that the inserts would fail because
the inserts are running in an older transaction that can't
see the new rule or new partition table yet).

As I said, I might just be doing something wrogn there.
I've set that project aside for now since it's not
urgent (the current unpartitioned system works well enough,
I just avoid reindex, vacuum, vacuum full and pg_dump), but
I'll start working on it again when I get some free time in
a week or two (Holy Week, holiday where I am :-).

tiger

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2007-03-30 07:30:29 Re: coalesce for null AND empty strings
Previous Message Tiger Quimpo 2007-03-30 05:58:01 Re: COPY command details