Re: Partitioning option for COPY

From: Emmanuel Cecchet <manu(at)asterdata(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Emmanuel Cecchet <manu(at)asterdata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioning option for COPY
Date: 2009-11-15 23:41:15
Message-ID: 4B00919B.4050309@asterdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jan,

Here is a new version of the patch. Find the response to your comments
embedded in the text.
>> partitioning option for COPY
>>
>
> Here's the review:
>
> == Submission ==
> The patch is contextual, applies cleanly to current HEAD, compiles fine.
> The docs build cleanly.
>
> == Docs ==
> They're reasonably clear, although they still mention ERROR_LOGGING,
> which was taken out of this patch. They could use some wordsmithing, but
> I didn't go into details, as there were more severe issues with the patch.
>
Removed the text related to ERROR_LOGGING.
> One thing that made me cautious was the mention that triggers modifying
> tuples will make random errors appear. As is demonstrated later,
> triggers are a big issue.
>
Whichever way routing is implemented we will have to decide what we want
to do with triggers. We can decide to fire them or not (there was
already a debate whether COPY is an insert statement or not and should
fire the statement trigger for insert). This is not a design problem
with this patch, we just have to chose what we want to do with triggers
when partitioning is involved. IMHO we should disable them altogether
but there are scenarios where one could argue that there are still useful.
> == Regression tests ==
> They ran fine, there's one additional regression test that exercises the
> new option.
>
> == Style/nitpicks ==
> Minor gripes include:
> o instead of using an ad-hoc data structure for the LRU cache list, I'd
> suggest an OidList from pg_list.h.
>
Will do if we decide to go further with this patch.
> o some mentions of "method" in comments should be changed to "function"
> o trailing whitespace in the patch (it's not the end of the world, of
> course)
>
I guess the committer will run pg_indent anyway so I'm not too worried
about spaces.
> == Issues ==
> Attached are 3 files that demonstrate problems the patch has.
> o test1.sql always segfaults for me, poking around with gdb suggests
> it's a case of an uninitialised cache list (another reason to use the
> builtin one).
>
I was never able to reproduce that problem. I don't know where this
comes from.
> o test2.sql demonstrates, that indices on child tables are not being
> updated, probably because after resultRelInfo in
> check_tuple_constraints() gets created is never has ri_NumIndices set,
> and so the code that was supposed to take care of indices is never
> called. Looks like a copy-paste error.
>
Fixed, actually there was a leak in relcache for the index.
> o test3.sql demonstrates, that some triggers that I would expect to be
> fired are in fact not fired. I guess it's the same reason as mentioned:
> ri_TrigDesc never gets set, so the code that calls triggers is dead.
>
There is a problem with after row triggers that I did not completely
figure out. For some reason, if I use the regular mechanism by calling
ExecARInsertTrigger that differ the execution of the trigger until the
after row event is triggered, the child relation is not closed and there
is a leak in the relcache. I forced the after row triggers to execute
synchronously after inserting in the child table to work around the
problem. If someone has an explanation, I am willing to do a cleaner
implementation!
> I stopped there, because unfortunately, apart from all that there's one
> fundamental problem with this patch, namely "we probably don't want it".
>
> As it stands it's more of a proof of concept than a really usable
> solution, it feels like built from spare (copied from around copy.c)
> parts. IMHO it's much too narrow for a general partitioning solution,
> even if the design it's based upon would be accepted. It's assuming a
> lot of things about the presence of child tables (with proper
> constraints), the absence of triggers, and so on.
>
> Granted, it solves a particular problem (bulk loading into a partitioned
> table, with not extra features like triggers and with standard
> inheritance/exclusive check constraints setup), but that's not good
> enough in my opinion, even if all other issues would be addressed.
>
Well, as Postgres does not have any support for real partitioning
besides inheritance, and so far it is unlikely that another
implementation will happen in the 8.5 timeframe, this feature fills the
need for people doing data warehouses. This is a scenario used with
every single Aster customer. Now if the Postgres community does not
think that the Aster use case is general enough or of interest to be
integrated in the code base, this is a different issue and I won't spent
time arguing if this is a philosophical/political issue.
Note that the new patch works with triggers but you can easily generate
corrupt data if your triggers are modifying the data on which the
routing decision is based.
> Now I'm not a real Postgres user, it's been a while since I worked in a
> PG shop (or a DB shop for that matter), but from what I understand from
> following this community for a while, a patch like that doesn't have a
> lot of chances to be committed. That said, my puny experience with real
> PG installations and their needs must be taken into account here.
>
I don't really understand why a new option of COPY should be solving a
general problem. It's an option, and like every option, it is to solve a
particular use case. I don't see what is wrong with that.
> I'll mark this patch as "Waiting on Author", but I have little doubt
> that even after fixing those probably trivial segfaults etc. the patch
> would be promptly rejected by a committer. I suggest withdrawing it from
> this commitfest and trying to work out a more complete design first that
> would address the needs of a bigger variety of users, or joining some of
> the already underway efforts to bring full-featured partitioning into
> Postgres.
>
I have integrated your tests in the regression test suite and I was
never able to reproduce the segfault you mentioned. What platform are
you using?

Thanks for your valuable feedback
Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com

Attachment Content-Type Size
aster-copy-partitioning-patch-8.5-contextv2.txt text/plain 49.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Roger Leigh 2009-11-15 23:47:18 Re: Unicode UTF-8 table formatting for psql text output
Previous Message Tom Lane 2009-11-15 23:38:18 Re: Summary and Plan for Hot Standby