Re: What happened to the is_<type> family of functions proposal?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: What happened to the is_<type> family of functions proposal?
Date: 2010-09-26 03:03:35
Message-ID: AANLkTinChAnFEQP_yQ1eFyCpM53XfsVFUBP=4sbM_ODP@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 25, 2010 at 10:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> This is all pretty much a dead end, because it offers no confidence
> whatsoever.  Suppose that COPY calls type X's input function, which
> calls function Y, which calls function Z.  Z doesn't like what it sees
> so it throws an error, which it marks "recoverable" since Z hasn't
> done anything dangerous.  Unfortunately, Y *did* do something that
> requires cleanup.  If COPY catches the longjmp and decides that it
> can skip doing a transaction abort, you're screwed.

Yep. Although it seems a bit pathological for Z to do that, because
if Y is doing something like taking an LWLock then Z is some low-level
internals function that is not in a good position to judge whether
error recovery is feasible. The point is not to recover from as many
errors as possible, but to recover specifically from *data validation*
errors, which I would not expect to be thrown from someplace deep down
in the call stack, where we're in the middle of things. The top-level
typinput function is in a pretty good position to know whether it's
done anything shady.

> What I'm wondering is whether we can fix this by reducing the overhead
> of subtransactions, enough so that we can afford to run each row's input
> function calls within a subxact.  In the past that was dismissed because
> you'd run out of subxact XIDs at 4G rows.  But we have "lazy" assignment
> of XIDs now, so a subxact that didn't actually try to modify the
> database shouldn't need to consume any permanent resources.  Then we're
> just looking at the time needed to call all the per-module subxact start
> and subxact cleanup functions, which seems like something that might be
> optimizable for the typical case where nothing actually needs to be
> done.

Well, reducing the overhead of subtransaction cleanup would certainly
be VERY nice, as it would benefit a FAR broader set of use cases than
just typinput functions. It seems a bit tricky though, because
AbortSubTransaction() calls a whole LOT of cleanup functions, and many
of them already have fast-paths. Where do you anticipate getting a
further large speed-up out of that? The problem seems particularly
tricky because those functions are cleaning up different subsystems.
Maybe you could group them in some way and figure out some method of
skipping entire groups with some kind of super-duper fast path, but
it's not obvious to me how to make that work. And I think you'd need
a pretty considerable speed-up, too. My gut says that even knocking
50% off, while it might be really nice for other reasons, is not going
to be enough to make sticking it inside COPY workable. I bet you need
an order-of-magnitude speed-up, maybe more.
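
To make the "skip entire groups" idea concrete, here is a minimal sketch of one possible shape for it: each subsystem group sets a bit when it actually does work inside the subtransaction, and abort returns immediately when no bit is set. Everything here (ToySubXact, the GRP_* groups, the toy_* functions) is hypothetical and invented for illustration. The real AbortSubTransaction() is not organized this way, and the sketch says nothing about whether such tracking could be made cheap or safe in practice.

```c
#include <stdint.h>

/* Hypothetical subsystem groups; not how the real cleanup code is organized. */
enum
{
    GRP_LOCKS   = 1u << 0,
    GRP_BUFFERS = 1u << 1,
    GRP_CACHES  = 1u << 2
};

/* Hypothetical per-subtransaction state. */
typedef struct ToySubXact
{
    uint32_t    touched;        /* one bit per group that did any work */
    int         cleanups_run;   /* counts cleanup calls, for illustration */
} ToySubXact;

/* A subsystem would call this the first time it does work in the subxact. */
static void
toy_mark_touched(ToySubXact *s, uint32_t group)
{
    s->touched |= group;
}

/* Stand-in for a group's real cleanup routine. */
static void
toy_cleanup_group(ToySubXact *s, uint32_t group)
{
    s->cleanups_run++;
    s->touched &= ~group;
}

static void
toy_abort_subxact(ToySubXact *s)
{
    /* Fast path: nothing was touched, so no cleanup routine is called. */
    if (s->touched == 0)
        return;

    /* Slow path: visit only the groups that recorded work. */
    if (s->touched & GRP_LOCKS)
        toy_cleanup_group(s, GRP_LOCKS);
    if (s->touched & GRP_BUFFERS)
        toy_cleanup_group(s, GRP_BUFFERS);
    if (s->touched & GRP_CACHES)
        toy_cleanup_group(s, GRP_CACHES);
}
```

The hard part, which the sketch does not address, is making every subsystem set its bit reliably; a single missed toy_mark_touched() call would silently skip required cleanup.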

It seems like a good slice of the problem here comes from the
difficulties of being certain what the state is after a longjmp. You
could get around nearly all of them if the type input function were
empowered to return either
(1) a Datum which is the result of the conversion or (2) an SQLSTATE
and error message indicating what went wrong. We're already willing
to believe that cleanup isn't required when the function returns
successfully, so we ought to also believe it when the function returns
a failure result (as opposed to throwing an error indicating a
failure). The conditions that require cleanup here are probably
transient: take an LWLock, do something, release the LWLock. As long
as you know that you haven't stopped somewhere in the middle of that
sequence, it seems like it should be reasonably safe.
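
A rough sketch of what such an interface might look like follows. The names (ToyDatum, ToyInputResult, toy_result, toy_int4_in) are hypothetical and do not correspond to any existing PostgreSQL API; the toy function stands in for a type input function that reports bad data through its return value instead of throwing. The two SQLSTATE codes are the standard ones for invalid text representation (22P02) and numeric value out of range (22003).

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Datum; not PostgreSQL's real definition. */
typedef long ToyDatum;

/* Hypothetical result type: a converted value, or an SQLSTATE plus message. */
typedef struct ToyInputResult
{
    int         ok;             /* 1: value is valid; 0: see sqlstate/message */
    ToyDatum    value;
    char        sqlstate[6];    /* five-character SQLSTATE code plus NUL */
    const char *message;        /* NULL on success */
} ToyInputResult;

/* Build a result; sqlstate must point to at least five characters. */
static ToyInputResult
toy_result(int ok, ToyDatum value, const char *sqlstate, const char *message)
{
    ToyInputResult r;

    r.ok = ok;
    r.value = value;
    memcpy(r.sqlstate, sqlstate, 5);
    r.sqlstate[5] = '\0';
    r.message = message;
    return r;
}

/* Toy int4-style input function: validation failures are returned, not thrown. */
static ToyInputResult
toy_int4_in(const char *str)
{
    char       *end;
    long        v;

    errno = 0;
    v = strtol(str, &end, 10);

    /* No digits consumed, or trailing garbage after the number. */
    if (end == str || *end != '\0')
        return toy_result(0, 0, "22P02", "invalid input syntax for integer");

    /* Parsed, but does not fit in an int. */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return toy_result(0, 0, "22003", "value out of range for integer");

    return toy_result(1, (ToyDatum) v, "00000", NULL);
}
```

Under this scheme a caller such as COPY would check the ok field and could reject or skip the row on failure. Because the function returned normally, no longjmp occurred and the caller never has to guess what state was left behind.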

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
