Re: Avoiding rewrite in ALTER TABLE ALTER TYPE

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding rewrite in ALTER TABLE ALTER TYPE
Date: 2010-12-30 00:27:22
Message-ID: 20101230002722.GA32257@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 29, 2010 at 02:01:28PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Dec 29, 2010, at 11:16 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I really really dislike the notion of a "verification scan": it's
> >> basically work that is going to be useless if it fails.
>
> > I think it has potential in cases like text to xml. In that case it'll either work or fail, with no possibility of requiring a do-over. Scanning the whole table is a whole lot cheaper than rewriting it.
>
> I don't believe avoiding the write part (but not the read part, nor the
> XML syntax verification part) is a sufficiently compelling argument to
> justify having that code path. There are not enough distinct datatypes
> sharing binary representations to make this a worthwhile thing to worry
> over.
>
> Basically, I believe that the only use-case that will have more than
> epsilon number of users is "I want to make this varchar(5) into
> varchar(10), or possibly text". We can fix that case without adding a
> boatload more code that we'll have to maintain.

Those are certainly the big ones, but I've also hit these in the field:
SET timezone = 'UTC';
CREATE DOMAIN state AS text CHECK (VALUE ~ '[A-Z]{2}');
CREATE TABLE t (c0 text, c1 text, c2 timestamp);
ALTER TABLE t ALTER c0 TYPE state,
ALTER c1 TYPE varchar(6),
ALTER c2 TYPE timestamptz;

It felt normal, but I very well may have been in that epsilon unawares. I would
be disappointed to end the project with no way to avoid rewrites for them. That
being said, there are certainly more ways to achieve all of those than the one I
have proposed.

> I do have some interest in the idea of having a type-specific function
> that can recognize no-op typmod changes, but I would envision that as
> being an expression evaluation optimization: let the planner throw away
> the call to the length-checking function when it isn't going to do
> anything. It's not by any means only useful in ALTER COLUMN TYPE ---
> and in fact probably doesn't even need any bespoke code there, if we put
> it into expression_planner() instead.

That sounds significantly better. Ignorant question: how often will
expression_planner have the old typmod available? Also, would this be worth
attempting unconditionally, or might some callers use the expression too few
times and suffer from the overhead of deciding whether to use it?

Perhaps the order of patches I proposed was faulty and should have placed the
most important use cases first, like this:

1. Modify ATPrepAlterColumnType to detect a bare RelabelType transformation
expression and conditionally skip/downgrade ATRewriteTable. This would cover
varchar->text and similar domain changes, but a full reindex remains.
2. In the no-rewrite case, modify the vicinity of finish_heap_swap to narrow us
from reindexing the entire table to merely reindexing those indexes touched by
the operator class changes.
3. Further skip reindex operations when the operator class has changed but the
operator family has not.
4. Add pg_cast.castexemptor and modify CREATE CAST to populate it. Define an
exemptor (or however we name it) for the varchar length coercion cast. Modify
expression_planner to use it to strip out superfluous length coercion casts.
This would cover varchar(10)->varchar(20) and similar domain changes.
5. Define exemptors for time, timetz, timestamp, timestamptz, and interval.
6. Define exemptor for bit varying.
7. Define exemptor for numeric.
8. (subject to further discussion) Add the verification scan to the vicinity of
ATRewriteTable and skip the table rewrite when it finds no changes. This covers
varchar(20)->varchar(10) and similar domain changes, timestamp->timestamptz, and
text->xml.

Thanks,
nm

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Kirkwood 2010-12-30 00:32:01 Vacuum of newly activated 8.3.12 standby receives warnings page xxx is uninitialized --- fixing
Previous Message KaiGai Kohei 2010-12-30 00:26:07 Re: sepgsql contrib module