Re: ALTER TYPE 2: skip already-provable no-work rewrites

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 2: skip already-provable no-work rewrites
Date: 2011-02-05 08:05:23
Message-ID: 20110205080523.GA12954@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert,

Thanks for the obviously thought-out review.

On Sat, Feb 05, 2011 at 01:29:35AM -0500, Robert Haas wrote:
> On Thu, Jan 27, 2011 at 2:48 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > Done as attached. ?This preserves compatibility with our current handling of
> > composite type dependencies. ?The rest you've seen before.
>
> OK, so I took a look at this in more detail today. The current logic
> for table rewrites looks like this:
>
> 1. If we're changing the data type of a column, or adding a column
> with a default, or adding/dropping OIDs, rewrite the table. Stop.
> 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table
> and check constraints.
> 3. If we're changing tablespaces, copy the table block-by-block.

Correct. It's perhaps obvious, but rewriting the table will always reindex it.

> It seems to me that the revised logic needs to look like this:
>
> 1. If we're changing the data type of a column and the existing
> contents are not binary coercible to the new contents, or if we're
> adding a column with a default or adding/dropping OIDs, rewrite the
> table. Stop.
> 2. Otherwise, if we're adding a constraint or NOT NULL, scan the table
> and check constraints.

With this patch, step 2 changes changes to "Otherwise, if we're adding a
constraint or NOT NULL, or changing a column to a binary-compatible domain with
a domain CHECK constraint, scan the table and check constraints."

> 3. If we're changing the data type of a column in the table, reindex the table.

Rebuild indexes that depend on a changing column. Other indexes can stay.

> 4. If we're changing tablespaces, copy the table block-by-block.
>
> I might be missing something, but I don't see that the patch includes
> step #3, which I think is necessary. For example, citext is binary
> coercible to text, but you can't reuse the index because the
> comparison function is different. Of course, if you're changing the
> type of a column to its already-current type, you can skip #3, but if
> that's the only case we can optimize, it's not much of an
> accomplishment. I guess this gets back to the ALTER TYPE 7 patch,
> which I haven't looked at in detail, but I have a feeling it may be
> controversial.

It's there, but it's happening rather implicitly. ATExecAlterColumnType builds
lists of indexes and constraints that depend on changing columns. Specifically,
it stashes their OIDs and the SQL to recreate them. ATPostAlterTypeCleanup
drops those objects by OID, then parses the SQL statements, now based on the
updated table definition. ATExecAddIndex and ATExecAddConstraint use those
parsed statements to recreate the objects. The key is the skip_build logic in
ATExecAddIndex: if ATRewriteTables will rewrite the table (and therefore *all*
indexes), we skip the build at that earlier stage to avoid building the same
index twice. The only thing I had to do was update the skip_build condition so
it continues to mirror the corresponding test in ATRewriteTables.

Originally I had this patch doing a full reindex, with an eye to having the next
patch reduce the scope to dependent indexes. However, all the infrastructure
was already there, and it actually made this patch smaller to skip directly to
what it does today.

ALTER TYPE 7 additionally skips builds of indexes that depend on a changing
column but can be proven compatible. So it's in the business of, for example
figuring out that text and varchar are compatible but text and citext are not.

> Another problem here is that if you have to do both #2 and #3, you
> might have been better off (or just as well off) doing #1, unless you
> can somehow jigger things so that the same scan does both the
> constraint checks and the index rebuild. That doesn't look simple.

We have no such optimization during #1, either, so #2+#3 is never worse. In
particular, #1 scans the table (# indexes total) + 1 times, while #2+#3 scans it
(# of indexes depending on changed columns) + 1 times.

There are some nice optimization opportunities here, to be sure. As a specific
first step, teach index_build to create multiple indexes with a single scan,
then have reindex_relation use that. Probably not simple. Combining that with
the ATRewriteTable scan would be less simple still.

nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Scott Marlowe 2011-02-05 08:38:40 Re: [HACKERS] Slow count(*) again...
Previous Message Magnus Hagander 2011-02-05 06:47:06 Re: pg_dump directory archive format / parallel pg_dump