Re: pg_migrator and an 8.3-compatible tsvector data type

From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: pg_migrator and an 8.3-compatible tsvector data type
Date: 2009-05-31 14:30:09
Message-ID: 4136ffa0905310730l16dd7a1fn7ce21f52ff4fc5f6@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 31, 2009 at 3:04 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > I think this is basically a large-caliber foot gun.  You're going to
>> > pretend that invalid data is valid, until the user gets around to fixing
>> > it?
>>
>> What choice do we have?  While we can mark indexes as invalid (which we
>> do), how do we mark a table's contents as invalid?  Should we create
>> rules so no one can see the data and then have the ALTER TABLE script
>> remove the rules after it is rebuilt?
>
> OK, what ideas do people have to prevent access to tsvector columns?  I
> am thinking of renaming the tables or something.

I think in this case the caliber is pretty small and this might be
sufficient. It might be nice if we had a check somewhere in the
tsvector data type so people get informative errors if their
tsvectors are old-style rather than random incorrect results, but
that's mostly gilding.

In the general case of data type representation changes, I think we
need something like:

1 Change the catalog so all the tsvector columns are bytea.

2 Include a C function like migrate_tsvector(bytea) which contains a
copy of the old data type's output function and calls the new data
type's input function on the result.

3 Include an ALTER TABLE command which calls the C function.
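To make step 2 concrete, here is a minimal C sketch of the round trip through the text representation, under heavy assumptions. The byte layout is a toy encoding invented for illustration (a 1-byte length before each lexeme, with the old and new formats differing only in lexeme sort order); it is not the real 8.3 or 8.4 tsvector layout. Blob, old_tsvector_out(), new_tsvector_in() and cmp_new_order() are hypothetical stand-ins; only the name migrate_tsvector comes from the proposal above. A real version would be a PostgreSQL C function taking a bytea argument through the fmgr interface, which isn't shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * HYPOTHETICAL toy encoding, not the real tsvector layout: a value is a run
 * of entries, each a 1-byte length followed by that many lexeme bytes.
 *   old format: entries sorted by length, then by bytes
 *   new format: entries sorted by bytes only (strcmp order)
 * Both share the entry encoding, so the same decoder reads either one.
 */
typedef struct {
    unsigned char *data;
    size_t len;
} Blob; /* stands in for the column after it has been relabeled as bytea */

/* Copy of the old type's output function: decode entries into "a b c". */
char *old_tsvector_out(const Blob *b) {
    char *out = malloc(b->len + 1); /* allocation checks omitted for brevity */
    size_t i = 0, o = 0;
    while (i < b->len) {
        size_t n = b->data[i++];
        if (n > b->len - i)
            break; /* truncated entry: stop decoding (toy error handling) */
        if (o > 0)
            out[o++] = ' ';
        memcpy(out + o, b->data + i, n);
        o += n;
        i += n;
    }
    out[o] = '\0';
    return out;
}

int cmp_new_order(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* The new type's input function: parse "a b c" and encode in new order. */
Blob new_tsvector_in(const char *text) {
    size_t tlen = strlen(text);
    char *copy = malloc(tlen + 1);
    memcpy(copy, text, tlen + 1);
    /* n space-separated words need at least 2n - 1 characters */
    char **words = malloc((tlen / 2 + 1) * sizeof *words);
    size_t nwords = 0;
    for (char *w = strtok(copy, " "); w != NULL; w = strtok(NULL, " "))
        words[nwords++] = w;
    qsort(words, nwords, sizeof *words, cmp_new_order);

    Blob b = { malloc(tlen + 1), 0 }; /* words plus one length byte each fit */
    for (size_t k = 0; k < nwords; k++) {
        size_t n = strlen(words[k]);
        assert(n <= 255); /* toy limit: the length must fit in one byte */
        b.data[b.len++] = (unsigned char)n;
        memcpy(b.data + b.len, words[k], n);
        b.len += n;
    }
    free(words);
    free(copy);
    return b;
}

/* Step 2: run the old output function, then the new input function. */
Blob migrate_tsvector(const Blob *old) {
    char *text = old_tsvector_out(old);
    Blob fresh = new_tsvector_in(text);
    free(text);
    return fresh;
}
```

For example, an old-format value holding cat, dog, apple (length-first order) decodes to "cat dog apple" and comes back out of migrate_tsvector in the new order as apple, cat, dog. Step 3 would then presumably be something like ALTER TABLE t ALTER COLUMN c TYPE tsvector USING migrate_tsvector(c), with t and c as placeholders; that rewrites every row of the table.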

The gotchas I can see with this are:

1) It only works for varlenas. There isn't a universal fixed-length
data type; you would probably have to invent one.

2) I'm not sure what will happen to rules and triggers which call
functions on the old data type. If you restore the schema unchanged
and modify the catalog directly then they will still be there but have
mismatched types. Will users get errors? Will those errors be sensible
errors or nonsensical ones? Will the conversion still go ahead or will
it complain that there are things which depend on the column?

If the problems in (2) prove surmountable then this provides a general
solution for any varlena data type representation change. However it
will still be an O(n) conversion plus an index rebuild. That's
unfortunate but unless we plan to ship the full set of operators,
opclasses, opfamilies, cross-data-type operators, etc for the old data
type I see no way around it.

I haven't heard anyone suggest we should roll back the tsvector
changes and give up the features the changes provide -- and that's
just a performance feature. If that's all it took to convince us to
give up in-place-upgrade for this data type then imagine how easy it
will be to justify for actual functional features.

(Personally I think we're fooling ourselves to think Postgres is
mature enough that we won't come up with any new improvements which
will justify a data format change. I would rather hope we'll keep
coming up with massive improvements which require major changes in
every release.)

--
greg
