From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Douglas Doole <dougdoole(at)gmail(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Collation versioning |
Date: | 2019-12-12 14:36:15 |
Message-ID: | CAOBaU_Zg==075nNEhd+Du0x32q0ojHC1DJ86mBCW5JYKaosb5Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello Thomas,
Thanks for looking at it!
On Thu, Dec 12, 2019 at 5:01 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> We create duplicate pg_depend records:
>
> [...]
>
> I wondered if that was harmless
That's the assumed behavior of recordMultipleDependencies:
/*
* Record the Dependency. Note we don't bother to check for
* duplicate dependencies; there's no harm in them.
*/
We could add a check to skip duplicates for the "track_version ==
true" path, or switch to flags if we want to also skip duplicates in
other cases, but it'll make recordMultipleDependencies a little bit
more specialised.
> but for one thing it causes duplicate warnings:
Yes, that should be avoided.
> Here's another way to get a duplicate, and in this example you also
> get an unnecessary dependency on 100 "default" for this index:
>
> postgres=# create index on t(x collate "fr_FR") where x > 'helicopter'
> collate "fr_FR";
> CREATE INDEX
> postgres=# select * from pg_depend where objid = 't_x_idx'::regclass
> and refobjversion != '';
> classid | objid | objsubid | refclassid | refobjid | refobjsubid |
> refobjversion | deptype
> ---------+-------+----------+------------+----------+-------------+---------------+---------
> 1259 | 16460 | 0 | 3456 | 12603 | 0 |
> 0:34.0 | n
> 1259 | 16460 | 0 | 3456 | 12603 | 0 |
> 0:34.0 | n
> 1259 | 16460 | 0 | 3456 | 100 | 0 |
> 0:34.0 | n
> (3 rows)
>
> Or... maybe 100 should be there, by simple analysis of the x in the
> WHERE clause, but it's the same if you write x collate "fr_FR" >
> 'helicopter' collate "fr_FR", and in that case there are no
> expressions of collation "default" anywhere.
Ah good point. That's because expression_tree_walker() will dig into
CollateExpr->args and eventually reach the underlying Var. I don't
see an easy way to avoid that while still properly recording the
required dependency for an even more realistic index such as
CREATE INDEX ON t(x COLLATE "fr_FR") WHERE x > ((x COLLATE "en_US" >
'helicopter' COLLATE "en_US")::text) collate "fr_FR";
and for instance not for
CREATE INDEX ON t(x COLLATE "fr_FR") WHERE x > ((x COLLATE "en_US" ||
'helicopter' COLLATE "en_US")) collate "fr_FR";
> The indirection through composite types works nicely:
>
> postgres=# create type foo_t as (en text collate "en_CA", fr text
> collate "fr_CA");
> CREATE TYPE
> postgres=# create table t (foo foo_t);
> CREATE TABLE
> postgres=# create index on t(foo);
> CREATE INDEX
> postgres=# select * from pg_depend where objid = 't_foo_idx'::regclass
> and refobjversion != '';
> classid | objid | objsubid | refclassid | refobjid | refobjsubid |
> refobjversion | deptype
> ---------+-------+----------+------------+----------+-------------+---------------+---------
> 1259 | 16444 | 0 | 3456 | 12554 | 0 |
> 0:34.0 | n
> 1259 | 16444 | 0 | 3456 | 12597 | 0 |
> 0:34.0 | n
> (2 rows)
>
> ... but again it shows the extra and technically unnecessary
> dependencies (only 12603 "fr_FR" is really needed):
>
> postgres=# create index on t(((foo).fr collate "fr_FR"));
> CREATE INDEX
> postgres=# select * from pg_depend where objid = 't_fr_idx'::regclass
> and refobjversion != '';
> classid | objid | objsubid | refclassid | refobjid | refobjsubid |
> refobjversion | deptype
> ---------+-------+----------+------------+----------+-------------+---------------+---------
> 1259 | 16445 | 0 | 3456 | 12603 | 0 |
> 0:34.0 | n
> 1259 | 16445 | 0 | 3456 | 12597 | 0 |
> 0:34.0 | n
> 1259 | 16445 | 0 | 3456 | 12554 | 0 |
> 0:34.0 | n
> (3 rows)
Yes :(
> I check that nested types are examined recursively, as appropriate. I
> also tested domains, arrays, arrays of domains, expressions extracting
> an element from an array of a domain with an explicit collation, and
> the only problem I could find was more ways to get duplicates. Hmm...
> what else is there that can contain a collatable type...? Ranges!
>
> postgres=# create type myrange as range (subtype = text);
> CREATE TYPE
> postgres=# drop table t;
> DROP TABLE
> postgres=# create table t (x myrange);
> CREATE TABLE
> postgres=# create index on t(x);
> CREATE INDEX
> postgres=# select * from pg_depend where objid = 't_x_idx'::regclass
> and refobjversion != '';
> classid | objid | objsubid | refclassid | refobjid | refobjsubid |
> refobjversion | deptype
> ---------+-------+----------+------------+----------+-------------+---------------+---------
> (0 rows)
>
> ... or perhaps, more realistically, a GIST index might actually be
> useful for range queries, and we're not capturing the dependency:
>
> postgres=# create index t_x_idx on t using gist (x);
> CREATE INDEX
> postgres=# select * from pg_depend where objid = 't_x_idx'::regclass
> and refobjversion != '';
> classid | objid | objsubid | refclassid | refobjid | refobjsubid |
> refobjversion | deptype
> ---------+-------+----------+------------+----------+-------------+---------------+---------
> (0 rows)
Good catch :) I fixed it locally and checked that a gist index on a
range with a subtype being a composite type does record the required
dependencies.
> The new syntax "ALTER INDEX i_name DEPENDS ON ANY COLLATION UNKNOWN
> VERSION" doesn't sound good to me, it's not "ANY" collation, it's a
> specific set of collations that we aren't listing. "ALTER INDEX
> i_name DEPENDS ON COLLATION * VERSION UNKNOWN", hrmph, no that's
> terrible... I'm not sure what would be better.
Mmm, indeed. With a 3rd round in the existing keyword, how about
"DEPENDS ON [ ANY ] REFERENCING COLLATION"? The ANY is mostly to
avoid the need for plural.
> I'm not sure if I like the idea of VACUUM reporting warnings or not. Hmm.
Even if I add this in a IsAutoVacuumWorkerProcess?
> To state more explicitly what's happening here, we're searching the
> expression trees for subexpresions that have a collation as part of
> their static type. We don't know which functions or operators are
> actually affected by the collation, though. For example, if an
> expression says "x IS NOT NULL" and x happens to be a subexpression of
> a type with a particular collation, we don't now that this
> expression's value can't possibly be affected by the collation version
> changing. So, the system will nag you to rebuild an index just
> because you mentioned it, even though the index can't be corrupted.
> To do better than that, I suppose we'd need declarations in the
> catalog to say which functions/operators are collation sensitive.
Wouldn't that still be a problem for an absurd expression like
WHERE length((val collate "en_US" > 'uh' collate "en_US")::text) > 0
And since we would still have to record a dependency on the collation
in such case, we would need to have another magic value to distinguish
"unknown" from "cannot cause corruption" collation version.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-12-12 14:41:18 | Re: Duplicate function call on timestamp2tm |
Previous Message | Li Japin | 2019-12-12 14:34:20 | Duplicate function call on timestamp2tm |