Re: slow queries over information schema.tables

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: slow queries over information schema.tables
Date: 2018-12-05 17:56:57
Message-ID: 20181205175657.s7mt6as356kwvrji@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-12-05 12:24:54 -0500, Tom Lane wrote:
> The core of the problem, I think, is that we're unable to convert the
> condition on table_name into an indexscan on pg_class.relname, because
> the view has cast pg_class.relname to the sql_identifier domain.
>
> There are two different issues in that. One is that the domain might
> have constraints (though in reality it does not), so the planner can't
> throw away the CoerceToDomain node, and thus can't match the expression
> to the index. Even if we did throw away the CoerceToDomain, it still
> would not work because the domain is declared to be over varchar, and
> so there's a cast-to-varchar underneath the CoerceToDomain.

Couldn't we make expression simplification replace CoerceToDomain with a
RelabelType if the constraint is simple enough? That's not entirely
trivial because we'd have to look into the typecache to get the
constraints, but that doesn't sound too bad.

> The most straightforward way to fix that would be to add some cross-type
> comparison operators to the name_ops operator family.

That seems reasonable.

> While we only
> directly need 'name = text' to make this case work, the opr_sanity
> regression test will complain if we don't supply a fairly complete set of
> operators, and I think not without reason. So I experimented with doing
> that, as in the very-much-WIP 0001 patch below. A name index can only
> cope with strcmp-style comparison semantics, not strcoll-style, so while
> it's okay to call the equality operator = I felt we'd better call the
> inequality operators ~<~ and so on. While the patch as presented fixes
> the case shown above, it's not all good: the problem with having a new
> 'text = name' operator is that it will also capture cases where you would
> like to have a text index working with a comparison value of type "name".
> If 'text = name' isn't part of the text_ops opclass then that doesn't
> work. I think that the reason for the join.sql regression test failure
> shown in patch 0001 is likewise that since 'text = name' isn't part of the
> text_ops opclass, the join elimination stuff is unable to prove that it
> can eliminate a join to a unique text column. This could probably be
> fixed by creating yet another dozen cross-type operators that implement
> text vs name comparisons using strcoll semantics (hence, using the normal
> '<' operator names), and adding that set to the text_ops opfamily.
> I didn't do that here (yet), because it seems pretty tedious.

Is there a way we could make that less laborious by allowing a bit more
casting?

> 0002 is a really preliminary POC for eliminating CoerceToDomain nodes
> at plan time if the domain has no constraints to check. While this is
> enough to check the effects on plan selection, it's certainly not
> committable as-is, because the resulting plan is broken if someone then
> adds a constraint to the domain. (There is a regression test failure,
> not shown in 0002, for a case that tests exactly that scenario.)

Hah.

> What we would have to do to make 0002 committable is to add sinval
> signaling to invalidate any cached plan that's dependent on an
> assumption of no constraints on a domain. This is something that
> we probably want anyway, since it would be necessary if we ever want
> to compile domain constraint expressions as part of the plan tree
> rather than leaving them to run-time.

Yea, that seems good. I don't like the fact that expression
initialization does the lookups for constraints right now.

> While the sinval additions per se would be fairly straightforward,
> I wonder whether anyone is likely to complain about the race conditions
> that will ensue from not taking any locks associated with the domain
> type; i.e. it's possible that a query would execute with slightly
> out-of-date assumptions about what constraints a domain has. I suspect
> that we have race conditions like that already, but they might be worse
> if we inspect the domain during planning rather than at executor
> startup. Is anyone worried enough about that to want to add locking
> overhead to domain usage? (I'm not.)

I'm not either.

> 0001+0002 still don't get the job done: now we strip off the
> CoerceToDomain successfully, but we still have "relname::varchar"
> being compared to the comparison value, so we still can't match
> that to the index. 0003 is a somewhat klugy solution to that part,
> allowing indxpath.c to construct a name equality test from a texteq
> restriction condition. (This is semantically OK because equality
> is equality in both name and text domains. We could not convert
> inequalities, at least not as freely; maybe it could be done in
> C locale, but I've not done anything about that here.)
>
> With all three patches in place, we get
>
> regression=# explain select * from pg_class where relname::information_schema.sql_identifier = 'foo'::text;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------
> Index Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..8.30 rows=7 width=781)
> Index Cond: (relname = 'foo'::text)
> Filter: ((relname)::text = 'foo'::text)
> (3 rows)
>
> so we've fixed the lack of indexscan but we're still a bit off about the
> statistics. I don't have any immediate proposal about how to fix the
> latter, but anyway this is enough to make Pavel's case a lot better.

> 0003 essentially converts "namecol::text texteq textvalue" into
> "namecol nameeqtext textvalue", relying on the new equality
> operator introduced by 0001.

Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
usable outside of one odd builtin type. I was wondering for a bit
whether we could have logic to move the cast to the other side of an
operator, but I don't see how we could make that generally safe.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Igrishin 2018-12-05 18:14:57 Re: proposal: plpgsql pragma statement
Previous Message Alvaro Herrera 2018-12-05 17:56:04 Re: psql display of foreign keys