Re: slow queries over information schema.tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: slow queries over information schema.tables
Date: 2018-12-05 17:24:54
Message-ID: 5978.1544030694@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> Slow query
> select * from information_schema.tables where table_name = 'pg_class';

Yeah. This has been complained of many times before.

The core of the problem, I think, is that we're unable to convert the
condition on table_name into an indexscan on pg_class.relname, because
the view has cast pg_class.relname to the sql_identifier domain.

There are two different issues in that. One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index. Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.

The latter half of the problem can be exhibited in simplified form as

regression=# explain select * from pg_class where relname = 'pg_class'::text;
QUERY PLAN
------------------------------------------------------------
Seq Scan on pg_class (cost=0.00..175.41 rows=7 width=791)
Filter: ((relname)::text = 'pg_class'::text)
(2 rows)

which is much stupider than what you get with a name comparison:

regression=# explain select * from pg_class where relname = 'pg_class';
QUERY PLAN
---------------------------------------------------------------------------------------------
Index Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..8.29 rows=1 width=791)
Index Cond: (relname = 'pg_class'::name)
(2 rows)

(We've seen complaints about this form of problem, too.) Since there's
no name-vs-text equality operator, we end up applying a cast to text so
that the texteq operator can be used, and now we're out of luck for
matching that to the index. To add insult to injury, we also fail to
match the cast expression to the available statistics, so that we don't
get quite the right selectivity estimate.

The most straightforward way to fix that would be to add some cross-type
comparison operators to the name_ops operator family. While we only
directly need 'name = text' to make this case work, the opr_sanity
regression test will complain if we don't supply a fairly complete set of
operators, and I think not without reason. So I experimented with doing
that, as in the very-much-WIP 0001 patch below. A name index can only
cope with strcmp-style comparison semantics, not strcoll-style, so while
it's okay to call the equality operator = I felt we'd better call the
inequality operators ~<~ and so on. While the patch as presented fixes
the case shown above, it's not all good: the problem with having a new
'text = name' operator is that it will also capture cases where you would
like to have a text index working with a comparison value of type "name".
If 'text = name' isn't part of the text_ops opclass then that doesn't
work. I think that the reason for the join.sql regression test failure
shown in patch 0001 is likewise that since 'text = name' isn't part of the
text_ops opclass, the join elimination stuff is unable to prove that it
can eliminate a join to a unique text column. This could probably be
fixed by creating yet another dozen cross-type operators that implement
text vs name comparisons using strcoll semantics (hence, using the normal
'<' operator names), and adding that set to the text_ops opfamily.
I didn't do that here (yet), because it seems pretty tedious.

Also worth noting is that the name_ops and text_pattern_ops opfamilies
are implementing identical semantics. I wonder whether we could merge
them.

While most of the other regression test output changes shown in the 0001
patch are unsurprising, one that is surprising is that a merge join on
two text columns has started sorting USING ~<~ rather than the normal
text ordering. The reason for this seems to be that the 'text = text'
operator is now a member of name_ops as well as text_ops, and
select_outer_pathkeys_for_merge arbitrarily uses the lowest-number
opfamily OID if it has a choice. We could avoid that by renumbering
name_ops to have an OID higher than text_ops, though that's certainly
klugy. Or we might just decide that we like that plan better anyway,
since strcmp-based comparison is probably faster than strcoll-based
comparison. (It'd be kind of nice if the choice were based on something
more principled than OID order, though I don't especially want to do
anything about that right now.)

Now, 0001 by itself doesn't do much for Pavel's complaint, because
the view is still forcing a cast to sql_identifier to be inserted
atop pg_class.relname, even though we no longer need any cast for
simple name-vs-text cases.

0002 is a really preliminary POC for eliminating CoerceToDomain nodes
at plan time if the domain has no constraints to check. While this is
enough to check the effects on plan selection, it's certainly not
committable as-is, because the resulting plan is broken if someone then
adds a constraint to the domain. (There is a regression test failure,
not shown in 0002, for a case that tests exactly that scenario.)

What we would have to do to make 0002 committable is to add sinval
signaling to invalidate any cached plan that's dependent on an
assumption of no constraints on a domain. This is something that
we probably want anyway, since it would be necessary if we ever want
to compile domain constraint expressions as part of the plan tree
rather than leaving them to run-time.

While the sinval additions per se would be fairly straightforward,
I wonder whether anyone is likely to complain about the race conditions
that will ensue from not taking any locks associated with the domain
type; i.e. it's possible that a query would execute with slightly
out-of-date assumptions about what constraints a domain has. I suspect
that we have race conditions like that already, but they might be worse
if we inspect the domain during planning rather than at executor
startup. Is anyone worried enough about that to want to add locking
overhead to domain usage? (I'm not.)

0001+0002 still don't get the job done: now we strip off the
CoerceToDomain successfully, but we still have "relname::varchar"
being compared to the comparison value, so we still can't match
that to the index. 0003 is a somewhat klugy solution to that part,
allowing indxpath.c to construct a name equality test from a texteq
restriction condition. (This is semantically OK because equality
is equality in both name and text domains. We could not convert
inequalities, at least not as freely; maybe it could be done in
C locale, but I've not done anything about that here.)

With all three patches in place, we get

regression=# explain select * from pg_class where relname::information_schema.sql_identifier = 'foo'::text;
QUERY PLAN
---------------------------------------------------------------------------------------------
Index Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..8.30 rows=7 width=781)
Index Cond: (relname = 'foo'::text)
Filter: ((relname)::text = 'foo'::text)
(3 rows)

so we've fixed the lack of indexscan but we're still a bit off about the
statistics. I don't have any immediate proposal about how to fix the
latter, but anyway this is enough to make Pavel's case a lot better.

0003 essentially converts "namecol::text texteq textvalue" into
"namecol nameeqtext textvalue", relying on the new equality
operator introduced by 0001. Another way we could approach
this is to dispense with 0001 altogether, and use a variant of
0003 that converts such a clause to "namecol nameeq textvalue::name".
At first glance it seems like that wouldn't work quite right:
the cast to name will silently truncate overlength text values,
which could allow such a value to be found equal to a name that
it shouldn't be equal to. However, it really would be OK because
the context is that we're assuming the converted clause is lossy
(i.e. might have false matches), and so we'll recheck the original
clause as a filter condition, which will get rid of such false
matches. The recheck behavior is just slightly-annoying overhead
in 0003 as presented, but it'd be essential in the other version.

I'm not entirely sure whether to go with 0001+0003 or this alternative
approach. The alternative is surely far less invasive; 0001 might have
more side effects I haven't thought of. However, my gut feeling is
that 0001 would be a good thing on balance because it'd give the system
considerably more information about name-vs-text comparisons than it has
now. In principle that should lead to better plans for other cases
besides the narrow one we're specifically thinking of.

Comments, complaints, other ideas?

regards, tom lane

Attachment Content-Type Size
0001-extend-name-opfamily-0.1.patch text/x-diff 27.7 KB
0002-hack-coercetodomain-0.1.patch text/x-diff 3.3 KB
0003-hack-name-text-comparison-0.1.patch text/x-diff 9.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-12-05 17:38:08 Re: minor leaks in pg_dump (PG tarball 10.6)
Previous Message Jonah H. Harris 2018-12-05 17:21:36 Re: proposal: plpgsql pragma statement