Re: domain type smashing is expensive

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
Subject: Re: domain type smashing is expensive
Date: 2017-09-14 00:41:12
Message-ID: 20170914004112.ok56rwyrelqpgbje@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-09-12 14:28:51 -0400, Robert Haas wrote:
> On Tue, Sep 12, 2017 at 1:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On short-running queries that return a lot of columns,
> >> SendRowDescriptionMessage's calls to getBaseTypeAndTypmod() are a
> >> noticeable expense.
> >
> > Yeah, I was never very happy with the way that the original domain
> > patch dealt with that. I think you're not even focusing on the
> > worst part, which is all the getBaseType calls in the parser.
> > I do not have a good idea about how to get rid of them though.
>
> Well, I'm focusing on the part that shows up in the profile. Prepared
> queries don't get re-parsed repeatedly, so the calls in the parser
> don't matter in that context. I'm not saying it wouldn't be nice to
> get rid of them, but it only helps people who aren't preparing their
> queries.

In my experience syscache lookups aren't particularly prominent in
profiles of non-prepared workloads. Those are commonly dominated by the
memory allocator (due to all the lists), the raw parser, and then
parse analysis (with a small proportion spent in the syscaches), leaving
the executor side aside.

> >> + if (typid < FirstBootstrapObjectId)
> >> + break;
> >
> > I'm really unwilling to buy into an assumption that we'll never
> > have any built-in domains just to support such a crock as this.

I'm not super happy about that solution either, but it has the big
advantage of being simple and consisting of very little code. Adding a
couple comments here and a type_sanity.sql check seems to buy a good
chunk of performance for little effort. Adding additional hashtable
searches is far from free.
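
To make the quoted fast path concrete, here is a minimal standalone
sketch of the idea, not the actual patch. It assumes that
FirstBootstrapObjectId is 10000 and that no domain has an OID below it.
The mock catalog, the lookup counter and the name
get_base_type_sketch() are hypothetical stand-ins for the syscache and
getBaseTypeAndTypmod().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

#define InvalidOid 0
/* Assumption: OIDs below this value are hand-assigned in the catalog headers. */
#define FirstBootstrapObjectId 10000

/* Hypothetical stand-in for a pg_type row; basetype is InvalidOid unless a domain. */
typedef struct MockType
{
    Oid oid;
    Oid basetype;
} MockType;

static const MockType mock_catalog[] = {
    {23, InvalidOid},   /* int4, a built-in type */
    {16384, 23},        /* hypothetical domain over int4 */
    {16385, 16384},     /* hypothetical domain over that domain */
};

/* Counts simulated catalog lookups so the saving is observable. */
int catalog_lookups = 0;

static const MockType *
mock_lookup(Oid typid)
{
    catalog_lookups++;
    for (size_t i = 0; i < sizeof(mock_catalog) / sizeof(mock_catalog[0]); i++)
        if (mock_catalog[i].oid == typid)
            return &mock_catalog[i];
    return NULL;
}

/* Follow the domain chain down to the base type, skipping lookups for built-ins. */
Oid
get_base_type_sketch(Oid typid)
{
    assert(typid != InvalidOid);
    for (;;)
    {
        const MockType *tup;

        /* Fast path: assumed that no built-in type is a domain. */
        if (typid < FirstBootstrapObjectId)
            break;

        tup = mock_lookup(typid);
        if (tup == NULL || tup->basetype == InvalidOid)
            break;              /* not a domain, done */
        typid = tup->basetype;  /* domain: continue with its base type */
    }
    return typid;
}
```

In this sketch a built-in type costs no lookup at all, and a domain
chain stops doing lookups as soon as it reaches a built-in type. A
type_sanity.sql check would be what guards the assumption.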

> I more or less expected that reaction, but I think it's a bit
> short-sighted. If somebody wanted to define a domain type in
> pg_type.h, they'd have to write any domain constraint out in
> pg_constraint.h in nodeToString() form, and it seems to me that the
> chances that we'd accept a patch are pretty much nil, because it would
> be a maintenance nuisance. Now, maybe you could argue that somebody
> might want to define a constraint-less domain in pg_type.h, but I
> can't recall any proposal to do such a thing and don't see why
> anybody'd want to do it.

For that reason we'd probably create such domain types outside of
bootstrap, and therefore in a separate OID range...

> > You'd need to dig around in the archives from around that time. But
> > my hazy recollection is that the argument was that clients would be
> > much more likely to know what to do with a built-in type than with
> > some domain over it. psql, for example, knows about right-justifying
> > the built-in numeric types, but it'd fail to do so for domains.
>
> Mmm, that's a good point.

Yeah, I don't think we want to revise that just because of this
performance issue - it'd likely cause some subtle breakage. I'm far from
convinced that this "downcasting" is a good idea on semantic grounds,
but that seems like a separate discussion and I can't recall complaints.

> >> 2. Precompute the list of types to be sent to the client during
> >> planning instead of during execution. The point of prepared
> >> statements is supposed to be to do as much of the work as possible at
> >> prepare time so that bind/execute is as fast as possible, but we're
> >> not really adhering to that design philosophy here. However, I don't
> >> have a clear idea of exactly how to do that.
> >
> > That'd help for prepared statements, but not for simple query execution.

> Sure, but that's kinda my point. We've got to send a RowDescription
> message for every query, and if that requires smashing domain types to
> base types, we have to do it. What we don't have to do is repeat that
> work for every execution of a prepared query.

We have also already done a bunch of those lookups in the planner, so if
we moved it there it might still be an advantage performance-wise
even for the single-execution case.
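
A rough sketch of the "resolve once, reuse on every execution" idea
from point 2, purely for illustration. All names here
(PreparedSketch, prepare_sketch(), describe_row_sketch(),
resolve_base_type()) are hypothetical, and the resolver is a toy
stand-in for the real domain lookup, not how the planner does it.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int TypeOid;

#define MAX_RESULT_COLS 8

/* Counts how often the (expensive) domain resolution runs. */
int resolve_calls = 0;

/* Toy resolver: pretends OID 16384 is a domain over int4 (OID 23). */
TypeOid
resolve_base_type(TypeOid typid)
{
    resolve_calls++;
    return (typid == 16384) ? 23 : typid;
}

/* Hypothetical prepared-statement state caching the resolved column types. */
typedef struct PreparedSketch
{
    size_t  ncols;
    TypeOid base_types[MAX_RESULT_COLS];
} PreparedSketch;

/* Done once, at prepare/plan time: smash domains to base types. */
void
prepare_sketch(PreparedSketch *ps, const TypeOid *coltypes, size_t ncols)
{
    assert(ncols <= MAX_RESULT_COLS);
    ps->ncols = ncols;
    for (size_t i = 0; i < ncols; i++)
        ps->base_types[i] = resolve_base_type(coltypes[i]);
}

/* Done per execution: copy the cached OIDs, no resolution needed. */
size_t
describe_row_sketch(const PreparedSketch *ps, TypeOid *out)
{
    for (size_t i = 0; i < ps->ncols; i++)
        out[i] = ps->base_types[i];
    return ps->ncols;
}
```

Here the resolution cost is paid in prepare_sketch() only, and
repeated calls to describe_row_sketch() never touch the resolver.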

Greetings,

Andres Freund
