Re: percentile value check can be slow

From: David Fetter <david(at)fetter(dot)org>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jotpe <jotpe(at)posteo(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: percentile value check can be slow
Date: 2017-11-19 02:10:10
Message-ID: 20171119021009.GJ4411@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
> Hi,
>
> On 11/18/2017 10:30 PM, David Fetter wrote:
> > On Sat, Nov 18, 2017 at 10:44:36AM -0500, Tom Lane wrote:
> >> jotpe <jotpe(at)posteo(dot)de> writes:
> >>> I tried to enter invalid percentile fractions, and was astonished
> >>> that it seems to be checked after many work is done?
> >>
> >> IIRC, only the aggregate's final-function is concerned with direct
> >> arguments, so it's not all that astonishing.
> >
> > It may not be surprising from the point of view of a systems
> > programmer, but it's pretty surprising that this check is deferred to
> > many seconds in when the system has all the information it need in
> > order to establish this before execution begins.
> >
> > I'm not sure I see an easy way to do this check early, but it's worth
> > trying on grounds of POLA violation. I have a couple of ideas on how
> > to do this, one less invasive but hinky, the other a lot more invasive
> > but better overall.
> >
> > Ugly Hack With Ugly Performance Consequences:
> > Inject a subtransaction at the start of execution that supplies an
> > empty input to the final function with the supplied direct
> > arguments.
>
> I'm pretty sure you realize this is quite unlikely to get accepted.

I should hope not, but I mentioned it because I'd like to get it on
the record as rejected.

> > Bigger Lift:
> > Require a separate recognizer function direct arguments and fire
> > it during post-parse analysis. Perhaps this could be called
> > recognizer along with the corresponding mrecognizer. It's not
> > clear that it's sane to have separate ones, but I thought I'd
> > bring it up for completeness.
> >
>
> Is 'recognizer' an established definition I should know? Is it the same
> as 'validator' or is it something new/different?

I borrowed it from http://langsec.org/

I'm not entirely sure what you mean by a validator, but a recognizer
is something that gives a quick and sure read as to whether the input
is well-formed. In general, it's along the lines of a tokenizer, a
parser, and something that does very light post-parse analysis for
correctness of form.

For the case that started the thread, a recognizer would check
something along the lines of

CHECK('[0,1]' @> ALL(input_array))

> > Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
> > Allow optional CHECK constraints in CREATE AGGREGATE for direct
> > arguments.
> >
>
> How will any of the approaches deal with something like
>
> select percentile_cont((select array_agg(v) from p))
> within group (order by a) from t;
>
> In this case the the values are unknown after the parse analysis, so I
> guess it does not really address that.

It doesn't. Does it make sense to do a one-shot execution for cases
like that? It might well be worth it to do the aggregate once in
advance as a throw-away if the query execution time is already going
to take awhile. Of course, you can break that one by making p a JOIN
to yet another thing...

> FWIW while I won't stand in the way of improving this, I wonder if this
> is really worth the additional complexity. If you get errors like this
> with a static list of values, you will fix the list and you're done. If
> the list is dynamic (computed in the query itself), you'll still get the
> error much later during query execution.
>
> So if you're getting many failures like this for the "delayed error
> reporting" to be an issue, perhaps there's something wrong in you stack
> and you should address that instead?

I'd like to think that getting something to fail quickly and cheaply
when it can will give our end users a better experience. Here,
"cheaply" refers to their computing resources and time. Clearly, not
having this happen in this case bothered Johannes enough to wade in
here.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 高增琦 2017-11-19 08:18:03 Patch to add dependency between client executes and static libraries
Previous Message Tomas Vondra 2017-11-19 02:03:41 Re: [HACKERS] pg_basebackup --progress output for batch execution