Re: percentile value check can be slow

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jotpe <jotpe(at)posteo(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: percentile value check can be slow
Date: 2017-11-19 12:23:42
Message-ID: 11957032-4817-8431-e3b5-456cb2259946@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 11/19/2017 03:10 AM, David Fetter wrote:
> On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
>> Hi,
>>
>> ...
>>
>> Is 'recognizer' an established term I should know? Is it the same
>> as 'validator' or is it something new/different?
>
> I borrowed it from http://langsec.org/
>
> I'm not entirely sure what you mean by a validator, but a recognizer
> is something that gives a quick and sure read as to whether the input
> is well-formed. In general, it's along the lines of a tokenizer, a
> parser, and something that does very light post-parse analysis for
> correctness of form.
>
> For the case that started the thread, a recognizer would check
> something along the lines of
>
> CHECK('[0,1]' @> ALL(input_array))
>

OK, thanks. From what I understand, a recognizer is about deciding
whether a string is valid within a given formal language (essentially,
whether it's a well-formed program). That may not be the right term
for checks on parameter values.
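
For the case at hand, the check itself is easy enough to express,
whatever we end up calling it. A sketch using David's containment
test (the cast to numeric[] is mine, just to make it self-contained):

    SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 0.5]::numeric[]);  -- true
    SELECT '[0,1]'::numrange @> ALL (ARRAY[0.25, 1.5]::numeric[]);  -- false

So the hard part is not writing the check, but deciding where to hook
it in.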

OTOH we already have "validators" in a number of places - functions
checking various parameters, e.g. options for FDWs, etc.
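
For instance, an FDW validator is just a function with a fixed
signature that raises an error on bad options. A sketch with made-up
names (my_fdw, my_fdw_validator), following the documented interface:

    CREATE FUNCTION my_fdw_validator(options text[], catalog oid)
        RETURNS void
        LANGUAGE c AS 'MODULE_PATHNAME';

    CREATE FOREIGN DATA WRAPPER my_fdw
        VALIDATOR my_fdw_validator;

Something along those lines, attached to an aggregate's direct
arguments, would cover the static case discussed here.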

But I guess the naming can be solved later ...

>>> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
>>> Allow optional CHECK constraints in CREATE AGGREGATE for direct
>>> arguments.
>>>
>>
>> How will any of the approaches deal with something like
>>
>> select percentile_cont((select array_agg(v) from p))
>> within group (order by a) from t;
>>
>> In this case the values are unknown after parse analysis, so I
>> guess it does not really address that.
>
> It doesn't. Does it make sense to do a one-shot execution for cases
> like that? It might well be worth it to do the aggregate once in
> advance as a throw-away if the query execution time is already going
> to take a while. Of course, you can break that one by making p a JOIN
> to yet another thing...
>
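
Yeah, a join breaks the one-shot idea. And to illustrate how late the
failure currently happens with a dynamic list (a sketch; t and p as in
the example above, error message quoted from memory):

    INSERT INTO p VALUES (1.5);  -- out-of-range fraction

    SELECT percentile_cont((SELECT array_agg(v) FROM p))
           WITHIN GROUP (ORDER BY a) FROM t;
    -- ERROR:  percentile value 1.5 is not between 0 and 1

The error only shows up after the input from t was consumed and
sorted, which is exactly the cost we'd like to avoid paying.
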
>> FWIW while I won't stand in the way of improving this, I wonder if this
>> is really worth the additional complexity. If you get errors like this
>> with a static list of values, you will fix the list and you're done. If
>> the list is dynamic (computed in the query itself), you'll still get the
>> error much later during query execution.
>>
>> So if you're getting many failures like this for the "delayed error
>> reporting" to be an issue, perhaps there's something wrong in your stack
>> and you should address that instead?
>
> I'd like to think that getting something to fail quickly and cheaply
> when it can will give our end users a better experience. Here,
> "cheaply" refers to their computing resources and time.

The trouble is, this increases execution time for everyone, including
people who carefully construct the parameter values. That seems rather
undesirable.

>
> Clearly, not having this happen in this case bothered Johannes
> enough to wade in here.
>

No. He was surprised the error is reported after a significant amount
of time, but he does not seem to claim failing faster would be valuable
to him. That is your assumption, and I have my doubts about it.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
