From: jotpe <jotpe(at)posteo(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: percentile value check can be slow
Date: 2017-11-19 14:37:08
Message-ID: 1650755f-b43e-f1b0-b9f7-b3d1b2414b7e@posteo.de
Lists: pgsql-hackers
On 19.11.2017 13:23, Tomas Vondra wrote:
> Hi,
>
> On 11/19/2017 03:10 AM, David Fetter wrote:
>> On Sat, Nov 18, 2017 at 11:05:47PM +0100, Tomas Vondra wrote:
>>> Hi,
>>>
>>> ...
>>>
>>> Is 'recognizer' an established definition I should know? Is it the same
>>> as 'validator' or is it something new/different?
>>
>> I borrowed it from http://langsec.org/
>>
>> I'm not entirely sure what you mean by a validator, but a recognizer
>> is something that gives a quick and sure read as to whether the input
>> is well-formed. In general, it's along the lines of a tokenizer, a
>> parser, and something that does very light post-parse analysis for
>> correctness of form.
>>
>> For the case that started the thread, a recognizer would check
>> something along the lines of
>>
>> CHECK('[0,1]' @> ALL(input_array))
>>
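A minimal reproduction of the failure mode under discussion, assuming a table t(a numeric) large enough for the sort to take a while:

```sql
-- The out-of-range fraction is only detected at execution time,
-- after the (potentially expensive) sort of t has completed:
SELECT percentile_cont(ARRAY[0.5, 1.5])
       WITHIN GROUP (ORDER BY a) FROM t;
-- ERROR:  percentile value 1.5 is not between 0 and 1
```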
>
> OK, thanks. From what I understand, recognizer is more about recognizing
> if a string is valid within a given formal language (essentially, if
> it's a well-formed program). That may not be the right term for checks
> on parameter values.
>
> OTOH we already have "validators" on a number of places - functions
> checking various parameters, e.g. reloptions for FDWs, etc.
>
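As an aside, the existing validator precedent already fails fast at statement time; for example, with the standard postgres_fdw contrib module (option names below are from that module):

```sql
-- postgres_fdw installs a validator that rejects unknown options
-- immediately, when the DDL statement runs:
CREATE EXTENSION postgres_fdw;
CREATE SERVER remote FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', dbname 'postgres');
ALTER SERVER remote OPTIONS (ADD dbnam 'oops');
-- ERROR:  invalid option "dbnam"
```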
> But I guess the naming can be solved later ...
>
>>>> Way Bigger Lift, As Far As I Can Tell, But More Fun For Users:
>>>> Allow optional CHECK constraints in CREATE AGGREGATE for direct
>>>> arguments.
>>>>
>>>
>>> How will any of the approaches deal with something like
>>>
>>> select percentile_cont((select array_agg(v) from p))
>>> within group (order by a) from t;
>>>
>>> In this case the values are unknown at parse analysis time, so I
>>> guess it does not really address that.
>>
>> It doesn't. Does it make sense to do a one-shot execution for cases
>> like that? It might well be worth it to do the aggregate once in
>> advance as a throw-away if the query execution time is already going
>> to take a while. Of course, you can break that one by making p a JOIN
>> to yet another thing...
>>
>>> FWIW while I won't stand in the way of improving this, I wonder if this
>>> is really worth the additional complexity. If you get errors like this
>>> with a static list of values, you will fix the list and you're done. If
>>> the list is dynamic (computed in the query itself), you'll still get the
>>> error much later during query execution.
>>>
>>> So if you're getting many failures like this for the "delayed error
>>> reporting" to be an issue, perhaps there's something wrong in you stack
>>> and you should address that instead?
>>
>> I'd like to think that getting something to fail quickly and cheaply
>> when it can will give our end users a better experience. Here,
>> "cheaply" refers to their computing resources and time.
>
> The trouble is, this increases execution time for everyone, including
> people who carefully construct the parameter values. That seems rather
> undesirable.
>
>>
>> Clearly, not having this happen in this case bothered Johannes
>> enough to wade in here.
>>
>
> No. He was surprised the error is reported after a significant amount of
> time, but he does not seem to claim failing faster would be valuable to
> him. That is your assumption, and I have my doubts about it.
I did not know about the complexity needed to pre-check the
parameters; I thought it could perhaps be done easily.
If it's too hard to change, I wouldn't want that improvement.
Best Regards
Johannes