From: | Vinicius Segalin <vinisegalin(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Predicting query runtime |
Date: | 2016-09-12 23:54:13 |
Message-ID: | CAAeH1nA0n=s5gVUU_SatzVqkjzos6bN8ZhDJj1Ly9ddCdddbEA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
2016-09-12 17:01 GMT-03:00 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>:
> On Mon, Sep 12, 2016 at 7:03 AM, Vinicius Segalin <vinisegalin(at)gmail(dot)com>
> wrote:
>
>> Hi everyone,
>>
>> I'm trying to find a way to predict query runtime (I don't need to be
>> extremely precise). I've been reading some papers about it, and people are
>> using machine learning to do so. For the feature vector, they use what the
>> DBMS's query planner provide, such as operators and their cost. The thing
>> is that I haven't found any work using PostgreSQL, so I'm struggling to
>> adapt it.
>> My question is if anyone is aware of a work that uses machine learning
>> and PostgreSQL to predict query runtime, or maybe some other method to
>> perform this.
>>
>
> I don't know about machine learning, but if there were some way to get the
> planner to tell you predicted cost in terms of a breakdown of how many
> multiples of each *_cost factor (rather than only a grand total which is
> what it does now), then it would be fairly easy to combine that with wall
> times from log_duration and do a simple linear regression.
>
> I suspect the result would be that seq_page_cost and random_page_cost
> would have huge uncertainties on them. And since pretty much every query
> has non-zero predicted values for at least one of those, the huge
> uncertainties would then pollute all the rest of the fitted values as
> well. Perhaps that is where the machine learning would come in?
>
> Another issue is the predicted costs are only meant to choose between
> different plans, not to predict overall wall time. Some parts of the
> planner only have one way to do something, and so doesn't bother to compute
> a cost for that as there is no choice to be made. This would leave glaring
> holes in the estimates (particularly for updates)
>
> But to get that data out would require quite a bit of tedious altering of
> the planner code, and then you would have to find people willing to run
> that altered code on real world databases with a high level of logging to
> gather the data. (I suspect that gathering data from only toy databases
> would not be very useful).
>
> Cheers,
>
> Jeff
>
Modifying the planner is way too complex for me at this time, so I really
can't go into that kind of solution, but I can try to use as much as the
planner gives me today, make the best out of it and hope it's enough to
give me some satisfactory results.
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-09-12 23:57:10 | Re: Server crashed, now cannot start postgres |
Previous Message | Vinicius Segalin | 2016-09-12 23:48:36 | Re: Predicting query runtime |