Re: JIT compiling with LLVM v12

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JIT compiling with LLVM v12
Date: 2018-08-26 06:16:51
Message-ID: alpine.DEB.2.21.1808260800360.11066@lancre
Lists: pgsql-hackers


>> Now you can say that'd be solved by bumping the cost up, sure. But
>> obviously the row / cost model is pretty much out of whack here, I don't
>> see how we can make reasonable decisions in a trivial query that has a
>> misestimation by five orders of magnitude.
>
> Before JIT, it didn't matter whether the costing was wrong, provided
> that the path with the lowest cost was the cheapest path (or at least
> close enough to the cheapest path not to bother anyone). Now it does.
> If the intended path is chosen but the costing is higher than it
> should be, JIT will erroneously activate. If you had designed this in
> such a way that we added separate paths for the JIT and non-JIT
> versions and the JIT version had a bigger startup cost but a reduced
> runtime cost, then you probably would not have run into this issue, or
> at least not to the same degree. But as it is, JIT activates when the
> plan looks expensive, regardless of whether activating JIT will do
> anything to make it cheaper. As a blindingly obvious example, turning
> on JIT to mitigate the effects of disable_cost is senseless, but as
> you point out, that's exactly what happens right now.
>
> I'd guess that, as you read this, you're thinking, well, but if I'd
> added JIT and non-JIT paths for every option, it would have doubled
> the number of paths, and that would have slowed the planner down way
> too much. That's certainly true, but my point is just that the
> problem is probably not as simple as "the defaults are too low". I
> think the problem is more fundamentally that the model you've chosen
> is kinda broken. I'm not saying I know how you could have done any
> better, but I do think we're going to have to try to figure out
> something to do about it, because saying, "check-pg_upgrade is 4x
> slower, but that's just because of all those bad estimates" is not
> going to fly. Those bad estimates were harmlessly bad before, and now
> they are harmfully bad, and similar bad estimates are going to exist
> in real-world queries, and those are going to be harmful now too.
>
> Blaming the bad costing is a red herring. The problem is that you've
> made the costing matter in a way that it previously didn't.

My 0.02€ on this interesting subject.

Historically, external IOs, aka rotating disk accesses, have been the main
cost (by several orders of magnitude) of executing database queries, and
cpu costs are relatively very low in most queries. The point of the query
planner is mostly to avoid very bad paths wrt IOs.

Now, even with significantly faster IOs, eg SSDs, IOs are still a few
orders of magnitude slower, but less so, so cpu may matter more.

Now again, for small databases, data are often in memory and stay there, in
which case CPU is the only cost.

This would suggest the following approach to evaluating costs in the
planner:

(1) are the needed data already in memory? if so, use cpu-only costs. This
implies that the planner would know about it... which is probably not the
case.

(2) if not, then optimise for IOs first, because they are likely to
be the main cost driver anyway.

(3) once an "IO-optimal" (eg not too bad) plan is selected, consider
whether to apply JIT to part of it: do so only if cpu costs are significant
and some parts are likely to be executed a lot, and require a significantly
high margin, because JIT itself has a cost.

Basically, I'm suggesting to reevaluate the selected plan, without
changing it, with a JIT cost to improve it, as a second stage.
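
The two-stage idea above can be sketched roughly as follows. This is an
illustration only, not PostgreSQL planner code: the PlanNode fields, the
function names, and the jit_compile_cost, speedup_fraction and margin
parameters are all hypothetical, and a plan is simplified to a flat list
of nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlanNode:
    # All fields are hypothetical estimates, in arbitrary cost units.
    name: str
    io_cost: float           # estimated IO cost of the node
    cpu_cost_per_row: float  # estimated cpu cost per processed row
    est_rows: float          # estimated number of rows processed


def total_io(plan: List[PlanNode]) -> float:
    return sum(n.io_cost for n in plan)


def total_cpu(plan: List[PlanNode]) -> float:
    return sum(n.cpu_cost_per_row * n.est_rows for n in plan)


def pick_plan(candidates: List[List[PlanNode]],
              data_in_memory: bool) -> List[PlanNode]:
    """Stage 1: pick a plan without thinking about JIT at all.

    If the data are assumed to be in memory, compare cpu costs only;
    otherwise optimise for IOs first and use cpu as a tie-breaker.
    """
    if data_in_memory:
        return min(candidates, key=total_cpu)
    return min(candidates, key=lambda p: (total_io(p), total_cpu(p)))


def jit_nodes(plan: List[PlanNode], jit_compile_cost: float,
              speedup_fraction: float, margin: float) -> List[str]:
    """Stage 2: keep the plan as it is, only decide where JIT pays off.

    A node is marked when the cpu cost it is expected to save exceeds
    the compilation cost by the required safety margin.
    """
    marked = []
    for node in plan:
        saved = node.cpu_cost_per_row * node.est_rows * speedup_fraction
        if saved > jit_compile_cost * margin:
            marked.append(node.name)
    return marked


# Usage with made-up numbers.
big = [PlanNode("seqscan", 1000.0, 0.01, 1_000_000),
       PlanNode("agg", 0.0, 0.005, 1_000_000)]
small = [PlanNode("indexscan", 50.0, 0.02, 100)]

chosen = pick_plan([big, small], data_in_memory=False)
print([n.name for n in chosen])
print(jit_nodes(chosen, jit_compile_cost=100.0,
                speedup_fraction=0.3, margin=5.0))
```

The point of the sketch is that JIT never changes which plan wins: it is
only considered afterwards, node by node, and a cheap plan with little
cpu work is left alone however the margin is tuned.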

--
Fabien.
