Re: disfavoring unparameterized nested loops

From: Mike Klaas <mike(at)superhuman(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-08-02 23:14:00
Message-ID: CABOs6N2=cZo42rd5sXC5Ao1MuG4fCqL1Xw080y4E6=ZeHhonyg@mail.gmail.com
Lists: pgsql-hackers

> I think that it is worth paying more than nothing to avoid plans that are
> so far from optimal that they might as well take forever to execute

I recently came across this article from 2016 that described a bad plan of
the sort discussed in this thread:
https://heap.io/blog/when-to-avoid-jsonb-in-a-postgresql-schema

(The proximate cause in this case was PostgreSQL not collecting statistics
for fields in a JSONB column, estimating a row count of 1, and thus producing
a pathological slowdown.)
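
To make the failure mode concrete, here is a minimal sketch of the kind of
misestimate involved (the schema and names are hypothetical, not taken from
the article):

CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    user_id bigint NOT NULL,
    props   jsonb
);

CREATE TABLE users (
    id    bigint PRIMARY KEY,
    email text
);

-- The planner keeps no per-key statistics for props, so a filter like
-- props ->> 'type' = 'click' falls back to a generic selectivity estimate.
-- That can drastically underestimate the number of matching rows and make
-- a nested loop join look artificially cheap.
EXPLAIN
SELECT u.email
FROM   events e
JOIN   users u ON u.id = e.user_id
WHERE  e.props ->> 'type' = 'click';

(An expression index on (props ->> 'type'), or extended statistics on that
expression in newer releases, can give the planner real statistics to work
with.)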

–Mike

On Tue, Jun 22, 2021 at 7:37 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Tue, Jun 22, 2021 at 2:53 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> > Yeah, I like the insurance analogy - it gets to the crux of the problem,
> > because insurance is pretty much exactly about managing risk.
>
> The user's exposure to harm is what truly matters. I admit that that's
> very hard to quantify, but we should at least try to do so.
>
> We sometimes think about a plan that is 10x slower as if it's infinitely
> slow, or might as well be. But it's usually not like that
> -- it is generally meaningfully better than a plan that is 100x slower,
> which is itself sometimes appreciably better than one that is 1000x slower. And
> besides, users often don't get anything like the optimal plan, even on what
> they would consider to be a good day (which is most days). So maybe 10x
> slower is actually the baseline good case already, without anybody knowing
> it. Most individual queries are not executed very often, even on the
> busiest databases. The extremes really do matter a lot.
>
> If a web app or OLTP query is ~10x slower than optimal then it might be
> the practical equivalent of an outage that affects the query alone
> (i.e. "infinitely slow") -- but probably not. I think that it is worth
> paying more than nothing to avoid plans that are so far from optimal that
> they might as well take forever to execute. This is not meaningfully
> different from a database outage affecting one particular query. It kind of
> is in a category of its own that surpasses "slow plan", albeit one that is
> very difficult or impossible to define formally.
>
> There may be a huge amount of variation in risk tolerance among basically
> reasonable people. For example, if somebody chooses to engage in some kind
> of extreme sport, to me it seems understandable. It's just not my cup of
> tea. Whereas if somebody chooses to never wear a seatbelt while driving,
> then to me they're simply behaving foolishly. They're not willing to incur
> the tiniest inconvenience in order to get a huge reduction in potential
> harm -- including a very real risk of approximately the worst thing that
> can happen to you. Sure, refusing to wear a seatbelt can theoretically be
> classified as just another point on the risk tolerance spectrum, but that
> seems utterly contrived to me. Some things (maybe not that many) really are
> like that, or can at least be assumed to work that way as a practical
> matter.
>
> > But making
> > everything slower will be a hard sell, because the vast majority of workloads
> > already running on Postgres don't have this issue at all, so for them it's
> > not worth the expense.
>
> I think that we're accepting too much risk here. But I bet it's also true
> that we're not taking enough risk in other areas. That was really my point
> with the insurance analogy -- we can afford to take lots of individual
> risks as long as they don't increase our exposure to truly disastrous
> outcomes -- by which I mean queries that might as well take forever to
> execute as far as the user is concerned. (Easier said than done, of
> course.)
>
> A simple trade-off between fast and robust doesn't seem like a universally
> helpful thing. Sometimes it's a very unhelpful way of looking at the
> situation. If you make something more robust to extreme bad outcomes, then
> you may have simultaneously made it *faster* (not slower) for all practical
> purposes. This can happen when the increase in robustness allows the user
> to tune the system aggressively, and only take on new risks that they can
> truly live with (which wouldn't have been possible without the increase in
> robustness). For example, I imagine that the failsafe mechanism added to
> VACUUM will actually make it possible to tune VACUUM much more aggressively
> -- it might actually end up significantly improving performance for all
> practical purposes, even though technically it has nothing to do with
> performance.
>
> Having your indexes a little more bloated because the failsafe kicked in
> is a survivable event -- the DBA lives to fight another day, and *learns*
> to tune vacuum/the app so it doesn't happen again and again. An
> anti-wraparound failure is perhaps not a survivable event -- the DBA gets
> fired. This really does seem like a fundamental difference to me.
>
> > Following the insurance analogy, selling tornado insurance in Europe is
> > mostly pointless.
>
> Principled skepticism of this kind of thing is of course necessary and
> welcome. It *could* be taken too far.
>
> > And the lack of data also plays a role - the insurance company will ask for
> > higher rates when it does not have enough accurate data about the
> > phenomenon, or when there's a lot of unknowns. Maybe this would allow some
> > basic measure of uncertainty, based on the number and type of restrictions,
> > joins, etc.
>
> I don't think that you can really model uncertainty. But you can have true
> certainty (or close to it) about a trade-off that makes the system
> fundamentally more robust over time. You can largely be certain about both
> the cost of the insurance, as well as how it ameliorates the problem in at
> least some cases.
>
> > So maybe some fairly rough measure of uncertainty might work, and the user
> > might specify how much risk they're willing to tolerate.
>
> I think that most or all of the interesting stuff is where you have this
> extreme asymmetry -- places where it's much more likely to be true that
> basically everybody wants that. Kind of like wearing seatbelts -- things
> that we really can claim are a universal good without too much controversy.
> There might be as few as one or two things in the optimizer that this could
> be said of. But they matter.
>
> --
> Peter Geoghegan
>
