Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Yuri Astrakhan <yuriastrakhan(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.
Date: 2021-05-30 21:26:39
Message-ID: 20210530212639.67te76g2btc6bdzz@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

I'm afraid not :(. I really could use some help around this...

Regards,

Andres

On 2021-05-21 17:54:16 -0400, Yuri Astrakhan wrote:
> Hi Andres, I just heard v14 beta is out (congrats!). Do you know if the
> very large query optimizations you mentioned (see last paragraph) have made
> it into v13 or v14b1? Thanks!
>
> On Thu, Oct 24, 2019 at 9:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > Hi,
> >
> > On 2019-10-24 15:12:16 +0000, PG Bug reporting form wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference: 16076
> > > Logged by: Yuri Astrakhan
> > > Email address: yuriastrakhan(at)gmail(dot)com
> > > PostgreSQL version: 12.0
> > > Operating system: Ubuntu 19.10
> > > Description:
> > >
> > > (huge thanks to @Komzpa - Darafei Praliaskouski who figured out the
> > > culprit)
> > >
> > > PostgreSQL 12.0 (Ubuntu 12.0-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
> > compiled
> > > by gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008, 64-bit
> > > POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 "
> > > PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.13.1"
> > > LIBPROTOBUF="1.3.1" WAGYU="0.4.3 (Internal)"
> > >
> > > I have a full OpenStreetMap loaded using OpenMapTiles configuration into
> > > multiple machine configurations. A single query (wrapped as a UDF)
> > generates
> > > the whole MVT tile blob, and uses many layers, sub-queries, and UDFs.
> > When
> > > JIT is enabled, the query takes 5-10 seconds to run. Disabling JIT makes
> > the
> > > query run in milliseconds. Same queries on PostgreSQL 10 + Postgis
> > > 2.5.2/GEOS 3.6 have no issues (as there was no JIT).
> >
> > Ugh.
> >
> >
> > > Details:
> > > The query is a concatenation of multiple map layers (binary blobs in MVT
> > > format from ST_AsMVT()). All layers are "UNION ALL-ed" from subqueries,
> > one
> > > query per layer. The "explain analyze" shows huge cost of the first layer
> > > query, regardless of the order of those layers. Every layer added about
> > > 200-300ms to the execution (not planning!) time, totalling about 5-7
> > > seconds. Yet, I noticed that when I was testing a portion of layers,
> > > removing layer_waterway made query run in milliseconds (I suspect there
> > are
> > > more than one layer that causes this bug).
> >
> > > I will try to reproduce this error using OpenMapTiles's quickstart setup,
> > > but that will involve some heavy customization. Is this a known bug, or
> > > should I try to get an easy to set up repo? Thanks!
> >
> > A repro would be very helpful.
> >
> >
> > > * "perf top" results -
> > > https://gist.github.com/nyurik/fa0cb0dd935d50e796cd5d58d8d41067
> >
> > Hm. That's a flat looking profile, making it hard to glean information
> > about what's going wrong. It might just be the size of the emitted
> > code...
> >
> >
> > If it's too hard to generate a quick repro, it might already help if you
> > sent me the bitcode for such a slow query. You can do so by setting
> > 'jit_dump_bitcode' and looking for the *.bc files in the data
> > directory.
> >
> >
> > > * A few explore analyze runs with different layer orderings:
> > > * https://explain.depesz.com/s/fecC
> > > * https://explain.depesz.com/s/MpMo
> > > * https://explain.depesz.com/s/d7RU
> > > * main query -
> > > https://gist.github.com/nyurik/ed80c5e6d20f67fd14d37ea2503a9cbd
> > > * helpers -
> > https://gist.github.com/nyurik/00c4c9d69b7d4fa67c86159b9ce83027
> >
> > This looks like it creates pretty darn huge queries. There's a *lot* of
> > table references here, and each of them has their own targetlist etc.
> >
> > I want to figure out why it's *this* slow, but this many expressions are
> > going to have overhead to JIT. There's 330 generated functions in that
> > last explain. A lot of them never actually executed, because there's
> > basically no content. Presumably there's a lot of redundancy, with the
> > expressions being basically the same - but we're unable to take
> > advantage of that in 12, due to the pointer constants making the
> > expressions not actually be the same, c.f. patch at[1] (which also is
> > why there's no caching).
> >
> > Unfortunately I don't think there's going to be full blown fix for this
> > in 12, although we might be able to improve things a bit. So you'll
> > probably end up having to just disable jit. But if you can try to get me
> > enough data to work with, the situation in 13 might be better.
> >
> > Greetings,
> >
> > Andres Freund
> >
> > [1]
> > https://www.postgresql.org/message-id/20191023163849.sosqbfs5yenocez3%40alap3.anarazel.de
> >
Greetings,

Andres Freund

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2021-05-30 23:12:33 Re: XX000: unknown type of jsonb container.
Previous Message Thomas Munro 2021-05-29 13:29:26 Re: BUG #16827: macOS interrupted syscall leads to a crash