Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.

From: Yuri Astrakhan <yuriastrakhan(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.
Date: 2021-05-21 21:54:16
Message-ID: CAJGfNe8XPM5TEDo3j2KXY8eLgzdyT-MYAQsJ7TQuAergHxZY5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Andres, I just heard v14 beta is out (congrats!). Do you know if the
very large query optimizations you mentioned (see last paragraph) have made
it into v13 or v14b1? Thanks!

On Thu, Oct 24, 2019 at 9:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2019-10-24 15:12:16 +0000, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 16076
> > Logged by: Yuri Astrakhan
> > Email address: yuriastrakhan(at)gmail(dot)com
> > PostgreSQL version: 12.0
> > Operating system: Ubuntu 19.10
> > Description:
> >
> > (huge thanks to @Komzpa - Darafei Praliaskouski who figured out the
> > culprit)
> >
> > PostgreSQL 12.0 (Ubuntu 12.0-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
> compiled
> > by gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008, 64-bit
> > POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 "
> > PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.13.1"
> > LIBPROTOBUF="1.3.1" WAGYU="0.4.3 (Internal)"
> >
> > I have a full OpenStreetMap loaded using OpenMapTiles configuration into
> > multiple machine configurations. A single query (wrapped as a UDF)
> generates
> > the whole MVT tile blob, and uses many layers, sub-queries, and UDFs.
> When
> > JIT is enabled, the query takes 5-10 seconds to run. Disabling JIT makes
> the
> > query run in milliseconds. Same queries on PostgreSQL 10 + Postgis
> > 2.5.2/GEOS 3.6 have no issues (as there was no JIT).
>
> Ugh.
>
>
> > Details:
> > The query is a concatenation of multiple map layers (binary blobs in MVT
> > format from ST_AsMVT()). All layers are "UNION ALL-ed" from subqueries,
> one
> > query per layer. The "explain analyze" shows huge cost of the first layer
> > query, regardless of the order of those layers. Every layer added about
> > 200-300ms to the execution (not planning!) time, totalling about 5-7
> > seconds. Yet, I noticed that when I was testing a portion of layers,
> > removing layer_waterway made query run in milliseconds (I suspect there
> are
> > more than one layer that causes this bug).
>
> > I will try to reproduce this error using OpenMapTiles's quickstart setup,
> > but that will involve some heavy customization. Is this a known bug, or
> > should I try to get an easy to set up repo? Thanks!
>
> A repro would be very helpful.
>
>
> > * "perf top" results -
> > https://gist.github.com/nyurik/fa0cb0dd935d50e796cd5d58d8d41067
>
> Hm. That's a flat looking profile, making it hard to glean information
> about what's going wrong. It might just be the size of the emitted
> code...
>
>
> If it's too hard to generate a quick repro, it might already help if you
> sent me the bitcode for such a slow query. You can do so by setting
> 'jit_dump_bitcode' and looking for the *.bc files in the data
> directory.
>
>
> > * A few explore analyze runs with different layer orderings:
> > * https://explain.depesz.com/s/fecC
> > * https://explain.depesz.com/s/MpMo
> > * https://explain.depesz.com/s/d7RU
> > * main query -
> > https://gist.github.com/nyurik/ed80c5e6d20f67fd14d37ea2503a9cbd
> > * helpers -
> https://gist.github.com/nyurik/00c4c9d69b7d4fa67c86159b9ce83027
>
> This looks like it creates pretty darn huge queries. There's a *lot* of
> table references here, and each of them has their own targetlist etc.
>
> I want to figure out why it's *this* slow, but this many expressions are
> going to have overhead to JIT. There's 330 generated functions in that
> last explain. A lot of them never actually executed, because there's
> basically no content. Presumably there's a lot of redundancy, with the
> expressions being basically the same - but we're unable to take
> advantage of that in 12, due to the pointer constants making the
> expressions not actually be the same, c.f. patch at[1] (which also is
> why there's no caching).
>
> Unfortunately I don't think there's going to be full blown fix for this
> in 12, although we might be able to improve things a bit. So you'll
> probably end up having to just disable jit. But if you can try to get me
> enough data to work with, the situation in 13 might be better.
>
> Greetings,
>
> Andres Freund
>
> [1]
> https://www.postgresql.org/message-id/20191023163849.sosqbfs5yenocez3%40alap3.anarazel.de
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2021-05-22 04:31:29 Re: ResultCache cache error: "cache entry already complete" in 14beta1
Previous Message David Christensen 2021-05-21 14:00:02 Re: ResultCache cache error: "cache entry already complete" in 14beta1