Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.

From: Andres Freund <andres(at)anarazel(dot)de>
To: yuriastrakhan(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.
Date: 2019-10-25 01:04:56
Message-ID: 20191025010456.ubs2jdxdrb6td7m2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2019-10-24 15:12:16 +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 16076
> Logged by: Yuri Astrakhan
> Email address: yuriastrakhan(at)gmail(dot)com
> PostgreSQL version: 12.0
> Operating system: Ubuntu 19.10
> Description:
>
> (huge thanks to @Komzpa - Darafei Praliaskouski who figured out the
> culprit)
>
> PostgreSQL 12.0 (Ubuntu 12.0-2.pgdg19.10+1) on x86_64-pc-linux-gnu, compiled
> by gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008, 64-bit
> POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 "
> PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.13.1"
> LIBPROTOBUF="1.3.1" WAGYU="0.4.3 (Internal)"
>
> I have a full OpenStreetMap loaded using OpenMapTiles configuration into
> multiple machine configurations. A single query (wrapped as a UDF) generates
> the whole MVT tile blob, and uses many layers, sub-queries, and UDFs. When
> JIT is enabled, the query takes 5-10 seconds to run. Disabling JIT makes the
> query run in milliseconds. Same queries on PostgreSQL 10 + Postgis
> 2.5.2/GEOS 3.6 have no issues (as there was no JIT).

Ugh.

> Details:
> The query is a concatenation of multiple map layers (binary blobs in MVT
> format from ST_AsMVT()). All layers are "UNION ALL-ed" from subqueries, one
> query per layer. The "explain analyze" shows huge cost of the first layer
> query, regardless of the order of those layers. Every layer added about
> 200-300ms to the execution (not planning!) time, totalling about 5-7
> seconds. Yet, I noticed that when I was testing a portion of layers,
> removing layer_waterway made query run in milliseconds (I suspect there are
> more than one layer that causes this bug).

> I will try to reproduce this error using OpenMapTiles's quickstart setup,
> but that will involve some heavy customization. Is this a known bug, or
> should I try to get an easy to set up repo? Thanks!

A repro would be very helpful.

> * "perf top" results -
> https://gist.github.com/nyurik/fa0cb0dd935d50e796cd5d58d8d41067

Hm. That's a flat looking profile, making it hard to glean information
about what's going wrong. It might just be the size of the emitted
code...

If it's too hard to generate a quick repro, it might already help if you
sent me the bitcode for such a slow query. You can do so by setting
'jit_dump_bitcode' and looking for the *.bc files in the data
directory.

> * A few explore analyze runs with different layer orderings:
> * https://explain.depesz.com/s/fecC
> * https://explain.depesz.com/s/MpMo
> * https://explain.depesz.com/s/d7RU
> * main query -
> https://gist.github.com/nyurik/ed80c5e6d20f67fd14d37ea2503a9cbd
> * helpers - https://gist.github.com/nyurik/00c4c9d69b7d4fa67c86159b9ce83027

This looks like it creates pretty darn huge queries. There's a *lot* of
table references here, and each of them has their own targetlist etc.

I want to figure out why it's *this* slow, but this many expressions are
going to have overhead to JIT. There's 330 generated functions in that
last explain. A lot of them never actually executed, because there's
basically no content. Presumably there's a lot of redundancy, with the
expressions being basically the same - but we're unable to take
advantage of that in 12, due to the pointer constants making the
expressions not actually be the same, c.f. patch at[1] (which also is
why there's no caching).

Unfortunately I don't think there's going to be full blown fix for this
in 12, although we might be able to improve things a bit. So you'll
probably end up having to just disable jit. But if you can try to get me
enough data to work with, the situation in 13 might be better.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20191023163849.sosqbfs5yenocez3%40alap3.anarazel.de

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Chapman Flack 2019-10-25 01:45:44 Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12
Previous Message Chapman Flack 2019-10-24 23:34:09 Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12