Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.

From: Yuri Astrakhan <yuriastrakhan(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16076: JIT causes huge delays in a complex query. jit=off solves it.
Date: 2019-10-25 01:58:25
Message-ID: CAJGfNe8ijvnocQ=rp00E3jjtWsiBRyrO7dJ67tGWR00TuWRGCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thanks Andres, I sent you the 2 BC files (optimized and non-optimized)
directly. Let me know if you can figure out the problem (wasn't sure if its
ok to spam the list with 2MBs of it :) ). Thanks!

On Thu, Oct 24, 2019 at 9:04 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2019-10-24 15:12:16 +0000, PG Bug reporting form wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 16076
> > Logged by: Yuri Astrakhan
> > Email address: yuriastrakhan(at)gmail(dot)com
> > PostgreSQL version: 12.0
> > Operating system: Ubuntu 19.10
> > Description:
> >
> > (huge thanks to @Komzpa - Darafei Praliaskouski who figured out the
> > culprit)
> >
> > PostgreSQL 12.0 (Ubuntu 12.0-2.pgdg19.10+1) on x86_64-pc-linux-gnu,
> compiled
> > by gcc (Ubuntu 9.2.1-9ubuntu1) 9.2.1 20191008, 64-bit
> > POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 "
> > PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.4" LIBJSON="0.13.1"
> > LIBPROTOBUF="1.3.1" WAGYU="0.4.3 (Internal)"
> >
> > I have a full OpenStreetMap loaded using OpenMapTiles configuration into
> > multiple machine configurations. A single query (wrapped as a UDF)
> generates
> > the whole MVT tile blob, and uses many layers, sub-queries, and UDFs.
> When
> > JIT is enabled, the query takes 5-10 seconds to run. Disabling JIT makes
> the
> > query run in milliseconds. Same queries on PostgreSQL 10 + Postgis
> > 2.5.2/GEOS 3.6 have no issues (as there was no JIT).
>
> Ugh.
>
>
> > Details:
> > The query is a concatenation of multiple map layers (binary blobs in MVT
> > format from ST_AsMVT()). All layers are "UNION ALL-ed" from subqueries,
> one
> > query per layer. The "explain analyze" shows huge cost of the first layer
> > query, regardless of the order of those layers. Every layer added about
> > 200-300ms to the execution (not planning!) time, totalling about 5-7
> > seconds. Yet, I noticed that when I was testing a portion of layers,
> > removing layer_waterway made query run in milliseconds (I suspect there
> are
> > more than one layer that causes this bug).
>
> > I will try to reproduce this error using OpenMapTiles's quickstart setup,
> > but that will involve some heavy customization. Is this a known bug, or
> > should I try to get an easy to set up repo? Thanks!
>
> A repro would be very helpful.
>
>
> > * "perf top" results -
> > https://gist.github.com/nyurik/fa0cb0dd935d50e796cd5d58d8d41067
>
> Hm. That's a flat looking profile, making it hard to glean information
> about what's going wrong. It might just be the size of the emitted
> code...
>
>
> If it's too hard to generate a quick repro, it might already help if you
> sent me the bitcode for such a slow query. You can do so by setting
> 'jit_dump_bitcode' and looking for the *.bc files in the data
> directory.
>
>
> > * A few explore analyze runs with different layer orderings:
> > * https://explain.depesz.com/s/fecC
> > * https://explain.depesz.com/s/MpMo
> > * https://explain.depesz.com/s/d7RU
> > * main query -
> > https://gist.github.com/nyurik/ed80c5e6d20f67fd14d37ea2503a9cbd
> > * helpers -
> https://gist.github.com/nyurik/00c4c9d69b7d4fa67c86159b9ce83027
>
> This looks like it creates pretty darn huge queries. There's a *lot* of
> table references here, and each of them has their own targetlist etc.
>
> I want to figure out why it's *this* slow, but this many expressions are
> going to have overhead to JIT. There's 330 generated functions in that
> last explain. A lot of them never actually executed, because there's
> basically no content. Presumably there's a lot of redundancy, with the
> expressions being basically the same - but we're unable to take
> advantage of that in 12, due to the pointer constants making the
> expressions not actually be the same, c.f. patch at[1] (which also is
> why there's no caching).
>
> Unfortunately I don't think there's going to be full blown fix for this
> in 12, although we might be able to improve things a bit. So you'll
> probably end up having to just disable jit. But if you can try to get me
> enough data to work with, the situation in 13 might be better.
>
> Greetings,
>
> Andres Freund
>
> [1]
> https://www.postgresql.org/message-id/20191023163849.sosqbfs5yenocez3%40alap3.anarazel.de
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Arseny Sher 2019-10-25 06:56:23 Re: ERROR: subtransaction logged without previous top-level txn record
Previous Message Chapman Flack 2019-10-25 01:45:44 Re: BUG #16046: xpath returns CDATA tag along with the value in postgres 12