Re: Adding LEFT JOIN to a query has increased execution time 10 times

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Adding LEFT JOIN to a query has increased execution time 10 times
Date: 2019-01-05 20:18:10
Message-ID: 87r2dqrisj.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>>>>> "AF" == Alexander Farber <alexander(dot)farber(at)gmail(dot)com> writes:

AF> Here are the only modified settings in postgresql.conf:

AF> max_connections = 120 # (change requires restart)
AF> work_mem = 8MB # min 64kB
AF> maintenance_work_mem = 128MB # min 1MB

AF> 90% of the backend source code are JSON-emitting stored functions
AF> and there is one function which is the main core of the game and is
AF> a SELECT query over 7 tables.

AF> It is called for every Websocket-connected client and delivers a
AF> JSON list of active games for the player.

AF> Until recently the query needed 1-2 seconds for completion,

That seems slow in itself, even before adding the extra join - the
explain suggests that you're both short on indexes and you're getting
pretty bad plans, possibly due to exceeding join_collapse_limit.

(You might try increasing that in your config, along with
from_collapse_limit; the default values are a legacy of the days when
CPUs were much slower and planning time more of an issue.)

AF> but after I have added a LEFT JOIN with the following table, the
AF> query takes 7-10 seconds for completion and makes the game
AF> unpleasant to play:

AF> # \d words_geoip;
AF> Table "public.words_geoip"
AF> Column | Type | Collation | Nullable | Default
AF> --------+------------------+-----------+----------+---------
AF> block | inet | | not null |
AF> lat | double precision | | |
AF> lng | double precision | | |
AF> Indexes:
AF> "words_geoip_pkey" PRIMARY KEY, btree (block)

And here's yet another missing index, resulting in your query having to
process and discard 27 million rows in the course of generating a result
of only 9 rows:

Join Filter: (CASE WHEN (g.player1 = 5) THEN u2.ip ELSE u1.ip END << i2.block)
Rows Removed by Join Filter: 27660682

(you probably wanted <<= rather than << as that comparison, if there's
any chance your geoip table might have entries for single IPs)

Fortunately, this being pg10, you can use either of these indexes:

CREATE INDEX ON words_geoip USING gist (block inet_ops);

or

CREATE INDEX ON words_geoip USING spgist (block);

As for the rest of the query, here are places you could probably
work on:

AF> LEFT JOIN words_moves m ON m.gid = g.gid
AF> AND NOT EXISTS (SELECT 1
AF> FROM words_moves m2
AF> WHERE m2.gid = m.gid
AF> AND m2.played > m.played)

Whar you're asking for here is that the words_moves row that you're
joining not have a matching row with a larger "played" value. You can do
this far more efficiently with a lateral join, given the right index.

AF> LEFT JOIN words_social s1 ON s1.uid = 5
AF> AND NOT EXISTS (SELECT 1
AF> FROM words_social s
AF> WHERE s1.uid = s.uid
AF> AND s.stamp > s1.stamp)
AF> LEFT JOIN words_social s2 ON s2.uid = (CASE WHEN g.player1 = 5 THEN
AF> g.player2 ELSE g.player1 END)
AF> AND NOT EXISTS (SELECT 1
AF> FROM words_social s
AF> WHERE s2.uid = s.uid
AF> AND s.stamp > s2.stamp)

Similar considerations apply to both of the above.

AF> WHERE 5 IN (g.player1, g.player2)
AF> AND (g.finished IS NULL OR g.finished > CURRENT_TIMESTAMP - INTERVAL '1
AF> day');

This WHERE clause could be written as

WHERE 5 IN (g.player1, g.player2)
AND coalesce(g.finished,'infinity') > (current_timestamp - interval '1 day')

and you could then create the following indexes,

CREATE INDEX ON words_games (player1, coalesce(finished,'infinity'));
CREATE INDEX ON words_games (player2, coalesce(finished,'infinity'));

which should get you a BitmapOr plan for that condition.

AF> I have also asked my question at [dba.stack]

If you ask questions like this on the IRC channel (#postgresql on
chat.freenode.net - see http://freenode.net for info or web-based client
access), you can usually get feedback in real time (I rarely answer
performance questions in email because getting responses just takes too
long). You may have to be patient.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alexander Farber 2019-01-05 20:19:55 Re: Adding LEFT JOIN to a query has increased execution time 10 times
Previous Message Adrian Klaver 2019-01-05 17:35:40 Re: Dropping and creating a trigger