Skip site navigation (1) Skip section navigation (2)

Make the planner smarter about idiosyncratic (or broken) ORM behaviour

From: Frank Jördens <frank(at)woome(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Cc: Nic Ferrier <nferrier(at)woome(dot)com>, Edward George <ed(at)woome(dot)com>, Seb Potter <seb(at)woome(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Subject: Make the planner smarter about idiosyncratic (or broken) ORM behaviour
Date: 2009-05-22 09:33:43
Message-ID: 7d10d2df0905220233v3e54ab34lb9c3530fe3543a5c@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-sql
Almost all of the 8k queries per second that are hitting our database
are generated by a web development framework's ORM (Django), and some
of the syntax there is quite mad. Possibly the worst tic exhibited by
the translation layer is that if you filter a query on an object by
more than one property on a related object it will create a join for
each property:

woome_video=# EXPLAIN ANALYZE SELECT * FROM "webapp_person" INNER JOIN
"auth_user" ON ("webapp_person"."user_id" = "auth_user"."id") INNER
JOIN "auth_user" T3 ON ("webapp_person"."user_id" = T3."id") WHERE
"webapp_person"."is_suspended" = false  AND
"webapp_person"."is_banned" = false  AND ("webapp_person"."is_human" =
true  OR "webapp_person"."is_human" IS NULL) AND
(LOWER("auth_user"."username") = LOWER('d00ditsnicole')  OR
LOWER(T3."first_name") = LOWER('d00ditsnicole') ) AND
"webapp_person"."dob" >= '1910-01-01'  AND "webapp_person"."dob" <=
'1991-01-01'  ORDER BY "auth_user"."last_login" DESC LIMIT 30;




                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..87160.10 rows=30 width=496) (actual
time=17641.618..17641.618 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..1095311.97 rows=377 width=496) (actual
time=17641.616..17641.616 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..1090710.92 rows=554 width=230)
(actual time=17641.614..17641.614 rows=0 loops=1)
               Join Filter: ((lower((auth_user.username)::text) =
'd00ditsnicole'::text) OR (lower((t3.first_name)::text) =
'd00ditsnicole'::text))
               ->  Index Scan Backward using auth_user_last_login_idx
on auth_user  (cost=0.00..141657.02 rows=1684525 width=115) (actual
time=0.039..3315.272 rows=1685757 loops=1)
               ->  Index Scan using auth_user_pkey on auth_user t3
(cost=0.00..0.54 rows=1 width=115) (actual time=0.004..0.005 rows=1
loops=1685757)
                     Index Cond: (t3.id = auth_user.id)
         ->  Index Scan using webapp_person_user_id on webapp_person
(cost=0.00..8.29 rows=1 width=266) (never executed)
               Index Cond: (webapp_person.user_id = auth_user.id)
               Filter: ((NOT webapp_person.is_suspended) AND (NOT
webapp_person.is_banned) AND (webapp_person.is_human OR
(webapp_person.is_human IS NULL)) AND (webapp_person.dob >=
'1910-01-01'::date) AND (webapp_person.dob <= '1991-01-01'::date))
 Total runtime: 17641.871 ms
(11 rows)


There LIMIT and dob filters there are silly as well but they don't
seem to impact query performance; the trouble is the spurious 2nd join
with the T3 alias for auth_user. If I just remove that, we get:

woome_video=# EXPLAIN ANALYZE SELECT * FROM "webapp_person" INNER JOIN
"auth_user" ON ("webapp_person"."user_id" = "auth_user"."id") WHERE
"webapp_person"."is_suspended" = false  AND
"webapp_person"."is_banned" = false  AND ("webapp_person"."is_human" =
true  OR "webapp_person"."is_human" IS NULL) AND
(LOWER("auth_user"."username") = LOWER('d00ditsnicole')  OR
LOWER("auth_user"."first_name") = LOWER('d00ditsnicole') ) AND
"webapp_person"."dob" >= '1910-01-01'  AND "webapp_person"."dob" <=
'1991-01-01'  ORDER BY "auth_user"."last_login" DESC LIMIT 30;


                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6637.36..6637.43 rows=30 width=381) (actual
time=0.230..0.230 rows=0 loops=1)
   ->  Sort  (cost=6637.36..6638.30 rows=377 width=381) (actual
time=0.228..0.228 rows=0 loops=1)
         Sort Key: auth_user.last_login
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=29.88..6626.22 rows=377 width=381)
(actual time=0.162..0.162 rows=0 loops=1)
               ->  Bitmap Heap Scan on auth_user  (cost=29.88..2025.17
rows=554 width=115) (actual time=0.161..0.161 rows=0 loops=1)
                     Recheck Cond: ((lower((username)::text) =
'd00ditsnicole'::text) OR (lower((first_name)::text) =
'd00ditsnicole'::text))
                     ->  BitmapOr  (cost=29.88..29.88 rows=554
width=0) (actual time=0.158..0.158 rows=0 loops=1)
                           ->  Bitmap Index Scan on
woome_username_lower  (cost=0.00..4.60 rows=1 width=0) (actual
time=0.096..0.096 rows=0 loops=1)
                                 Index Cond: (lower((username)::text)
= 'd00ditsnicole'::text)
                           ->  Bitmap Index Scan on
auth_user_firstname_idx  (cost=0.00..25.00 rows=553 width=0) (actual
time=0.060..0.060 rows=0 loops=1)
                                 Index Cond:
(lower((first_name)::text) = 'd00ditsnicole'::text)
               ->  Index Scan using webapp_person_user_id on
webapp_person  (cost=0.00..8.29 rows=1 width=266) (never executed)
                     Index Cond: (webapp_person.user_id = auth_user.id)
                     Filter: ((NOT webapp_person.is_suspended) AND
(NOT webapp_person.is_banned) AND (webapp_person.is_human OR
(webapp_person.is_human IS NULL)) AND (webapp_person.dob >=
'1910-01-01'::date) AND (webapp_person.dob <= '1991-01-01'::date))
 Total runtime: 0.448 ms
(16 rows)

So if you do a Person.objects.filter(user__username='foo',
user__first_name='bar') in the ORM, which is filtering Person by
properties on the related user table, then it will create a join for
each of those it seems.

Our Django experts are telling me that it is probably not practical to
fix in the ORM, as it seems to be structural (anyway not fixable for
us in the near term). Hence I am wondering if anyone has an idea as to
how to make the planner smarter about such weirdness (or brokenness);
you might argue that the 2nd join there is merely syntactic bloat
which the planner might just recognize as such?

We are using:

woome_video=# select version();
                                             version
-------------------------------------------------------------------------------------------------
 PostgreSQL 8.3.5 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC)
4.2.4 (Ubuntu 4.2.4-1ubuntu3)
(1 row)


Regards,

Frank

Responses

pgsql-sql by date

Next:From: Richard HuxtonDate: 2009-05-22 09:56:49
Subject: Re: Make the planner smarter about idiosyncratic (or broken) ORM behaviour
Previous:From: Alvaro HerreraDate: 2009-05-18 18:53:53
Subject: Re: proposal for a CookBook in postgresql.org

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group