Suboptimal query plan when using expensive BCRYPT functions

From: Erik van Zijst <erik(dot)van(dot)zijst(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Suboptimal query plan when using expensive BCRYPT functions
Date: 2014-03-22 00:59:19
Message-ID: CA+69USsLsVb8U5=r7KYs9GYT4Rx9DtyaohsfHvLzVKS=fZNUEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi there,

I've got a relatively simple query that contains expensive BCRYPT
functions that gets optimized in a way that causes postgres to compute
more bcrypt hashes than necessary, thereby dramatically slowing things
down.

In a certain part of our application we need to lookup users by their
username, email address and password. Now we don't store plaintext
passwords and so the query needs to compute bcrypt hashes on the fly:

SELECT DISTINCT u.*
FROM auth_user u
JOIN bb_userprofile p ON p.user_id = u.id
JOIN bb_identity i ON i.profile_id = p.id
WHERE
(
(
u.username ILIKE 'detkin'
OR
i.email ILIKE 'foo(at)example(dot)com'
)
AND
(
SUBSTRING(password FROM 8) = CRYPT(
'detkin', SUBSTRING(password FROM 8))
)
)

These queries are generated by a parser that translates from an
external query language to SQL run on the database. This test db
contains 12 user records.

With a single bcrypt hash taking ~300ms to compute, this is a recipe
for disaster and so the app only allows queries that require only a
very small number of bcrypt computation.

E.g. the user must always "AND" the password lookup with a clause like
" username = 'foo' AND password = 'bar'" (username is unique).

However, while the query above technically only needs to compute 1
hash (there is a user 'detkin' and email 'foo(at)example(dot)com' does not
exist), it instead creates a query plan that computes hashes *before*
filtering on username and email, leading to 12 hash computations and a
very slow query.

The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

The schemas for the 3 tables involved are here:
http://pgsql.privatepaste.com/f72020ad0a

As a quick experiment I tried moving the joins and email lookup into a
nested IN query, but that still generates a plan that computes hashes
for all 12 users, before picking out the 1 whose username matches.

Is there any way I can get postgres to perform the hash calculations
on the *result* of the other parts of the where clause, instead of the
other way around? Or else rewrite the query?

Cheers,
Erik

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Brett Wooldridge 2014-03-22 11:26:34 Re: Connection pooling - Number of connections
Previous Message Shaun Thomas 2014-03-21 19:43:41 Re: Getting query plan alternatives from query planner?