Re: Suboptimal query plan when using expensive BCRYPT functions

From: Erik van Zijst <erik(dot)van(dot)zijst(at)gmail(dot)com>
To: bricklen <bricklen(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Suboptimal query plan when using expensive BCRYPT functions
Date: 2014-03-22 22:27:41
Message-ID: CA+69USuvcKik9WSdCwQ_zck2WnN5tRK=A8PgfebDG+3MCeFz1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Yes, that works (it does at least on my small test database).

However, these queries are generated by a parser that translates
complex parse trees from a higher level DSL that doesn't lend itself
well to logically isolating the crypt checks from the remaining
conditions, as password checks might be present at arbitrary depths.

For example:

(
active eq true
AND
(
password eq "foo"
OR
password eq "bar"
)
)
AND
(
username eq "erik"
OR
email contains "bar"
)

Currently the SQL generator translates each AST node into individual
predicates that straightforwardly concatenate into a single SQL WHERE
clause. For this to work, the individual nodes should compose well. I
don't immediately see how the above query could be automatically
translated into SQL when taking the WITH-AS approach.

I could nonetheless take a stab at it, but life would certainly be
easier if I could translate each component independently and leave
optimization to the query planner.

Cheers,
Erik

On Sat, Mar 22, 2014 at 1:51 PM, bricklen <bricklen(at)gmail(dot)com> wrote:
>
> On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst <erik(dot)van(dot)zijst(at)gmail(dot)com>
> wrote:
>>
>> Hi there,
>>
>> I've got a relatively simple query that contains expensive BCRYPT
>> functions that gets optimized in a way that causes postgres to compute
>> more bcrypt hashes than necessary, thereby dramatically slowing things
>> down.
>>
>> In a certain part of our application we need to lookup users by their
>> username, email address and password. Now we don't store plaintext
>> passwords and so the query needs to compute bcrypt hashes on the fly:
>>
>> SELECT DISTINCT u.*
>> FROM auth_user u
>> JOIN bb_userprofile p ON p.user_id = u.id
>> JOIN bb_identity i ON i.profile_id = p.id
>> WHERE
>> (
>> (
>> u.username ILIKE 'detkin'
>> OR
>> i.email ILIKE 'foo(at)example(dot)com'
>> )
>> AND
>> (
>> SUBSTRING(password FROM 8) = CRYPT(
>> 'detkin', SUBSTRING(password FROM 8))
>> )
>> )
>>
>> These queries are generated by a parser that translates from an
>> external query language to SQL run on the database. This test db
>> contains 12 user records.
>>
>> With a single bcrypt hash taking ~300ms to compute, this is a recipe
>> for disaster and so the app only allows queries that require only a
>> very small number of bcrypt computation.
>>
>> E.g. the user must always "AND" the password lookup with a clause like
>> " username = 'foo' AND password = 'bar'" (username is unique).
>>
>> However, while the query above technically only needs to compute 1
>> hash (there is a user 'detkin' and email 'foo(at)example(dot)com' does not
>> exist), it instead creates a query plan that computes hashes *before*
>> filtering on username and email, leading to 12 hash computations and a
>> very slow query.
>>
>> The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE
>>
>> The schemas for the 3 tables involved are here:
>> http://pgsql.privatepaste.com/f72020ad0a
>>
>> As a quick experiment I tried moving the joins and email lookup into a
>> nested IN query, but that still generates a plan that computes hashes
>> for all 12 users, before picking out the 1 whose username matches.
>>
>> Is there any way I can get postgres to perform the hash calculations
>> on the *result* of the other parts of the where clause, instead of the
>> other way around? Or else rewrite the query?
>>
>> Cheers,
>> Erik
>
>
> (untested), but how about something like the following:
>
> WITH au AS (
>
> SELECT DISTINCT u.*
> FROM auth_user u
> JOIN bb_userprofile p ON p.user_id = u.id
> JOIN bb_identity i ON i.profile_id = p.id
> WHERE u.username ILIKE 'detkin'
>
> OR i.email ILIKE 'foo(at)example(dot)com')
>
> SELECT au.*
> FROM au
> WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password
> FROM 8));
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message bricklen 2014-03-22 22:56:31 Re: Suboptimal query plan when using expensive BCRYPT functions
Previous Message bricklen 2014-03-22 20:51:38 Re: Suboptimal query plan when using expensive BCRYPT functions