Re: Fast Search on Encrypted Feild

From: "Naoko Reeves" <naoko(at)lawlogix(dot)com>
To: "Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fast Search on Encrypted Feild
Date: 2009-11-15 14:38:50
Message-ID: 076DC33A3D38CE4BBC64D35DDD9DE70C09886B54@mse4be2.mse4.exchange.ms
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

As Alban pointed out encrypting the search value and compare stored
encrypted value is very fast though it can't do LIKE search.
After I received valuable input from Merlin, Bill and John, I did some
research regarding "search against encrypted field" in general and as in
everyone's advice, I must acknowledge the cost of encrypted data for
searching and considering alternative method (partial encryption, store
in different table etc).
Thank you very again much for all the advice.

> I have a encrypted column use encrypt function.
> Querying against this column is almost not acceptable - returning 12
rows took 25,908 ms.
> The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE
decrypt(phn_phone_enc,'xxx','xxx') LIKE '123%'
> So I built index like: CREATE INDEX idx_phn_phone_dec ON phn
(decrypt(phn_phone_enc, 'xxx', 'xxx'))
> This returns 12 rows in 68 ms.
> Would this be the solution for the fast encrypted field search or does
this raise the security issue?

Is there some way you can invert the process?
Normally if you verify encrypted data (typically passwords) you would
encrypt the user-specified data and compare that to the encrypted data
in the database instead of decrypting both and comparing the actual
data. I doubt you can do that with partial data though, and since you're
showing a LIKE expression here...

One thing I notice in your query is that you're decrypting your data
twice; you're calling two different functions for the same purpose. You
may notice a speed-up if you use decrypt(text) instead of decrypt(text,
text, text) in your where clause, provided that the function is defined
stable[1] or immutable[2] as opposed to the default volatile[3].

Or is decrypting only a part of the encrypted string significantly
faster? That would imply some linear encryption algorithm in which case
you may be able to use my initial suggestion: Encrypting '123' would
create something that's comparable to the first 3 encrypted characters
of an encrypted phone number. A query would look like: SELECT
decrypt(phn_phone_enc) FROM phn WHERE phn_phone_enc LIKE
encrypt('123')||'%'. Here as well it is important that encrypt(text) is
defined stable or immutable.

1. The same input data combined with data from the DB (as it is visible
to the transaction) always yields the same result.
2. The same input data always yields the same result.
3. There is no correlation between the input data and the result.

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.

!DSPAM:991,4affebf711071508957761!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2009-11-15 16:43:19 Re: Experience with many schemas vs many databases
Previous Message Merlin Moncure 2009-11-15 14:29:24 Re: Fast Search on Encrypted Feild