Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2

From: Rodrigo Rosenfeld Rosas <rr(dot)rosas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query completed in < 1s in PG 9.1 and ~ 700s in PG 9.2
Date: 2012-11-06 18:57:00
Message-ID: 50995D7C.9020108@gmail.com
Lists: pgsql-performance

On 06-11-2012 16:42, Merlin Moncure wrote:
> On Tue, Nov 6, 2012 at 12:09 PM, Rodrigo Rosenfeld Rosas
> <rr(dot)rosas(at)gmail(dot)com> wrote:
>> http://explain.depesz.com/s/ToX (fast on 9.1)
>> http://explain.depesz.com/s/65t (fast on 9.2)
>> http://explain.depesz.com/s/gZm (slow on 9.1)
>> http://explain.depesz.com/s/END (slow on 9.2 - funny that the generated URL
>> was END while this was my last explain :D )
> Hm -- looking at your 'slow' 9.2 query, it is reporting that the query
> took 3 seconds (reported times are in milliseconds). How are you
> timing the data? What happens when you run explain analyze
> <your_query> from psql (as in, how long does it take)?

The times I reported in the tables of my first message were the times
reported by pgAdmin3 (compiled from source).

But I get a similar time when I run it like this:

time psql -p 5432 -f slow.sql db_name > slow-9.2-again.explain

real 1m56.353s
user 0m0.068s
sys 0m0.020s

slow-9.2-again.explain: http://explain.depesz.com/s/zF1
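
For completeness, the same number can also be reproduced from an interactive
psql session; a minimal sketch reusing the same slow.sql file:

\timing
-- \timing toggles psql's client-side timing display; psql prints a
-- "Time: ..." line after the statement completes
\i slow.sql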

>> Let me explain how the application works, how the database was designed and
>> hopefully you'll be able to guide me in the correct way to design the
>> database for this use case.
>>
>> Our application will present a big contract to some attorneys. There is
>> currently a dynamic template with around 800 fields to be extracted from
>> each contract in our system. These fields can be of different types
>> (boolean, string, number, currency, percents, fixed options, dates,
>> time-spans and so on). There is a fields tree that is maintained by the
>> application editors. The application will allow the attorneys to read the
>> contracts, highlight the parts of the contract each field was extracted
>> from, associate each field with the value interpreted by the attorney, and
>> store references to the paragraphs in the contract that show where each
>> value came from.
>>
>> Then there is an interface that will allow clients to search for
>> transactions based on their associated contracts and those ~800 fields. For
>> the particular query above, 14 of the 800 fields have been searched by this
>> particular user (most of them were boolean ones plus a few options and a
>> string field). Usually the queries perform much better when fewer than 10
>> fields are used in the criteria. But our client wants us to handle up to 20
>> fields in a single query or they won't close the deal, and this is a really
>> important client to us.
>>
>> So, for the time being my only plan is to roll back to PG 9.1 and replace my
>> query builder, which currently generates queries like slow.sql, so that it
>> generates queries like fast.sql instead, but I'm pretty sure this approach
>> should be avoided. I just don't know any other alternative for the time being.
>>
>> What database design would you recommend me for this use case?
> I would strongly consider investigation of hstore type along with
> gist/gin index.
> select * from company_transaction where contract_attributes @>
> 'State=>Delaware, Paid=>Y';
> etc

I'm not very familiar with hstore yet. It was one of the reasons I wanted
to migrate to PG 9.2, but I won't be able to migrate the application to use
hstore quickly.

Also, I'm not sure hstore would let us stay as flexible as we currently
are, e.g. (c1 and (c2 or c3 and not (c4 and c5))), where each c is a search
condition.
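
If I understood the suggestion, it would be something along these lines (a
rough sketch following your example; the extra keys like Currency and
Archived are made up just for illustration):

CREATE EXTENSION hstore;

ALTER TABLE company_transaction ADD COLUMN contract_attributes hstore;

-- GIN index to support the @> containment operator
CREATE INDEX company_transaction_attrs_idx
    ON company_transaction USING gin (contract_attributes);

-- single containment check, as in your example
SELECT * FROM company_transaction
 WHERE contract_attributes @> 'State=>Delaware, Paid=>Y';

-- nested criteria would be spelled out with AND/OR/NOT over @> checks
SELECT * FROM company_transaction
 WHERE contract_attributes @> 'State=>Delaware'
   AND (contract_attributes @> 'Paid=>Y'
        OR (contract_attributes @> 'Currency=>USD'
            AND NOT contract_attributes @> 'Archived=>Y'));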

> Barring that, I would then consider complete elimination of integer
> proxies for your variables. They make your query virtually impossible
> to read/write, and they don't help.

I'm not sure I understood what you're talking about. The template is
dynamic and contains lots of information for each field, like its type
(number, percent, string, date, etc.), parent_id (self-referencing),
aggregator_id (also self-referencing) and several other columns. But the
values associate the field id (type_id) and the transaction id in a
unique way (see the unique index in my first message of the thread). Then I
need different tables to store the actual values because we're using SQL
instead of MongoDB or something else. The table that stores the value
depends on the field type.
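
To make the current design more concrete, it is roughly like this (a
simplified sketch; the table and column names here are illustrative, and the
real schema from my first message has more columns):

-- the dynamic template: one row per field, forming a tree
CREATE TABLE fields (
    id            serial PRIMARY KEY,
    label         text NOT NULL,
    field_type    text NOT NULL,                    -- boolean, string, number, date, ...
    parent_id     integer REFERENCES fields (id),   -- self-referencing
    aggregator_id integer REFERENCES fields (id)    -- also self-referencing
);

-- one value table per field type; (type_id, transaction_id) is unique
CREATE TABLE boolean_values (
    type_id        integer NOT NULL REFERENCES fields (id),
    transaction_id integer NOT NULL,
    value          boolean,
    UNIQUE (type_id, transaction_id)
);

CREATE TABLE string_values (
    type_id        integer NOT NULL REFERENCES fields (id),
    transaction_id integer NOT NULL,
    value          text,
    UNIQUE (type_id, transaction_id)
);

-- the generated queries then combine one check per searched field
-- against these value tables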

Maybe it would help me understand if you could provide an example of the
design you're proposing.

Thank you very much,
Rodrigo.
