Re: big data - slow select (speech search)

From: Michal Fapso <michal(dot)fapso(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: big data - slow select (speech search)
Date: 2010-07-07 13:31:29
Message-ID: AANLkTimq2ap3WEQq1U8lMR0Sx8vfI8cXY1D69ouYrqno@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Robert,

thank you for your help. I tried to cluster the table on
hyps_wordid_index and the query execution time dropped from 4.43 to
0.19 seconds which is not that far from Lucene's performance of 0.10
second.

Thanks a lot!
Miso Fapso

On 6 July 2010 02:25, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso <michal(dot)fapso(at)gmail(dot)com> wrote:
>> It took about 4.5 seconds. If I rerun it, it takes
>> less than 2 miliseconds, but it is because of the cache. I need to
>> optimize the first-run.
>>
>> laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM
>>
>> EXPLAIN ANALYZE SELECT h1.docid
>> FROM hyps AS h1
>> WHERE h1.wordid=65658;
>>
>>  Bitmap Heap Scan on hyps h1  (cost=10.97..677.09 rows=171 width=4)
>> (actual time=62.106..4416.864 rows=343 loops=1)
>>   Recheck Cond: (wordid = 65658)
>>   ->  Bitmap Index Scan on hyps_wordid_index  (cost=0.00..10.92
>> rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
>>         Index Cond: (wordid = 65658)
>>  Total runtime: 4432.015 ms
>>
>> If I run the same query in Lucene search engine, it takes 0.105
>> seconds on the same data which is quite a huge difference.
>
> So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12
> ms/row.  I'm not an expert on seek times, but that might not really be
> that unreasonable, considering that those rows may be scattered all
> over the index and thus it may be basically random I/O.  Have you
> tried clustering hyps on hyps_wordid_index?  If you had a more
> sophisticated disk subsystem you could try increasing
> effective_io_concurrency but that's not going to help with only one
> spindle.
>
> If you run the same query in Lucene and it takes only 0.105 s, then
> Lucene is obviously doing a lot less I/O.  I doubt that any amount of
> tuning of your existing schema is going to produce that kind of result
> on PostgreSQL.  Using the full-text search stuff, or a gin index of
> some kind, might get you closer, but it's hard to beat a
> special-purpose engine that implements exactly the right algorithm for
> your use case.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message damien hostin 2010-07-07 14:39:38 Re: Slow query with planner row strange estimation
Previous Message Merlin Moncure 2010-07-07 13:14:25 Re: Two "equivalent" WITH RECURSIVE queries, one of them slow.