From: | Michal Fapso <michal(dot)fapso(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: big data - slow select (speech search) |
Date: | 2010-07-07 13:31:29 |
Message-ID: | AANLkTimq2ap3WEQq1U8lMR0Sx8vfI8cXY1D69ouYrqno@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi Robert,
thank you for your help. I tried to cluster the table on
hyps_wordid_index and the query execution time dropped from 4.43 to
0.19 seconds which is not that far from Lucene's performance of 0.10
second.
Thanks a lot!
Miso Fapso
On 6 July 2010 02:25, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso <michal(dot)fapso(at)gmail(dot)com> wrote:
>> It took about 4.5 seconds. If I rerun it, it takes
>> less than 2 miliseconds, but it is because of the cache. I need to
>> optimize the first-run.
>>
>> laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM
>>
>> EXPLAIN ANALYZE SELECT h1.docid
>> FROM hyps AS h1
>> WHERE h1.wordid=65658;
>>
>> Bitmap Heap Scan on hyps h1 (cost=10.97..677.09 rows=171 width=4)
>> (actual time=62.106..4416.864 rows=343 loops=1)
>> Recheck Cond: (wordid = 65658)
>> -> Bitmap Index Scan on hyps_wordid_index (cost=0.00..10.92
>> rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
>> Index Cond: (wordid = 65658)
>> Total runtime: 4432.015 ms
>>
>> If I run the same query in Lucene search engine, it takes 0.105
>> seconds on the same data which is quite a huge difference.
>
> So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12
> ms/row. I'm not an expert on seek times, but that might not really be
> that unreasonable, considering that those rows may be scattered all
> over the index and thus it may be basically random I/O. Have you
> tried clustering hyps on hyps_wordid_index? If you had a more
> sophisticated disk subsystem you could try increasing
> effective_io_concurrency but that's not going to help with only one
> spindle.
>
> If you run the same query in Lucene and it takes only 0.105 s, then
> Lucene is obviously doing a lot less I/O. I doubt that any amount of
> tuning of your existing schema is going to produce that kind of result
> on PostgreSQL. Using the full-text search stuff, or a gin index of
> some kind, might get you closer, but it's hard to beat a
> special-purpose engine that implements exactly the right algorithm for
> your use case.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>
From | Date | Subject | |
---|---|---|---|
Next Message | damien hostin | 2010-07-07 14:39:38 | Re: Slow query with planner row strange estimation |
Previous Message | Merlin Moncure | 2010-07-07 13:14:25 | Re: Two "equivalent" WITH RECURSIVE queries, one of them slow. |