Re: Query seem to slow if table have more than 200 million rows

From: "Ahmad Fajar" <gendowo(at)konphalindo(dot)or(dot)id>
To: "'Qingqing Zhou'" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query seem to slow if table have more than 200 million rows
Date: 2005-09-27 09:39:55
Message-ID: SVONERLVbNDN4CxpAOA00000061@ki-communication.com
Lists: pgsql-performance

Hi Qingqing,

I don't know whether the statistics I got are good or bad; here they are:
scooby=# select a.relid, a.relname, b.indexrelid, b.indexrelname,
scooby-#        c.idx_scan, c.idx_tup_read, c.idx_tup_fetch,
scooby-#        a.heap_blks_read, a.heap_blks_hit, a.idx_blks_read, a.idx_blks_hit,
scooby-#        a.toast_blks_read, a.toast_blks_hit, a.tidx_blks_read, a.tidx_blks_hit,
scooby-#        b.idx_blks_read, b.idx_blks_hit
scooby-# from pg_statio_user_tables a, pg_statio_user_indexes b, pg_stat_all_indexes c
scooby-# where a.relid=b.relid and a.relid=c.relid and b.indexrelid=c.indexrelid
scooby-#   and a.relname=b.relname and a.relname=c.relname and a.relname='fti_dict1';
(output reflowed into three parts to fit the mail width; indexrelname is
repeated as the row key, the idx_blks_* columns in the second part come from
pg_statio_user_tables and the ones in the third part from pg_statio_user_indexes)

  relid   |  relname  | indexrelid | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch
----------+-----------+------------+--------------+----------+--------------+---------------
 22880226 | fti_dict1 |   22880231 | idx_dict3    |        0 |            0 |             0
 22880226 | fti_dict1 |   22880230 | idx_dict2    |        7 |       592799 |        592799
 22880226 | fti_dict1 |   22880229 | idx_dict1    |        0 |            0 |             0

 indexrelname | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
--------------+----------------+---------------+---------------+--------------
 idx_dict3    |              0 |             0 |             0 |            0
 idx_dict2    |              0 |             0 |             0 |            0
 idx_dict1    |              0 |             0 |             0 |            0

 indexrelname | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit | idx_blks_read | idx_blks_hit
--------------+-----------------+----------------+----------------+---------------+---------------+--------------
 idx_dict3    |                 |                |                |               |             0 |            0
 idx_dict2    |                 |                |                |               |             0 |            0
 idx_dict1    |                 |                |                |               |             0 |            0
(3 rows)

I have tried the query below several times with different keywords, but only
idx_tup_read and idx_tup_fetch change; the other counters stay at zero.
The indexes are (a rough sketch of their definitions follows this list):
  ids      (idx_dict1),
  keywords (idx_dict2, varchar_ops),
  keywords (idx_dict3, varchar_pattern_ops) ==> I use this index for queries
  like keywords LIKE 'blabla%', just for testing purposes.
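
For reference, those indexes would correspond roughly to the following DDL
(a sketch only; "dict" is the table name from the quoted message below, and
the actual definitions may differ):

  -- plain btree on the id column
  CREATE INDEX idx_dict1 ON dict (ids);
  -- btree on keywords with the default varchar operator class (equality lookups)
  CREATE INDEX idx_dict2 ON dict (keywords varchar_ops);
  -- btree with varchar_pattern_ops, usable for LIKE 'blabla%' prefix searches
  CREATE INDEX idx_dict3 ON dict (keywords varchar_pattern_ops);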

Regards,
ahmad fajar

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Qingqing Zhou
Sent: Tuesday, 27 September 2005 8:43
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Query seem to slow if table have more than 200
million rows

""Ahmad Fajar"" <gendowo(at)konphalindo(dot)or(dot)id> wrote
>
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
> single word);
>
> The table has 200 million rows, and I have indexed the keywords field. The
> first time I run the query it seems slow, taking about 15-60 sec to return
> the result, but if I repeat the query I get the result quickly. My question
> is: why is the query so slow the first time?
>
> Table structure is quite simple:
>
> Ids bigint, keywords varchar(150), weight varchar(1), dpos int.
>
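
For reference, that structure corresponds roughly to the DDL below (a sketch
only; the column names are taken from the quoted message and the actual
definition may differ):

  CREATE TABLE dict (
      ids      bigint,         -- row identifier
      keywords varchar(150),   -- the indexed search term
      weight   varchar(1),
      dpos     int
  );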

The slowness on the first run is obviously caused by disk I/O; the second run
is faster because all the data pages it needs are already in the buffer pool.
200 million rows is not a problem for a btree index: even if your client tool
appends some spaces to your keywords at insertion time, the ideal btree is at
most 5 or 6 levels deep. Can you show the I/O statistics for the index from
your statistics views?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS
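
For example, something like this should show the per-index block I/O counters
(just a sketch; 'dict' is the table name from your message, adjust as needed):

  SELECT indexrelname, idx_blks_read, idx_blks_hit
  FROM pg_statio_user_indexes
  WHERE relname = 'dict';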

Regards,
Qingqing

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
