Re: select ... where <column>='' does a seqscan [auf Viren

From: Silvio Matthes <silvio(dot)matthes(at)xcom(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: select ... where <column>='' does a seqscan [auf Viren
Date: 2004-08-24 16:34:03
Message-ID: OF2F51FA2E.13E90045-ONC1256EFA.0056AEE1-C1256EFA.005B0241@xcom.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello Tom,
>> I try to select the rows of a table where the content of a
varchar-column
>> is empty ('') and PostgresQL is doing a seqscan.

>If there are a very large number of rows with param_value='', it's
>entirely possible that using an index to find them is counterproductive.

That's right. I did some research on my database, that's what I found.

select count(*) from document_params;

count
---------
7302418

select param_name,count(param_name) from document_params where
param_value='' group by param_name;

param_name | count
------------+---------
READ_DATE | 1064944
ENTAX_NR | 85853
KONTO_NR | 6672
KUNDEN_NR | 7

So it's ok not using an index for

select count(*) from document_params where param_value='';

But

explain select count(*) from document_params where param_name='KUNDEN_NR'
and param_value='';

also did a seqscan.

And

explain select count(*) from document_params where param_name='KONTO_NR'
and param_value='test';

did an indexscan!

So at the moment it seems to me, that the multicolumn index is not working
in the expected way.
The ''-value is not the problem. The problem is that there are a million
rows with ''-value.

But in my opinion with the multicolumn index in mind the server should do
a index scan, because there are only 7 rows with param_name='KUNDEN_NR'
and param_value=''?!?

Disabling seqscan gives:

explain select count(*) from document_params where param_name='KUNDEN_NR'
and param_value='';
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Aggregate (cost=639336.57..639336.57 rows=1 width=0)
-> Index Scan using dp_idx_6 on document_params (cost=0.00..638934.84
rows=160694 width=0)
Index Cond: (((param_value)::text = ''::text) AND
((param_name)::text = 'KUNDEN_NR'::text))

Postmaster is using the multicolumn index (param_value,param_name), but
not in the expected time...

as always, any help would be highly appreciated.

Ciao,

Silvio Matthes

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Gesendet von: pgsql-general-owner(at)postgresql(dot)org
24.08.2004 17:35

An
Silvio Matthes <silvio(dot)matthes(at)xcom(dot)de>
Kopie
pgsql-general(at)postgresql(dot)org
Thema
Re: [GENERAL] select ... where <column>='' does a seqscan [auf Viren
geprueft]

Silvio Matthes <silvio(dot)matthes(at)xcom(dot)de> writes:
> I try to select the rows of a table where the content of a
varchar-column
> is empty ('') and PostgresQL is doing a seqscan.

If there are a very large number of rows with param_value='', it's
entirely possible that using an index to find them is counterproductive.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Eduardo S. Fontanetti 2004-08-24 16:36:18 Re: [GENERAL] Dump and Restore
Previous Message Silvio Matthes 2004-08-24 16:33:50 WG: Re: Re: select ... where <column>='' does a seqscan [auf Viren geprueft]