select ... where <column>='' does a seqscan [auf Viren geprueft]

From: Silvio Matthes <silvio(dot)matthes(at)xcom(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: select ... where <column>='' does a seqscan [auf Viren geprueft]
Date: 2004-08-24 09:59:45
Message-ID: OFDEF180CE.2A889972-ONC1256EFA.00346639-C1256EFA.0036E916@xcom.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I'm new to the list and did not find a suitable answer to my question so
here it is:

I try to select the rows of a table where the content of a varchar-column
is empty ('') and PostgresQL is doing a seqscan.

I've tried this on a PostgresQL-Serverversion 7.3.4 and 8.0 beta1.

To understand my example better, here's the layout of the content of this
table:

I have a table document. For each document there are a couple of
parameters in table document_param (name-value-pairs).
The table now holds 7 million rows, so a seqscan is quite expensive.
Now some values are empty (as in this example is the param_value of
READ_DATE).

document_param_id | document_id | param_name |
param_value |
---------------------------+---------------------------+------------+------------------+
1010110101000000007482877 | 1010110101000000001090647 | KONTO_NR |
1000000000 |
1010110101000000007482878 | 1010110101000000001090647 | KZ_READ | N
|
1010110101000000007482879 | 1010110101000000001090647 | READ_DATE | |
1010110101000000007482880 | 1010110101000000001090647 | ENTAX_NR |
2000000000000000 |
1010110101000000007482881 | 1010110101000000001090647 | DOC_SOURCE | 400
|
1010110101000000007482882 | 1010110101000000001090647 | KUNDEN_NR |
1000000 |

I want to get all rows where for example the 'KONTO_NR' is empty.
I tried this with:

explain select * from document_params where param_name='KONTO_NR' and
param_value='';

this produced:

QUERY PLAN
------------------------------------------------------------------------------------------
Seq Scan on document_params (cost=0.00..241600.27 rows=152073 width=95)
Filter: (((param_name)::text = 'KONTO_NR'::text) AND
((param_value)::text = ''::text))
(2 rows)

There's a multicolumn index (param_value,param_name) on document_params.
To be sure the index works I changed the select to

explain select * from document_params where param_name='KONTO_NR' and
param_value=' ';

(looking for a space in param_value), and viola, the index is used. But
not if I'm looking for ''.

Following a reply to an similar question where NULL-values where wanted, I
made a partial index:

create index idx_empty on document_params(param_name) where
param_value='';

But PostgresQL does not use it.
When I disable seqscan (set enable_seqscan=false), an explain returns
this:

QUERY PLAN
--------------------------------------------------------------------------------------------
Index Scan using idx_empty on document_params (cost=0.00..591783.84
rows=152073 width=95)
Index Cond: ((param_name)::text = 'KONTO_NR'::text)
Filter: ((param_value)::text = ''::text)

So using the index does need more time than a sequential scan?

How can I get the rows with empty values from the table without doing a
seqscan?

Any help would be highly appreciated.

Ciao,

Silvio Matthes

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2004-08-24 10:04:48 Re: pg_dump in stand alone backend
Previous Message David Suela Fernández 2004-08-24 09:34:30 problems with pg_dump