Re: Seq scan on 10million record table.. why?

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Vincenzo Melandri <vmelandri(at)imolinfo(dot)it>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Seq scan on 10million record table.. why?
Date: 2012-10-30 14:03:54
Message-ID: CABWW-d0k4O02aW450RMbk18X+5SUSgWQDS34FCQyvz9H1R7JkA@mail.gmail.com
Lists: pgsql-performance

It seems that your tables have different types for the join columns; note the "::numeric"
cast in "(((big_table.key2)::numeric) = data_sequences_table.key2)".
PostgreSQL always uses the widening conversion, so it can't use the index. There are a few
options to fix this:
1) Make all the types the same.
2) If you are using some narrow type for big_table (say, int2) to save
space, you can force a narrowing conversion, e.g. "b.key1 = ds.key1::int2".
Note that if ds.key1 has any values that don't fit into int2, you will have
problems. And of course, use your actual type instead of int2.
3) Create an index on ((key1::numeric), (key2::numeric)). This is the last
resort, since such an index is very specific to this query (or similar
ones). A sketch of options 2 and 3 follows below.
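
For illustration, a minimal sketch of options 2 and 3. It reuses the table and
column names from your query below; int2 is only an assumed example key type,
and the import_id value is taken from your plan:

-- Option 2: force a narrowing cast so the existing index on big_table can be
-- used. Safe only if every ds.key1/ds.key2 value actually fits into int2.
select a, b, c, d
from big_table b
join data_sequences_table ds
  on b.key1 = ds.key1::int2
 and b.key2 = ds.key2::int2
where ds.import_id = 1351592072;

-- Option 3: an expression index matching the casts the planner adds,
-- so the query can use an index scan without being rewritten.
create index big_table_key1_key2_numeric_idx
    on big_table ((key1::numeric), (key2::numeric));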

Best regards, Vitalii Tymchyshyn

2012/10/30 Vincenzo Melandri <vmelandri(at)imolinfo(dot)it>

> Hi all
>
> I have a problem with a data import procedure that involve the following
> query:
>
> select a,b,c,d
> from big_table b
> join data_sequences_table ds
> on b.key1 = ds.key1 and b.key2 = ds.key2
> where ds.import_id=xxxxxxxxxx
>
> The "big table" has something like 10.000.000 records ore more
> (depending on the table, there are more than one of them).
> The data are uploaded in 20k record blocks, and the keys are written
> on "data_sequences_table". The keys are composite (key1,key2), and
> every 5-10 sequences (depending on the size of the upload) the
> data_sequences_table records are deleted.
> I have indexes on both the key on the big table and the import_id on
> the sequence table.
>
> The query plan evaluates like this:
>
> Merge Join  (cost=2604203.98..2774528.51 rows=129904 width=20)
>   Merge Cond: ((((big_table.key1)::numeric) = data_sequences_table.key1) AND (((big_table.key2)::numeric) = data_sequences_table.key2))
>   ->  Sort  (cost=2602495.47..2635975.81 rows=13392135 width=20)
>         Sort Key: ((big_table.key1)::numeric), ((big_table.key2)::numeric)
>         ->  Seq Scan on big_table  (cost=0.00..467919.35 rows=13392135 width=20)
>   ->  Sort  (cost=1708.51..1709.48 rows=388 width=32)
>         Sort Key: data_sequences_table.key1, data_sequences_table.key2
>         ->  Seq Scan on data_sequences_table  (cost=0.00..1691.83 rows=388 width=32)
>               Filter: (import_id = 1351592072::numeric)
>
> It executes in something like 80 seconds. The import procedure has
> more than 500 occurrences of this situation. :(
> Why is the big table evaluated with a seq scan? The result is 0 to
> 20.000 records (the query returns the records that already exist and
> should be updated, not inserted). Can I do something to speed this
> up?
>
> --
> Vincenzo.
> Imola Informatica
>
> Pursuant to Legislative Decree No. 196/2003, you are hereby informed
> that this message contains confidential information intended only for
> the use of the addressee.
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Best regards,
Vitalii Tymchyshyn
