Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance

From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Forcing seq_scan off for large table joined with tiny table yeilds improved performance
Date: 2009-04-06 12:20:47
Message-ID: 49D9F39F.40106@megafon.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Scott Marlowe wrote:
>
> It's not really solved, it's just a happy coincidence that the current
> plan runs well. In order to keep the query planner making good
> choices you need to increase stats target for the field in the index
> above. The easiest way to do so is to do this:
>
> alter database mydb set default_statistics_target=100;
>
> and run analyze again:
>
> analyze;

So, i removed the index on field_name, set
default_default_statistics_target to 100, analyzed, and the results are
the same:

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..58676.31 rows=218048 width=68) (actual
time=0.067..12268.394 rows=6 loops=1)
Hash Cond: ((u.field_name)::text = (t.key)::text)
-> Seq Scan on photo_info_data u (cost=0.00..47500.30 rows=2398530
width=50) (actual time=0.013..6426.611 rows=2398446 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
-> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.006 rows=2 loops=1)
Total runtime: 12268.459 ms
(6 rows)

I even changed default_statistics_target to 1000:

------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1.04..58580.29 rows=208561 width=67) (actual
time=0.054..12434.283 rows=6 loops=1)
Hash Cond: ((u.field_name)::text = (t.key)::text)
-> Seq Scan on photo_info_data u (cost=0.00..47499.46 rows=2398446
width=49) (actual time=0.012..6129.923 rows=2398446 loops=1)
-> Hash (cost=1.02..1.02 rows=2 width=18) (actual
time=0.015..0.015 rows=2 loops=1)
-> Seq Scan on t_query_data t (cost=0.00..1.02 rows=2
width=18) (actual time=0.002..0.004 rows=2 loops=1)
Total runtime: 12434.338 ms
(6 rows)

Even when I run this query, I get sequential scan:

explain analyze select * from photo_info_data where field_name =
'f-spot' or field_name = 'shutter';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
Seq Scan on photo_info_data (cost=0.00..59491.69 rows=1705 width=49)
(actual time=0.018..1535.963 rows=6 loops=1)
Filter: (((field_name)::text = 'f-spot'::text) OR
((field_name)::text = 'shutter'::text))
Total runtime: 1536.010 ms
(3 rows)

These are the representations of te values 'f-spot' and 'shutter' for
the field field_name in photo_info_data table:

xmltest=# select field_name, count(*) from user_info_data where
field_name in ('visina', 'spol') group by field_name;
field_name | count
------------+-------
'f-spot' | 3
'shutter' | 3
(2 rows)

Maybe my test-data is poor? As I've mentioned, photo_info_data has
little over 2300000 rows. And this is complete 'distribution' of the data:

xmltest=# select field_name, count(*) from user_info_data group by
field_name order by count(*) desc;
field_name | count
----------------+--------
field_Xx1 | 350000
field_Xx2 | 332447
field_Xx3 | 297414
field_Xx4 | 262394
field_Xx5 | 227396
field_Xx6 | 192547
field_Xx7 | 157612
field_Xx8 | 122543
field_Xx9 | 87442
field_Xx10 | 52296
field_1 | 50000
field_2 | 47389
field_3 | 42412
field_4 | 37390
field_5 | 32366
field_6 | 27238
field_7 | 22360
field_Xx11 | 17589
field_8 | 17412
field_9 | 12383
field_10 | 7386
field_11 | 2410
f-spot | 3
shutter | 3
focal | 3
flash | 3
m_city | 3
person | 3
iso | 2
(29 rows)

No matter what field_name value I enter in WHERE condition, planner
chooses sequential scan. Only when I add seperate index on field_name,
planner chooes index scan or bitmap index scan.

Mike

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2009-04-06 12:24:48 Re: difficulties with time based queries
Previous Message Matthew Wakeling 2009-04-06 11:47:40 Re: plpgsql arrays