Re: Why is seq search preferred here by planner?

From: <mallah(at)trade-india(dot)com>
To: <josh(at)agliodbs(dot)com>
Cc: <mallah(at)trade-india(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Why is seq search preferred here by planner?
Date: 2003-04-23 17:54:45
Message-ID: 1286.219.65.233.8.1051120485.squirrel@mail.trade-india.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> Mallah,
>
>> Hmm i am not running postgresql on a CRAY :-)
>>
>> that was the time for "begin work;" since
>> i am explain analysing an update i am putting it in transaction.
>>
>> the actualt time was nearing 300 secs which is 5 mins
>>
>> i did an explain of the UPDATE FROM variant of the query it has the same plan for it.
>
> It's the same query. When you forget the FROM, postgres tries to insert it for you ... in
> fact, you generally get a notice like "Inserting missing FROM clause for table
> 'personal_account_details'".

Josh when i get that such Notices the result of update is usually worng.
I do not get that notice for the SQL i posted.

>
> Am I reading your anaylze right?
i think u may have missed parts of it.

I'm never 100% sure with 7.2 ... you *are* updating 150,000
> rows?

7.2 u mean , pgsql 7.2 ? No i am running pgsql 7.3.2

yes i did update 1,50,000 rows that time.

>
> If I'm right, then the query is using a seq scan because it's faster than an index scan for a
> large number of rows. You can always test this by runninng the query after an SET
> ENABLE_SEQSCAN = FALSE; generally the resulting query will take 5-10 times as long.

Yes as explained in docs (somewhere ) and by many ppl seq scan is faster when
majority of the rows are stisfy the search criteria.

my botheration was that if pgsql were to search the personal_account_details with userid
(which is incidently the pkey there) why would it not use the uniq index ?
logically the query shud be looking the coutry feild from the personal_account_details
for every userid (row) encoutered in email_bank table.

may be i am wrong here in correctly under standing the explain output.

tradein_clients=# explain UPDATE email_bank set country=personal_account_details.country FROM
personal_account_details where email_bank.userid > 0 and
email_bank.userid=personal_account_details.userid and ( email_bank.country <>
personal_account_details.country or email_bank.country IS NULL ); QUERY PLAN
----------------------------------------------------------------------------------------------
Hash Join (cost=14497.06..37806.55 rows=186849 width=144)
Hash Cond: ("outer".userid = "inner".userid)
Join Filter: (("outer".country <> "inner".country) OR ("outer".country IS NULL))
-> Seq Scan on email_bank (cost=0.00..16268.10 rows=216658 width=130)
Filter: (userid > 0)
-> Hash (cost=14113.45..14113.45 rows=153445 width=14)
-> Seq Scan on personal_account_details (cost=0.00..14113.45 rows=153445 width=14)
(7 rows)

I am bothered abt the second seq scan , sorry i do not understand "Hash" is there any docs that
explain me that ?

>
> Your query delay time is updating 150,000 rows, period. That's a *lot* of disk activity, and
> it can only be improved by adjusting your postgresql.conf, the disk locations of your files,
> the speed of your disk array, and your I/O bandwidth.

my WAL logs (pg_xlog) are already in a dedicated disk [ buts thats a different topic ]

>
> How long does this take?
>
> UPDATE email_bank set country=personal_account_details.country
> FROM personal_account_details
> where userid > 0 and userid=personal_account_details.userid
> and email_bank.country <> personal_account_details.country
> or email_bank.country IS NULL;

Yes It took very less , because there were actually very less rows to update .
I agree the "email_bank.country <> personal_account_details.country or email_bank.country IS NULL"
was a neat thing to do in last query which i didnt' :-(

this time since most of the rows were already equal becoz i updated them a little while
back with that 5 mins query ;-)

anyway could u plez explain the second "seq scan part" ?

HERE WAS THE EXPLIAN ANALYZE OUTPUT:

hey dont you think the parentheis i added were necessary for the query
to update proper rows ? becoz the "or" part could have evaluated to true even for
no matching userids resulting in update of unecessary rows.

tradein_clients=# begin work ; explain analyze UPDATE email_bank set
country=personal_account_details.country FROM personal_account_details where email_bank.userid > 0
and email_bank.userid=personal_account_details.userid and ( email_bank.country <>
personal_account_details.country or email_bank.country IS NULL );BEGIN
Time: 720.04 ms
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=14497.06..37806.55 rows=186849 width=144) (actual time=5874.10..8754.98
rows=1033 loops=1) Hash Cond: ("outer".userid = "inner".userid)
Join Filter: (("outer".country <> "inner".country) OR ("outer".country IS NULL))
-> Seq Scan on email_bank (cost=0.00..16268.10 rows=216658 width=130) (actual
time=0.83..1361.35 rows=156669 loops=1) Filter: (userid > 0)
-> Hash (cost=14113.45..14113.45 rows=153445 width=14) (actual time=5855.59..5855.59 rows=0
loops=1) -> Seq Scan on personal_account_details (cost=0.00..14113.45 rows=153445 width=14)
(actual time=0.04..5550.93 rows=153466 loops=1) Total runtime: 9370.74 msec
(8 rows)

>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
>
> ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9'
> the postmaster

-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Richard Huxton 2003-04-23 17:55:36 Re: OUTER JOIN
Previous Message Rod Taylor 2003-04-23 17:48:38 Re: Optomizing left outer joins