
Re: Possible future performance improvement: sort updates/deletes by ctid

From: "Stephen Denne" <Stephen.Denne@datamail.co.nz>
To: "Tom Lane" <tgl@sss.pgh.pa.us>
Cc: <pgsql-hackers@postgresql.org>
Subject: Re: Possible future performance improvement: sort updates/deletes by ctid
Date: 2008-01-30 04:51:08
Message-ID: F0238EBA67824444BC1CB4700960CB48048E111F@dmpeints002.isotach.com
Lists: pgsql-hackers
> At the planner level that would be entirely the wrong way to go about
> it, because that's forcing the equivalent of a nestloop join, which is
> very unlikely to be faster for the numbers of rows that we're talking
> about here.  The reason it looks faster to you is that the benefits of
> updating the document_file rows in ctid order outweigh the costs of the
> dumb join strategy ... but what we want to achieve here is to have both
> benefits, or at least to give the planner the opportunity to make a
> cost-driven decision about what to do.

Ok.

Here are some more data points, using a smaller table, v8.2.6:


Seq Scan on document_file df  (cost=0.00..208480.85 rows=25101 width=662) (actual time=0.239..773.834 rows=25149 loops=1)
  SubPlan
    ->  Index Scan using pk_document_id on document d  (cost=0.00..8.27 rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=25149)
          Index Cond: (id = $0)
Total runtime: 4492.363 ms



vs


Hash Join  (cost=1048.85..6539.32 rows=25149 width=666) (actual time=575.079..1408.363 rows=25149 loops=1)
  Hash Cond: (df.document_id = d.id)
  ->  Seq Scan on document_file df  (cost=0.00..4987.49 rows=25149 width=662) (actual time=60.724..824.195 rows=25149 loops=1)
  ->  Hash  (cost=734.49..734.49 rows=25149 width=8) (actual time=40.271..40.271 rows=25149 loops=1)
        ->  Seq Scan on document d  (cost=0.00..734.49 rows=25149 width=8) (actual time=0.055..22.559 rows=25149 loops=1)
Total runtime: 34961.504 ms


These results are fairly repeatable for me after a VACUUM FULL ANALYZE of the two tables.
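For reference, the two plans above are consistent with statement shapes along these lines (the original statements weren't quoted, so the exact SQL, including the SET column, is a guess on my part, with names taken from the plans):

```sql
-- Hypothetical reconstruction: an UPDATE driven by a correlated subquery
-- gives the Seq Scan + SubPlan shape, with one index probe on document
-- per document_file row, and touches document_file in ctid order:
UPDATE document_file df
SET document_id = (SELECT d.id FROM document d
                   WHERE d.id = df.document_id);

-- The equivalent join form, which the planner runs as the Hash Join:
UPDATE document_file df
SET document_id = d.id
FROM document d
WHERE d.id = df.document_id;
```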


Have I simply not tuned Postgres to reflect that everything is on a single old IDE drive, not spread over a few sets of raided SSD drives, so that random_page_cost should perhaps be larger than the default 4.0? Would that make the second estimate larger than the first?
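A sketch of the per-session experiment I have in mind (the value 6.0 is only an illustrative guess, not a recommendation):

```sql
-- Raise random_page_cost for this session only, then re-run
-- EXPLAIN ANALYZE on both query forms to see how the two estimates shift:
SET random_page_cost = 6.0;   -- default is 4.0
EXPLAIN ANALYZE ...;          -- each of the two statements in turn
RESET random_page_cost;       -- back to the postgresql.conf value
```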

Stephen Denne.


