Re: Possible future performance improvement: sort updates/deletes by ctid

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible future performance improvement: sort updates/deletes by ctid
Date: 2008-01-30 04:51:08
Message-ID: F0238EBA67824444BC1CB4700960CB48048E111F@dmpeints002.isotach.com
Lists: pgsql-hackers

> At the planner level that would be entirely the wrong way to go about
> it, because that's forcing the equivalent of a nestloop join, which is
> very unlikely to be faster for the numbers of rows that we're talking
> about here. The reason it looks faster to you is that the benefits of
> updating the document_file rows in ctid order outweigh the costs of
> the dumb join strategy ... but what we want to achieve here is to have
> both benefits, or at least to give the planner the opportunity to make
> a cost-driven decision about what to do.

Ok.

Here are some more data points on v8.2.6, using a smaller table:

Seq Scan on document_file df  (cost=0.00..208480.85 rows=25101 width=662) (actual time=0.239..773.834 rows=25149 loops=1)
  SubPlan
    ->  Index Scan using pk_document_id on document d  (cost=0.00..8.27 rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=25149)
          Index Cond: (id = $0)
Total runtime: 4492.363 ms

vs

Hash Join  (cost=1048.85..6539.32 rows=25149 width=666) (actual time=575.079..1408.363 rows=25149 loops=1)
  Hash Cond: (df.document_id = d.id)
  ->  Seq Scan on document_file df  (cost=0.00..4987.49 rows=25149 width=662) (actual time=60.724..824.195 rows=25149 loops=1)
  ->  Hash  (cost=734.49..734.49 rows=25149 width=8) (actual time=40.271..40.271 rows=25149 loops=1)
        ->  Seq Scan on document d  (cost=0.00..734.49 rows=25149 width=8) (actual time=0.055..22.559 rows=25149 loops=1)
Total runtime: 34961.504 ms

These are fairly repeatable for me after a VACUUM FULL ANALYZE of the two tables.
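For reference, the two plans above correspond roughly to the following query shapes. This is just a sketch reconstructed from the plan nodes; the actual UPDATE and its SET list aren't shown in this message, so the column names below are placeholders:

-- Shape 1: correlated subplan; one index probe on document per
-- document_file row (the Seq Scan + SubPlan plan above).
UPDATE document_file AS df
SET some_column = (SELECT d.some_value
                   FROM document d
                   WHERE d.id = df.document_id);

-- Shape 2: join form, which the planner here executes as a hash join
-- (the second plan above).
UPDATE document_file AS df
SET some_column = d.some_value
FROM document d
WHERE df.document_id = d.id;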

Have I simply not tuned Postgres to reflect that everything is on a single old IDE drive rather than split over a few sets of RAIDed SSD drives, and hence random_page_cost should perhaps be larger than 4.0? Would that make the second estimate larger than the first?
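(In case it helps anyone reproduce this: one way to test that hypothesis is to change the setting for the session only and re-run the EXPLAIN ANALYZE. 4.0 is the default; the 8.0 below is just an arbitrary guess for a single IDE disk, and the column names are the same placeholders as above.)

SHOW random_page_cost;       -- 4.0 by default

SET random_page_cost = 8.0;  -- session-local; 8.0 is only an illustrative value

EXPLAIN ANALYZE
UPDATE document_file AS df
SET some_column = d.some_value
FROM document d
WHERE df.document_id = d.id;

RESET random_page_cost;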

Stephen Denne.

