
Re: PostgreSQL clustering VS MySQL clustering

From: PFC <lists(at)boutiquenumerique(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: PostgreSQL clustering VS MySQL clustering
Date: 2005-01-26 19:46:49
Message-ID: opsk76kbw7th1vuj@musicbox
Lists: pgsql-performance
http://borg.postgresql.org/docs/8.0/interactive/storage-page-layout.html


> If you vacuum as part of the transaction it's going to make more
> efficient use of resources, because you have more of what you need right
> there (ie: odds are that you're on the same page as the old tuple). In
> cases like that it very likely makes a lot of sense to take a small hit
> in your transaction time up-front, instead of a larger hit doing a
> vacuum down the road.

	One point in favour: you're going to make a disk write anyway because
the page is modified, so why not vacuum that page while it's in memory? If
the machine is CPU-bound you lose; if it's I/O-bound you save some I/O. But
the cost of the index updates has to be taken into account...

	This prompted a few questions:

Note: temp contains 128k (131072) values generated from a sequence.
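
	(The post doesn't show how temp was built; a minimal sketch matching the
note, using generate_series. The table layout here is an assumption.)

-- Hypothetical setup (the original temp table isn't shown): 131072
-- sequential values, here generated with generate_series.
create table temp (id integer);
insert into temp select * from generate_series(1, 131072);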

create table test (id serial primary key, a integer, z integer, e integer,  
r integer, t integer, y integer ) without oids;
insert into test (id,a,z,e,r,t,y) select id,0,0,0,0,0,0 from temp;
  INSERT 0 131072


explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..2226.84 rows=126284 width=30) (actual
time=0.047..617.553 rows=131072 loops=1)
  Total runtime: 4432.509 ms

explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..4453.68 rows=252568 width=30) (actual  
time=52.198..611.594 rows=131072 loops=1)
  Total runtime: 5739.064 ms

explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..6680.52 rows=378852 width=30) (actual  
time=127.301..848.762 rows=131072 loops=1)
  Total runtime: 6548.206 ms

Each run gets slower as dead tuples accumulate, which is normal for a seq
scan: it has to read the dead row versions too. Note the row estimates
growing with the physical table size; they seem to track the total number
of row versions, live plus dead...
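
	The bloat itself can be watched from the catalog; a sketch (not in the
original post; relpages and reltuples are only refreshed by VACUUM and
ANALYZE):

-- Physical size in 8 kB pages vs. the tuple count the planner believes in.
select relname, relpages, reltuples from pg_class where relname = 'test';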


vacuum full test;
explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.019..779.864 rows=131072 loops=1)
  Total runtime: 5600.311 ms

vacuum full test;
explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.039..1021.847 rows=131072 loops=1)
  Total runtime: 5126.590 ms

-> Seems VACUUM FULL does its job: the row estimate and the runtimes are
back to roughly the fresh-table level.

vacuum test;
explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..3894.08 rows=196608 width=30) (actual  
time=36.491..860.135 rows=131072 loops=1)
  Total runtime: 7293.698 ms

vacuum test;
explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..3894.08 rows=196608 width=30) (actual  
time=0.044..657.125 rows=131072 loops=1)
  Total runtime: 5934.141 ms

vacuum analyze test;
explain analyze update test set y=1;
  Seq Scan on test  (cost=0.00..3894.08 rows=196608 width=30) (actual  
time=0.018..871.132 rows=131072 loops=1)
  Total runtime: 5548.053 ms

-> Here plain VACUUM gives about the same UPDATE times as VACUUM FULL
(which is normal, as the whole table is updated each time, so the freed
space is immediately reused); however, the row estimate is still off even
after ANALYZE.
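
	To see what the planner is working from versus reality, a quick check
(again a sketch, not from the original post):

-- Stored statistics that drive the row estimate...
select reltuples, relpages from pg_class where relname = 'test';
-- ...versus the actual number of live rows (131072 here).
select count(*) from test;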


  Let's create a few indices:

vacuum full test;
create index testa on test(a);
create index testz on test(z);
create index teste on test(e);
create index testr on test(r);
create index testt on test(t);
-- we don't create an index on y


vacuum full test;
explain analyze update test set a=id;
  Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.044..846.102 rows=131072 loops=1)
  Total runtime: 14998.307 ms

We see that updating the indexes has made this query a lot slower. This
is normal, but:

vacuum full test;
explain analyze update test set a=id;
  Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.045..1387.626 rows=131072 loops=1)
  Total runtime: 17644.368 ms

Now, we updated ALL rows but didn't actually change a single value (a was
already equal to id). It still took about as long as the first run: the
updates all really take place, even when all they do is copy the rows
under new transaction IDs. A sketch of how to skip such no-op updates
follows below.
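
	A sketch of the workaround (not from the original post): filter out the
no-op rows in the UPDATE itself, so they produce no new row versions at all.

-- Only touch rows whose value would actually change; IS DISTINCT FROM
-- also behaves sanely if either side is NULL.
update test set a = id where a is distinct from id;
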
Now, let's update a column which is not indexed:

vacuum full test;
explain analyze update test set y=id;
  Seq Scan on test  (cost=0.00..2274.72 rows=131072 width=30) (actual  
time=0.046..964.590 rows=131072 loops=1)
  Total runtime: 15628.143 ms

It takes about the same time: the indexes still have to be updated to
point at the new row versions, after all.

So here is something annoying about the current approach: updating rows
in a table bloats ALL of its indices, not just those whose indexed values
actually changed. So if you have a table with many indexed fields and you
often update some obscure timestamp field, all the indices will bloat.
VACUUM will of course reclaim the space, but that gives it extra work to
do.
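
	The per-index growth can be checked the same way (a sketch; index files
generally don't shrink back under plain VACUUM):

-- All five indexes grow after the UPDATEs above, even though only column a
-- (or no column at all) ever changed value.
select relname, relpages from pg_class
where relname in ('testa', 'testz', 'teste', 'testr', 'testt')
order by relname;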

	I don't have suggestions, just questions:

	Is there a way to avoid updating an index when the indexed values do
not change?
	Would it depend on whether the updated tuple can be stored on the same
page as the old version?
	If the answer is yes:
		- would saving the cost of the index updates pay off against vacuuming
the page on the fly, to try to squeeze the new tuple version in?
		- would it be interesting to specify, per table, a target percentage of
free space ('air holes') in pages for vacuum to try to achieve, so that
updated row versions can be inserted on the same page they were on before,
saving the index updates?

	Regards...