While working on an algorithm for my project I came to a question. Suppose
a table like this, where the (user, cookie) pair is the primary key.
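Since the original table definition is not shown, here is a hypothetical schema of roughly that shape; the table name and the "count" column are my assumptions, only the (user, cookie) key is given:

```sql
-- Hypothetical schema: only the (user, cookie) primary key is described;
-- "count" stands in for the data that is refreshed every 10 minutes.
CREATE TABLE user_cookie_stats (
    "user"  text    NOT NULL,
    cookie  text    NOT NULL,
    count   integer NOT NULL,
    PRIMARY KEY ("user", cookie)
);
```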
Every 10 minutes this PostgreSQL table is updated with my information.
The main problem is that some (user, cookie) pairs may already exist in
PostgreSQL and must be updated, while others do not exist yet and must be
inserted.
My first approach was to DELETE the row with the (user, cookie) pair I am
about to update, then INSERT the new one. This guarantees that there will
be no error when the (user, cookie) pair already exists in the table, and
this is how it currently works.
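A minimal sketch of this first approach (table, column names, and values are my assumptions, not from the original post):

```sql
-- First approach: unconditionally DELETE then INSERT inside one
-- transaction, so a pre-existing (user, cookie) row never raises a
-- duplicate-key error on the primary key.
BEGIN;
DELETE FROM user_cookie_stats
 WHERE "user" = 'alice' AND cookie = 'abc123';
INSERT INTO user_cookie_stats ("user", cookie, count)
VALUES ('alice', 'abc123', 42);
COMMIT;
```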
But I think this leads to heavy fragmentation of the table, which then
needs to be VACUUMed and ANALYZEd far more frequently...
My second idea was to SELECT the (user, cookie) pair and then UPDATE it if
it exists, or INSERT it if it does not. I thought that if UPDATE rewrote
the same place in the file with the new count, it might lead to a more
compact table (the file would not grow, and the information about the
actual rows in the file would not change). And if the file blocks
containing the (user, cookie) pair were not moved to a new place in the
file, the table would not need to be ANALYZEd as often.
But if UPDATE actually inserts a new row into the file, marking the block
that held the previous version of the row as 'free to use', then again the
table needs to be VACUUMed and ANALYZEd far more frequently. In that case
the second idea would be a complete waste of time and code, because
writing C code that does "DELETE then INSERT" is more portable than
"SELECT, then UPDATE if there are rows, or INSERT if there are none".
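The second approach could be sketched like this (again with assumed table, column, and value names); checking UPDATE's reported row count avoids a separate SELECT:

```sql
-- Second approach: try the UPDATE first; if it touched no rows,
-- the pair does not exist yet, so INSERT it.  Not race-safe under
-- concurrent writers without additional locking.
BEGIN;
UPDATE user_cookie_stats SET count = 42
 WHERE "user" = 'alice' AND cookie = 'abc123';
-- In C with libpq one would inspect PQcmdTuples() here; if it reports
-- 0 rows updated, fall through to the INSERT.  The NOT EXISTS guard
-- below does the same check purely in SQL:
INSERT INTO user_cookie_stats ("user", cookie, count)
SELECT 'alice', 'abc123', 42
 WHERE NOT EXISTS (SELECT 1 FROM user_cookie_stats
                    WHERE "user" = 'alice' AND cookie = 'abc123');
COMMIT;
```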
So, can anyone explain to me whether the actual mechanism of UPDATE can
save resources and keep tables from becoming highly fragmented? Or does it
give the same results and problems, so that "DELETE then INSERT" is the
best way?
pgsql-performance by date