I was wondering if this is a known "feature" to begin with, and whether there are any plans to fix
it in the future?
I had two very large tables in my database (about 30 million rows each), connected by a foreign key,
and wanted to merge them together...
Something like this:
create table a (
    id int primary key,
    ...
);
create table b (
    id int unique references a,
    other_data int,
    ...
);
So, what I did was:
alter table a add other_data int;
update a set other_data = b.other_data from b where b.id=a.id;
This took an awfully long time, but it worked (I guess).
I say 'I guess' because I haven't been able to verify that so far; when I tried to do
select * from a limit 1;
it just hung on me ... at least, it looked like it did.
Luckily, I had compiled the backend from source with full debug info, because if I hadn't done that
(as most users wouldn't have), I would certainly have thought that my database was hopelessly corrupted
and that I would have to recreate it from scratch :-(
Instead, I loaded the whole thing into a debugger, because that seemed to be the only way to figure out what
the hell it was thinking about...
So, what I found out was that it seems to have recreated my entire table when I updated it, and left all the
old tuples in place as well, so my 'select * ... limit 1' query was cycling through 30 million deleted tuples, trying
to find the first one that was still valid, and that's what was taking so long...
First of all, a question for you: is ANY update to a table equivalent (in this respect) to a delete+insert?
Or is my problem specific to the fact that I altered the table and added new columns?
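For what it's worth, here is a minimal way to watch this happen (a sketch, using a small hypothetical scratch table t; ctid is the system column giving a row's physical location):

```sql
-- hypothetical scratch table, just to observe tuple locations
create table t (id int primary key, v int);
insert into t values (1, 10);

select ctid, * from t;   -- note the row's physical location
update t set v = v + 1;  -- a plain update, no schema change involved
select ctid, * from t;   -- the location changes: a new tuple version
                         -- was written, and the old one left behind
```

If the ctid changes on a plain update too, then I suppose the behaviour isn't specific to my alter-then-update sequence.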
Now, I understand that if I had vacuum'ed it, the problem would have been resolved... The problems I have with that:
- As I said, the behaviour was so unexpected that I would have trashed the whole database if I hadn't been able
to debug it... If there is no other possible solution, I think that, at the very least, it should give the user some
indication that it's not hopelessly hung when doing that query...
- Vacuum isn't the speediest thing in the world either (it's been running for an hour now and still hasn't finished).
I was hoping to finish modifying my schema first, and then just vacuum everything once. So it would be
REALLY, REALLY helpful for situations like this if PG were smart enough to keep track somehow of those deleted
tuples, to avoid having to scan through them all every time...
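In case it matters, this is the workaround I'm resorting to for now: a plain vacuum of each table after the bulk update (VERBOSE only so I can at least watch the progress):

```sql
-- reclaim the dead tuple versions left behind by the bulk update;
-- VERBOSE reports how many tuples were removed from each table
vacuum verbose a;
vacuum verbose b;
```

But having to do this after every large update, rather than once at the end of all my schema changes, is exactly what I was hoping to avoid.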
In my particular situation, the solution would be trivial (just remembering the address of the first valid
tuple would suffice, because the entire table was updated)... I am not familiar enough with the internals to suggest
anything more general than this, but EVEN fixing only this particular scenario would, I believe, be extremely helpful.
Do you agree?
Thanks a lot!