Hi - After what seemed to be a normal successful pg_migrator migration from 8.3.7 to 8.4.0, in either link or copy mode, vacuumlo fails on both our production and qa databases: Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM "public"."xml_user") Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608 does not exist Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is aborted, commands ignored until end of transaction block I migrated our qa database using pg_dump/pg_restore and vacuumlo has no problem with it. When I try querying the two databases for large objects manually I see the same error in the one that was migrated with pg_migrator: select loread(lo_open(xml_data,262144),1073741819) from xml_user where id = '10837246'; ERROR: large object 24696063 does not exist SQL state: 42704 I can also see that the pg_largeobject table is different, in the pg_restore version the Rows (estimated) is 316286 and Rows (counted) is the same, in the pg_migrator version the Rows (counted) is only 180507. Any advice on what I might look for to try and track down this problem? pg_restore on our production database takes too long so it would be really nice to use pg_migrator instead. Thanks, Jamie
Hi - This is probably more helpful - the pg_largeobject table only changed after vacuumlo, not before. When comparing pre- and post- pg_migrator databases (no vacuum or vacuumlo): select * from pg_largeobject where loid = '24696063'; in the pre- there are three rows, having pageno 0 through 3, in the post- database there are no results. Thanks for any advice, Jamie On Mon, Jul 13, 2009 at 2:13 PM, Jamie Fox <jfox(at)directcommerce(dot)com> wrote: > Hi - > After what seemed to be a normal successful pg_migrator migration from > 8.3.7 to 8.4.0, in either link or copy mode, vacuumlo fails on both our > production and qa databases: > > Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms > statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM > "public"."xml_user") > Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608 > does not exist > Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is > aborted, commands ignored until end of transaction block > > I migrated our qa database using pg_dump/pg_restore and vacuumlo has no > problem with it. When I try querying the two databases for large objects > manually I see the same error in the one that was migrated with pg_migrator: > > select loread(lo_open(xml_data,262144),1073741819) from xml_user where id = > '10837246'; > ERROR: large object 24696063 does not exist > SQL state: 42704 > > I can also see that the pg_largeobject table is different, in the > pg_restore version the Rows (estimated) is 316286 and Rows (counted) is the > same, in the pg_migrator version the Rows (counted) is only 180507. > > Any advice on what I might look for to try and track down this problem? > pg_restore on our production database takes too long so it would be really > nice to use pg_migrator instead. > > Thanks, > > Jamie > >
Jamie Fox wrote: > Hi - > After what seemed to be a normal successful pg_migrator migration from 8.3.7 > to 8.4.0, in either link or copy mode, vacuumlo fails on both our production > and qa databases: > > Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms > statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM > "public"."xml_user") > Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608 > does not exist > Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is > aborted, commands ignored until end of transaction block > > I migrated our qa database using pg_dump/pg_restore and vacuumlo has no > problem with it. When I try querying the two databases for large objects > manually I see the same error in the one that was migrated with pg_migrator: > > select loread(lo_open(xml_data,262144),1073741819) from xml_user where id = > '10837246'; > ERROR: large object 24696063 does not exist > SQL state: 42704 > > I can also see that the pg_largeobject table is different, in the pg_restore > version the Rows (estimated) is 316286 and Rows (counted) is the same, in > the pg_migrator version the Rows (counted) is only 180507. > > Any advice on what I might look for to try and track down this problem? > pg_restore on our production database takes too long so it would be really > nice to use pg_migrator instead. [ Email moved to hackers list.] Wow, I didn't test large objects specifically, and I am confused why there would be a count discrepancy. I will need to do some research unless someone else can guess about the cause. -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Forwarded to hackers. --------------------------------------------------------------------------- Jamie Fox wrote: > Hi - > This is probably more helpful - the pg_largeobject table only changed after > vacuumlo, not before. When comparing pre- and post- pg_migrator databases > (no vacuum or vacuumlo): > > select * from pg_largeobject where loid = '24696063'; > > in the pre- there are three rows, having pageno 0 through 3, in the post- > database there are no results. > > Thanks for any advice, > > Jamie > > > On Mon, Jul 13, 2009 at 2:13 PM, Jamie Fox <jfox(at)directcommerce(dot)com> wrote: > > > Hi - > > After what seemed to be a normal successful pg_migrator migration from > > 8.3.7 to 8.4.0, in either link or copy mode, vacuumlo fails on both our > > production and qa databases: > > > > Jul 1 11:17:03 db2 postgres[9321]: [14-1] LOG: duration: 175.563 ms > > statement: DELETE FROM vacuum_l WHERE lo IN (SELECT "xml_data" FROM > > "public"."xml_user") > > Jul 1 11:17:03 db2 postgres[9321]: [15-1] ERROR: large object 17919608 > > does not exist > > Jul 1 11:17:03 db2 postgres[9321]: [16-1] ERROR: current transaction is > > aborted, commands ignored until end of transaction block > > > > I migrated our qa database using pg_dump/pg_restore and vacuumlo has no > > problem with it. When I try querying the two databases for large objects > > manually I see the same error in the one that was migrated with pg_migrator: > > > > select loread(lo_open(xml_data,262144),1073741819) from xml_user where id = > > '10837246'; > > ERROR: large object 24696063 does not exist > > SQL state: 42704 > > > > I can also see that the pg_largeobject table is different, in the > > pg_restore version the Rows (estimated) is 316286 and Rows (counted) is the > > same, in the pg_migrator version the Rows (counted) is only 180507. > > > > Any advice on what I might look for to try and track down this problem? > > pg_restore on our production database takes too long so it would be really > > nice to use pg_migrator instead. > > > > Thanks, > > > > Jamie > > > > -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Jamie Fox wrote: > > I can also see that the pg_largeobject table is different, in the pg_restore > > version the Rows (estimated) is 316286 and Rows (counted) is the same, in > > the pg_migrator version the Rows (counted) is only 180507. > Wow, I didn't test large objects specifically, and I am confused why > there would be a count discrepancy. I will need to do some research > unless someone else can guess about the cause. Maybe pg_largeobject is not getting frozen? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > Bruce Momjian wrote: > > Jamie Fox wrote: > > > > I can also see that the pg_largeobject table is different, in the pg_restore > > > version the Rows (estimated) is 316286 and Rows (counted) is the same, in > > > the pg_migrator version the Rows (counted) is only 180507. > > > Wow, I didn't test large objects specifically, and I am confused why > > there would be a count discrepancy. I will need to do some research > > unless someone else can guess about the cause. > > Maybe pg_largeobject is not getting frozen? That would explain the change in count, but I thought we froze _everything_, and had to. -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Alvaro Herrera wrote: > > Bruce Momjian wrote: > > > Jamie Fox wrote: > > > > > > I can also see that the pg_largeobject table is different, in the pg_restore > > > > version the Rows (estimated) is 316286 and Rows (counted) is the same, in > > > > the pg_migrator version the Rows (counted) is only 180507. > > > > > Wow, I didn't test large objects specifically, and I am confused why > > > there would be a count discrepancy. I will need to do some research > > > unless someone else can guess about the cause. > > > > Maybe pg_largeobject is not getting frozen? > > That would explain the change in count, but I thought we froze > _everything_, and had to. After a quick chat with Bruce it was determined that we don't freeze anything (it would be horrid for downtime if we did so in pg_migrator; and it would be useless if ran anywhere else). What we do is migrate pg_clog from the old cluster to the new. So never mind that hypothesis. Bruce noticed that the pg_dump/pg_migrator combo is failing to restore pg_largeobject's relfrozenxid. We're not sure how this is causing the errors Jamie is seeing, because what I think should happen is that scans of the table should fail with failures to open pg_clog files such-and-such, but not missing tuples ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > Bruce Momjian wrote: > > Alvaro Herrera wrote: > > > Bruce Momjian wrote: > > > > Jamie Fox wrote: > > > > > > > > I can also see that the pg_largeobject table is different, in the pg_restore > > > > > version the Rows (estimated) is 316286 and Rows (counted) is the same, in > > > > > the pg_migrator version the Rows (counted) is only 180507. > > > > > > > Wow, I didn't test large objects specifically, and I am confused why > > > > there would be a count discrepancy. I will need to do some research > > > > unless someone else can guess about the cause. > > > > > > Maybe pg_largeobject is not getting frozen? > > > > That would explain the change in count, but I thought we froze > > _everything_, and had to. > > After a quick chat with Bruce it was determined that we don't freeze > anything (it would be horrid for downtime if we did so in pg_migrator; > and it would be useless if ran anywhere else). What we do is migrate > pg_clog from the old cluster to the new. So never mind that hypothesis. FYI, we do freeze the new cluster that has only schema definitions, no data. > Bruce noticed that the pg_dump/pg_migrator combo is failing to restore > pg_largeobject's relfrozenxid. We're not sure how this is causing the > errors Jamie is seeing, because what I think should happen is that scans > of the table should fail with failures to open pg_clog files > such-and-such, but not missing tuples ... Yea, I can fix that in PG 8.4.1, but that doesn't seem like the cause of the missing rows. Alvaro and I are still investigating. -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Alvaro Herrera wrote: > Bruce Momjian wrote: > > Alvaro Herrera wrote: > > > Bruce Momjian wrote: > > > > Jamie Fox wrote: > > > > > > > > I can also see that the pg_largeobject table is different, in the pg_restore > > > > > version the Rows (estimated) is 316286 and Rows (counted) is the same, in > > > > > the pg_migrator version the Rows (counted) is only 180507. > > > > > > > Wow, I didn't test large objects specifically, and I am confused why > > > > there would be a count discrepancy. I will need to do some research > > > > unless someone else can guess about the cause. > > > > > > Maybe pg_largeobject is not getting frozen? > > > > That would explain the change in count, but I thought we froze > > _everything_, and had to. > > After a quick chat with Bruce it was determined that we don't freeze > anything (it would be horrid for downtime if we did so in pg_migrator; > and it would be useless if ran anywhere else). What we do is migrate > pg_clog from the old cluster to the new. So never mind that hypothesis. > > Bruce noticed that the pg_dump/pg_migrator combo is failing to restore > pg_largeobject's relfrozenxid. We're not sure how this is causing the > errors Jamie is seeing, because what I think should happen is that scans > of the table should fail with failures to open pg_clog files > such-and-such, but not missing tuples ... Jamie, is it possible for you to apply the attached patch to the 8.4 server, install the new pg_dump, and run the test again to see if pg_largeobject is fixed? This patch properly sets the relfrozenxid in the system tables for each database. -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Mon, Jul 13, 2009 at 8:03 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote: > Alvaro Herrera wrote: > > Bruce Momjian wrote: > > > Alvaro Herrera wrote: > > > > Bruce Momjian wrote: > > > > > Jamie Fox wrote: > > > > > > > > > > I can also see that the pg_largeobject table is different, in the > pg_restore > > > > > > version the Rows (estimated) is 316286 and Rows (counted) is the > same, in > > > > > > the pg_migrator version the Rows (counted) is only 180507. > > > > > > > > > Wow, I didn't test large objects specifically, and I am confused > why > > > > > there would be a count discrepancy. I will need to do some research > > > > > unless someone else can guess about the cause. > > > > > > > > Maybe pg_largeobject is not getting frozen? > > > > > > That would explain the change in count, but I thought we froze > > > _everything_, and had to. > > > > After a quick chat with Bruce it was determined that we don't freeze > > anything (it would be horrid for downtime if we did so in pg_migrator; > > and it would be useless if ran anywhere else). What we do is migrate > > pg_clog from the old cluster to the new. So never mind that hypothesis. > > > > Bruce noticed that the pg_dump/pg_migrator combo is failing to restore > > pg_largeobject's relfrozenxid. We're not sure how this is causing the > > errors Jamie is seeing, because what I think should happen is that scans > > of the table should fail with failures to open pg_clog files > > such-and-such, but not missing tuples ... > > Jamie, is it possible for you to apply the attached patch to the 8.4 > server, install the new pg_dump, and run the test again to see if > pg_largeobject is fixed? This patch properly sets the relfrozenxid in > the system tables for each database. > Sorry for the confusion, an addendum meant to be helpful fell out of this thread during the move from -general. I will try this patch now, but to be clear - the number of rows in pg_largeobject changed after I ran vacuumlo (that eventually failed). Here's what I have found that got broken during pg_migrate: In two side by side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the pg_largeobject table has the same number of rows. However, in the 8.4 database any select for an loid in pg_largeobject returns zero rows. If I select all loids to a file, and compare to select all loids from 8.3.7 they're the same. When I select != an loid it seems to exclude the one and return the rest, but all other comparisons <, > or = return zero rows. Or I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in other tables fails in the 8.4 database with 'large object xxxxid does not exist'. Thanks again, Jamie
Jamie Fox wrote: > Here's what I have found that got broken during pg_migrate: In two side by > side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the > pg_largeobject table has the same number of rows. However, in the 8.4 > database any select for an loid in pg_largeobject returns zero rows. If I > select all loids to a file, and compare to select all loids from 8.3.7 > they're the same. When I select != an loid it seems to exclude the one and > return the rest, but all other comparisons <, > or = return zero rows. Or > I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in > other tables fails in the 8.4 database with 'large object xxxxid does not > exist'. Oh, so maybe it's pg_largeobject's index that's borked ... Did you try reindexing it? How are we transferring pg_largeobject, and are we transferring its index too? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> > Here's what I have found that got broken during pg_migrate: In two side > by > > side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the > > pg_largeobject table has the same number of rows. However, in the 8.4 > > database any select for an loid in pg_largeobject returns zero rows. If > I > > select all loids to a file, and compare to select all loids from 8.3.7 > > they're the same. When I select != an loid it seems to exclude the one > and > > return the rest, but all other comparisons <, > or = return zero rows. > Or > > I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in > > other tables fails in the 8.4 database with 'large object xxxxid does not > > exist'. > > Oh, so maybe it's pg_largeobject's index that's borked ... Did you try > reindexing it? > > How are we transferring pg_largeobject, and are we transferring its > index too? Hi - REINDEX INDEX pg_largeobject_loid_pn_index; This seems to have fixed the problem, lo_open of lob data is working again - now to see how vacuumlo likes it. Thanks, Jamie
Jamie Fox wrote: > Hi - > REINDEX INDEX pg_largeobject_loid_pn_index; > > This seems to have fixed the problem, lo_open of lob data is working again - > now to see how vacuumlo likes it. So did it work? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Worked great, vacuumlo finished, a vacuum -full finished amazingly quickly, very exciting. We're pointing qa apps at it now for testing. For some reason though, that index has to be rebuilt after running pg_migrator. I'll be testing on our 100GB+ prod copy shortly and will let you know if you want. Thanks, Jamie On Wed, Jul 15, 2009 at 9:28 AM, Alvaro Herrera <alvherre(at)commandprompt(dot)com>wrote: > Jamie Fox wrote: > > > Hi - > > REINDEX INDEX pg_largeobject_loid_pn_index; > > > > This seems to have fixed the problem, lo_open of lob data is working > again - > > now to see how vacuumlo likes it. > > So did it work? > > -- > Alvaro Herrera > http://www.CommandPrompt.com/ > The PostgreSQL Company - Command Prompt, Inc. >
To more clearly identify that pg_migrator now has known bugs, I have released pg_migrator 8.4.1 alpha1, and mentioned in the README that there are known bugs related to migrating sequences and large objects. I have removed the 8.4 source file from pgfoundry. --------------------------------------------------------------------------- Alvaro Herrera wrote: > Jamie Fox wrote: > > > Hi - > > REINDEX INDEX pg_largeobject_loid_pn_index; > > > > This seems to have fixed the problem, lo_open of lob data is working again - > > now to see how vacuumlo likes it. > > So did it work? > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > The PostgreSQL Company - Command Prompt, Inc. -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > > After a quick chat with Bruce it was determined that we don't freeze > > anything (it would be horrid for downtime if we did so in pg_migrator; > > and it would be useless if ran anywhere else). What we do is migrate > > pg_clog from the old cluster to the new. So never mind that hypothesis. > > > > Bruce noticed that the pg_dump/pg_migrator combo is failing to restore > > pg_largeobject's relfrozenxid. We're not sure how this is causing the > > errors Jamie is seeing, because what I think should happen is that scans > > of the table should fail with failures to open pg_clog files > > such-and-such, but not missing tuples ... > > Jamie, is it possible for you to apply the attached patch to the 8.4 > server, install the new pg_dump, and run the test again to see if > pg_largeobject is fixed? This patch properly sets the relfrozenxid in > the system tables for each database. I have applied the attached patch to have pg_dump restore pg_largeobject.relfrozenxid in binary upgrade mode; backpatched to 8.4.X. This doesn't fix the reported problem, but it is still a bug. -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Jamie Fox wrote: > > > Here's what I have found that got broken during pg_migrate: In two side > > by > > > side databases (an 8.3.7 copy and 8.4.0 migrated with pg_migrator) the > > > pg_largeobject table has the same number of rows. However, in the 8.4 > > > database any select for an loid in pg_largeobject returns zero rows. If > > I > > > select all loids to a file, and compare to select all loids from 8.3.7 > > > they're the same. When I select != an loid it seems to exclude the one > > and > > > return the rest, but all other comparisons <, > or = return zero rows. > > Or > > > I'm completely batty. Dereferencing via lo_open of blob_data (an oid) in > > > other tables fails in the 8.4 database with 'large object xxxxid does not > > > exist'. > > > > Oh, so maybe it's pg_largeobject's index that's borked ... Did you try > > reindexing it? > > > > How are we transferring pg_largeobject, and are we transferring its > > index too? > > > Hi - > REINDEX INDEX pg_largeobject_loid_pn_index; > > This seems to have fixed the problem, lo_open of lob data is working again - > now to see how vacuumlo likes it. I have applied the attached patch to pg_migrator to properly migrate the pg_largeobject index. I have added large object comment migration as a TODO item. This eliminates the last known bug in pg_migrator. -- Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +