Transaction ID overrun problem on greenplum

From: "Little, Douglas" <DOUGLAS(dot)LITTLE(at)orbitz(dot)com>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Transaction ID overrun problem on greenplum
Date: 2012-05-18 16:41:02
Message-ID: 8585BA53443004458E0BAA6134C5A7FB9E549225@EGEXCMB01.oww.root.lcl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello,
Apologies in advance for asking here and not on the greenplum forum, but I'm asking here because I think the PG community understands pg better than Greenplum.

We run Greenplum 4.1 which is based on PG 8.2.15
We recently had a problem where the transaction ID overflowed an internal buffer, causing queries to hang and randomly fail.
Greenplum provided a fix where they increased the size of the buffer.
Greenplum has disabled the Autovacuum features. I suspect because they have no method of coordinating the autovacuums across the nodes.

I've reviewed the PG doc on transaction id and
I'm trying to understand if there's something we are doing wrong.
Practices concerning me

1. We never vacuum the entire db - we're doing the system catalog tables 3x/day and user tables at least 1x/w- although we don't have a method of identifying tables that have not been vacuumed so it's possible we're missing some table.

2. In the pg_clog directories we have 25 files dating back to aug 2011. When the instances start, the logs are filled with twophase location message. Are these expected?

3. Is the overflow really being caused by transaction wrap around?

Would someone comment on these concerns.
Thanks

Guk setting
vacuum_freeze_min_age 100,000,000 Minimum age at which VACUUM should freeze a table row.

Log failure message

2012-05-10 16:26:24.425750 CDT,"cognos_cube","p1gp1",p25497,th2121975200,"172.28.8.250","18959",2012-05-10 16:25:03 CDT,100000245,con2170,,seg24,,,x100000245,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 99999948",,,,,,,0,,"twophase.c",1368,
2012-05-10 16:26:24.739536 CDT,"infa_mktldr","p1gp1",p27945,th2121975200,"172.28.8.250","27576",2012-05-10 16:26:15 CDT,100000246,con2240,,seg24,,,x100000246,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 100000230",,,,,,,0,,"twophase.c",1368,
2012-05-10 16:26:33.859255 CDT,"infa_read","p1gp1",p27981,th2121975200,"172.28.8.250","27866",2012-05-10 16:26:18 CDT,100000256,con2244,,seg24,,,x100000256,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 100000236",,,,,,,0,,"twophase.c",1368,
2012-05-10 16:26:38.564872 CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10 16:26:06 CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"LOG","00000","
Unexpected internal error: Segment process 27416 received signal SIGSEGV

",,,,,,,0,,,,
2012-05-10 16:26:41.359615 CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10 16:26:06 CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV (postgres.c:3384)",,,,,,,0,,"postgres.c",3384,"Stack trace:
1 0xa39555 postgres errstart (elog.c:454)
2 0x8e26e5 postgres <symbol not found> (postgres.c:3380)
3 0x38c70302d0 libc.so.6 <symbol not found> (??:0)
4 0x8b1578 postgres ProcArrayAdd (procarray.c:149)
5 0x5258a5 postgres EndPrepare (twophase.c:1240)
6 0x4f3462 postgres <symbol not found> (xact.c:3094)
7 0x4f3a75 postgres CommitTransactionCommand (xact.c:3523)
8 0xb944a5 postgres performDtxProtocolCommand (cdbtm.c:3891)
9 0x8e6552 postgres PostgresMain (postgres.c:1414)
10 0x8516b1 postgres <symbol not found> (postmaster.c:6443)
11 0x8589a5 postgres PostmasterMain (postmaster.c:2272)
12 0x76713a postgres main (main.c:212)
13 0x38c701d994 libc.so.6 __libc_start_main (??:0)
14 0x475089 postgres <symbol not found> (??:0)
"
2012-05-10 16:26:46.580196 CDT,,,p25872,th2121975200,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 27416) was terminated by signal 6: Aborted",,,,,,,0,,"postmaster.c",5275,

Log messages regarding twophase
2012-05-18 00:05:49.849381 CDT,"infa_write","p1gp1",p15767,th-1760766560,"172.28.8.250","33445",2012-05-18 00:05:29 CDT,104342361,con385826,,seg0,,,x104342361,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342242",,,,,,,0,,"twophase.c",1368,
2012-05-18 00:05:50.018299 CDT,"infa_write","p1gp1",p15887,th-1760766560,"172.28.8.250","34675",2012-05-18 00:05:32 CDT,104342362,con385833,,seg0,,,x104342362,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342257",,,,,,,0,,"twophase.c",1368,
2012-05-18 00:05:50.029302 CDT,"infa_write","p1gp1",p15839,th-1760766560,"172.28.8.250","34335",2012-05-18 00:05:30 CDT,104342363,con385830,,seg0,,,x104342363,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342250",,,,,,,0,,"twophase.c",1368,
2012-05-18 00:05:52.274591 CDT,"infa_write","p1gp1",p15959,th-1760766560,"172.28.8.250","35303",2012-05-18 00:05:36 CDT,104342369,con385840,,seg0,,,x104342369,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342277",,,,,,,0,,"twophase.c",1368,
2012-05-18 00:05:52.277127 CDT,"infa_write","p1gp1",p15478,th-1760766560,"172.28.8.250","31184",2012-05-18 00:05:24 CDT,104342370,con385813,,seg0,,,x104342370,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342346",,,,,,,0,,"twophase.c",1368,
2012-05-18 00:05:52.280285 CDT,"infa_write","p1gp1",p15535,th-1760766560,"172.28.8.250","31304",2012-05-18 00:05:25 CDT,104342371,con385816,,seg0,,,x104342371,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342347",,,,,,,0,,"twophase.c",1368,

Thanks
Doug Little

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Scott Marlowe 2012-05-18 17:01:06 Re: Transaction ID overrun problem on greenplum
Previous Message Fabrízio de Royes Mello 2012-05-17 16:08:01 Re: Run external program from postgresql