Re: Transaction ID overrun problem on greenplum

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: "Little, Douglas" <DOUGLAS(dot)LITTLE(at)orbitz(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Transaction ID overrun problem on greenplum
Date: 2012-05-18 17:01:06
Message-ID: CAOR=d=3p6LzDbd=XDdow6YvOz1mmUis9wm+b55Qtcn+pqAOgAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Can you run db wide vacuums? That's what I'd try first.

On Fri, May 18, 2012 at 10:41 AM, Little, Douglas
<DOUGLAS(dot)LITTLE(at)orbitz(dot)com> wrote:
> Hello,
>
> Apologies in advance for asking here and not on the greenplum forum, but
> I’m asking here because I think the PG community understands pg better than
> Greenplum.
>
>
>
> We run Greenplum 4.1 which is based on PG 8.2.15
>
> We recently had a problem where the transaction ID overflowed an internal
> buffer,  causing queries to  hang and randomly fail.
>
> Greenplum provided a fix where they increased the size of the buffer.
>
> Greenplum has disabled the Autovacuum features. I suspect because they have
> no method of coordinating the autovacuums across the nodes.
>
>
>
> I’ve reviewed the PG doc on transaction id and
>
> I’m trying to understand if there’s something we are doing wrong.
>
> Practices concerning me
>
> 1.        We never vacuum the entire db – we’re doing the system catalog
> tables 3x/day and user tables at least 1x/w- although we don’t have a method
> of identifying tables that have not been vacuumed so it’s possible we’re
> missing some table.
>
> 2.       In the pg_clog directories we have 25 files dating back to aug
> 2011.  When the instances start, the logs are filled with twophase location
> message.  Are these expected?
>
> 3.       Is the overflow really being caused by transaction wrap around?
>
>
>
> Would someone comment on these concerns.
>
> Thanks
>
>
>
>
>
> Guk setting
>
> vacuum_freeze_min_age            100,000,000        Minimum age at which
> VACUUM should freeze a table row.
>
>
>
>
>
> Log failure message
>
>
>
> 2012-05-10 16:26:24.425750
> CDT,"cognos_cube","p1gp1",p25497,th2121975200,"172.28.8.250","18959",2012-05-10
> 16:25:03
> CDT,100000245,con2170,,seg24,,,x100000245,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 99999948",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-10 16:26:24.739536
> CDT,"infa_mktldr","p1gp1",p27945,th2121975200,"172.28.8.250","27576",2012-05-10
> 16:26:15
> CDT,100000246,con2240,,seg24,,,x100000246,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 100000230",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-10 16:26:33.859255
> CDT,"infa_read","p1gp1",p27981,th2121975200,"172.28.8.250","27866",2012-05-10
> 16:26:18
> CDT,100000256,con2244,,seg24,,,x100000256,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 100000236",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-10 16:26:38.564872
> CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10
> 16:26:06
> CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"LOG","00000","
>
> Unexpected internal error: Segment process 27416 received signal SIGSEGV
>
>
>
> ",,,,,,,0,,,,
>
> 2012-05-10 16:26:41.359615
> CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10
> 16:26:06
> CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"PANIC","XX000","Unexpected
> internal error: Segment process received signal SIGSEGV
> (postgres.c:3384)",,,,,,,0,,"postgres.c",3384,"Stack trace:
>
> 1    0xa39555 postgres errstart (elog.c:454)
>
> 2    0x8e26e5 postgres <symbol not found> (postgres.c:3380)
>
> 3    0x38c70302d0 libc.so.6 <symbol not found> (??:0)
>
> 4    0x8b1578 postgres ProcArrayAdd (procarray.c:149)
>
> 5    0x5258a5 postgres EndPrepare (twophase.c:1240)
>
> 6    0x4f3462 postgres <symbol not found> (xact.c:3094)
>
> 7    0x4f3a75 postgres CommitTransactionCommand (xact.c:3523)
>
> 8    0xb944a5 postgres performDtxProtocolCommand (cdbtm.c:3891)
>
> 9    0x8e6552 postgres PostgresMain (postgres.c:1414)
>
> 10   0x8516b1 postgres <symbol not found> (postmaster.c:6443)
>
> 11   0x8589a5 postgres PostmasterMain (postmaster.c:2272)
>
> 12   0x76713a postgres main (main.c:212)
>
> 13   0x38c701d994 libc.so.6 __libc_start_main (??:0)
>
> 14   0x475089 postgres <symbol not found> (??:0)
>
> "
>
> 2012-05-10 16:26:46.580196
> CDT,,,p25872,th2121975200,,,,0,,,seg-1,,,,,"LOG","00000","server process
> (PID 27416) was terminated by signal 6:
> Aborted",,,,,,,0,,"postmaster.c",5275,
>
>
>
> Log messages regarding twophase
>
> 2012-05-18 00:05:49.849381
> CDT,"infa_write","p1gp1",p15767,th-1760766560,"172.28.8.250","33445",2012-05-18
> 00:05:29
> CDT,104342361,con385826,,seg0,,,x104342361,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342242",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:50.018299
> CDT,"infa_write","p1gp1",p15887,th-1760766560,"172.28.8.250","34675",2012-05-18
> 00:05:32
> CDT,104342362,con385833,,seg0,,,x104342362,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342257",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:50.029302
> CDT,"infa_write","p1gp1",p15839,th-1760766560,"172.28.8.250","34335",2012-05-18
> 00:05:30
> CDT,104342363,con385830,,seg0,,,x104342363,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342250",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:52.274591
> CDT,"infa_write","p1gp1",p15959,th-1760766560,"172.28.8.250","35303",2012-05-18
> 00:05:36
> CDT,104342369,con385840,,seg0,,,x104342369,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342277",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:52.277127
> CDT,"infa_write","p1gp1",p15478,th-1760766560,"172.28.8.250","31184",2012-05-18
> 00:05:24
> CDT,104342370,con385813,,seg0,,,x104342370,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342346",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:52.280285
> CDT,"infa_write","p1gp1",p15535,th-1760766560,"172.28.8.250","31304",2012-05-18
> 00:05:25
> CDT,104342371,con385816,,seg0,,,x104342371,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342347",,,,,,,0,,"twophase.c",1368,
>
>
>
> Thanks
>
> Doug Little
>
>
>
>

--
To understand recursion, one must first understand recursion.

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Robert Bruccoleri 2012-05-18 21:31:49 Problem with SSL certificate setup
Previous Message Little, Douglas 2012-05-18 16:41:02 Transaction ID overrun problem on greenplum