Re: pg crashing

From: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>, "Magnus Hagander" <magnus(at)hagander(dot)net>
Subject: Re: pg crashing
Date: 2008-07-02 12:48:56
Message-ID: 1A6E6D554222284AB25ABE3229A92762E9A688@nrtexcus702.int.asurion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

> "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com> writes:
> > Version: "PostgreSQL 8.3.0, compiled by Visual C++ build 1400"
>
> Well, there are plenty of known bugs in 8.3.0 by now. You really
> should update before complaining, not after.

I'm not complaining. I just want to make sure that if I upgrade, it
will fix the problem. An upgrade could possible introduce a new
problem. I also wonder if this is isolated to Win32 because we are
upgrading to Solaris very soon.

>
> > Problem: My database keeps on crashing every few days with this type
of
> > error message:
>
> > 2008-07-01 10:46:30 CDT LOG: all server processes terminated;
> > reinitializing
>
> I think your real problem is with what happened *before* that.

I found the first instance of "crash" and then got the rest of the log
file.

2008-07-01 10:43:42 CDT LOG: server process (PID 3524) exited with exit
code 128
2008-07-01 10:43:42 CDT LOG: terminating any other active server
processes
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT: SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
PL/pgSQL function "fn_update_status" line 136 at PERFORM
SQL statement "SELECT gp_load.fn_update_status( $1 , $2 ,
'Processing', '', 0)"
PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT: PL/pgSQL function "fn_get_job_details"
line 114 at IF
PL/pgSQL function "fn_load" line 465 at FOR over SELECT rows
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT: SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
PL/pgSQL function "fn_update_status" line 136 at PERFORM
SQL statement "SELECT gp_load.fn_update_status( $1 , $2 ,
'Processing', '', 0)"
PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT: SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
PL/pgSQL function "fn_update_status" line 136 at PERFORM
SQL statement "SELECT gp_load.fn_update_status( $1 , $2 ,
'Processing', '', 0)"
PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT: SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
PL/pgSQL function "fn_update_status" line 136 at PERFORM
SQL statement "SELECT gp_load.fn_update_status( $1 , $2 ,
'Processing', '', 0)"
PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT CONTEXT: SQL statement "SELECT
pg_sleep(cast(control.fn_get_variable('sleep_time') as int))"
PL/pgSQL function "fn_update_status" line 136 at PERFORM
SQL statement "SELECT gp_load.fn_update_status( $1 , $2 ,
'Processing', '', 0)"
PL/pgSQL function "fn_load" line 56 at PERFORM
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:42 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:42 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:42 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
INFO: 2008-07-01 10:44:42.373008-05:3000:Executing Query ..... create
temporary table order_iud on commit drop as select
s."jid",s."order_id",s."parent_order_id",s."claim_id",s."fulfillment_met
hod",s."order_status_id",s."payment_status",s."created_by",s."created_dt
m",s."modified_by",s."modified_dtm",s."deleted_flg",s."claimed_item_id",
s."order_state_hashcode",s."autopaymentauth_flg",s."salvageexpectation_s
tatus",s."snr_fee",s."salvageexpectation_reason",s."salvageexpectation_n
ote",s."nocharge_override",s."change_datetime",s."change_type",s."edw_pr
ocess_flag",s."edw_process_ts" from stage_intelliset.order s, (select
max("order".jid) as jid, "order".order_id from stage_intelliset."order"
group by "order".order_id) m where m.jid = s.jid and m.order_id::text =
s.order_id::text distributed by (order_id)
INFO: 2008-07-01 10:44:43.185777-05:3000:Execution Complete.
INFO: 2008-07-01 10:44:46.988228-05:5000:Executing Update ..... update
replica_intelliset.invoiceautopayment set "order_id" = y."order_id",
"invoice_id" = y."invoice_id", "transaction_id" = y."transaction_id",
"paymenttype_code" = y."paymenttype_code", "amount" = y."amount",
"status_code" = y."status_code", "deleted_flg" = y."deleted_flg",
"created_by" = y."created_by", "created_dtm" = y."created_dtm",
"modified_by" = y."modified_by", "modified_dtm" = y."modified_dtm",
edw_modified_dt = y.edw_process_ts, edw_modified_id = y.jid,
edw_clock_ts = clock_timestamp() from (select
a."invoiceautopayement_id", a."order_id", a."invoice_id",
a."transaction_id", a."paymenttype_code", a."amount", a."status_code",
a."deleted_flg", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm", a.edw_process_ts, a.jid, a.change_type from
invoiceautopayment_iud a inner join
replica_intelliset.invoiceautopayment b on a.invoiceautopayement_id =
b.invoiceautopayement_id) y where
replica_intelliset.invoiceautopayment.invoiceautopayement_id =
y.invoiceautopayement_id and y.change_type <> 3
INFO: 2008-07-01 10:44:48.785215-05:5000:Update Complete.
INFO: 2008-07-01 10:44:48.786175-05:6000:Executing Insert ..... insert
into replica_intelliset.invoiceautopayment ("invoiceautopayement_id",
"order_id", "invoice_id", "transaction_id", "paymenttype_code",
"amount", "status_code", "deleted_flg", "created_by", "created_dtm",
"modified_by", "modified_dtm", edw_created_dt, edw_modified_dt,
edw_created_id, edw_modified_id, edw_clock_ts) select
a."invoiceautopayement_id", a."order_id", a."invoice_id",
a."transaction_id", a."paymenttype_code", a."amount", a."status_code",
a."deleted_flg", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm", edw_process_ts as edw_created_dt, null as
edw_modified_dt, a.jid as edw_created_id, null as edw_modified_id,
clock_timestamp() as edw_clock_ts from invoiceautopayment_iud a left
outer join replica_intelliset.invoiceautopayment b on
a.invoiceautopayement_id = b.invoiceautopayement_id where
(b.invoiceautopayement_id is null) and a.change_type <> 3
INFO: 2008-07-01 10:44:50.738058-05:6000:Insert Complete.
INFO: 2008-07-01 10:44:50.750247-05:8000:Executing insert into archive
table..... insert into stage_intelliset.invoiceautopayment_arch select *
from stage_intelliset.invoiceautopayment
INFO: 2008-07-01 10:44:50.839105-05:8000:Insert complete
2008-07-01 10:43:55 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:55 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:55 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:55 CDT CONTEXT: SQL statement "select * from
dblink('gp', $1 ) as t1 (return_text varchar)"
PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
SQL statement "SELECT gp_execute.fn_remote_sql( $1 , $2 )"
PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:55 CDT LOG: could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:43:55 CDT CONTEXT: SQL statement "select * from
dblink('gp', $1 ) as t1 (return_text varchar)"
PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
SQL statement "SELECT gp_execute.fn_remote_sql( $1 , $2 )"
PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:55 CDT STATEMENT: select * from gp_load.fn_load(809)
INFO: 2008-07-01 10:44:53.103065-05:5000:Executing Update ..... update
replica_intelliset.order set "parent_order_id" = y."parent_order_id",
"claim_id" = y."claim_id", "fulfillment_method" =
y."fulfillment_method", "order_status_id" = y."order_status_id",
"payment_status" = y."payment_status", "created_by" = y."created_by",
"created_dtm" = y."created_dtm", "modified_by" = y."modified_by",
"modified_dtm" = y."modified_dtm", "deleted_flg" = y."deleted_flg",
"claimed_item_id" = y."claimed_item_id", "order_state_hashcode" =
y."order_state_hashcode", "autopaymentauth_flg" =
y."autopaymentauth_flg", "salvageexpectation_status" =
y."salvageexpectation_status", "snr_fee" = y."snr_fee",
"salvageexpectation_reason" = y."salvageexpectation_reason",
"salvageexpectation_note" = y."salvageexpectation_note",
"nocharge_override" = y."nocharge_override", edw_modified_dt =
y.edw_process_ts, edw_modified_id = y.jid, edw_clock_ts =
clock_timestamp() from (select a."order_id", a."parent_order_id",
a."claim_id", a."fulfillment_method", a."order_status_id",
a."payment_status", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm", a."deleted_flg", a."claimed_item_id",
a."order_state_hashcode", a."autopaymentauth_flg",
a."salvageexpectation_status", a."snr_fee",
a."salvageexpectation_reason", a."salvageexpectation_note",
a."nocharge_override", a.edw_process_ts, a.jid, a.change_type from
order_iud a inner join replica_intelliset.order b on a.order_id =
b.order_id) y where replica_intelliset.order.order_id = y.order_id and
y.change_type <> 3
INFO: 2008-07-01 10:44:54.536366-05:5000:Update Complete.
INFO: 2008-07-01 10:44:54.537312-05:6000:Executing Insert ..... insert
into replica_intelliset.order ("order_id", "parent_order_id",
"claim_id", "fulfillment_method", "order_status_id", "payment_status",
"created_by", "created_dtm", "modified_by", "modified_dtm",
"deleted_flg", "claimed_item_id", "order_state_hashcode",
"autopaymentauth_flg", "salvageexpectation_status", "snr_fee",
"salvageexpectation_reason", "salvageexpectation_note",
"nocharge_override", edw_created_dt, edw_modified_dt, edw_created_id,
edw_modified_id, edw_clock_ts) select a."order_id", a."parent_order_id",
a."claim_id", a."fulfillment_method", a."order_status_id",
a."payment_status", a."created_by", a."created_dtm", a."modified_by",
a."modified_dtm", a."deleted_flg", a."claimed_item_id",
a."order_state_hashcode", a."autopaymentauth_flg",
a."salvageexpectation_status", a."snr_fee",
a."salvageexpectation_reason", a."salvageexpectation_note",
a."nocharge_override", edw_process_ts as edw_created_dt, null as
edw_modified_dt, a.jid as edw_created_id, null as edw_modified_id,
clock_timestamp() as edw_clock_ts from order_iud a left outer join
replica_intelliset.order b on a.order_id = b.order_id where (b.order_id
is null) and a.change_type <> 3
INFO: 2008-07-01 10:44:54.577749-05:6000:Insert Complete.
INFO: 2008-07-01 10:44:54.583708-05:8000:Executing insert into archive
table..... insert into stage_intelliset.order_arch select * from
stage_intelliset.order
INFO: 2008-07-01 10:44:54.715591-05:8000:Insert complete
2008-07-01 10:43:58 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:43:58 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:43:58 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:43:58 CDT CONTEXT: SQL statement "select * from
dblink('gp', $1 ) as t1 (return_text varchar)"
PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
SQL statement "SELECT gp_execute.fn_remote_sql( $1 , $2 )"
PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:58 CDT LOG: could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:43:58 CDT CONTEXT: SQL statement "select * from
dblink('gp', $1 ) as t1 (return_text varchar)"
PL/pgSQL function "fn_remote_sql" line 63 at SQL statement
SQL statement "SELECT gp_execute.fn_remote_sql( $1 , $2 )"
PL/pgSQL function "fn_load" line 393 at PERFORM
2008-07-01 10:43:58 CDT STATEMENT: select * from gp_load.fn_load(813)
2008-07-01 10:46:30 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:46:30 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:46:30 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:46:30 CDT CONTEXT: PL/pgSQL function "fn_load" line 271
at IF
2008-07-01 10:46:30 CDT LOG: could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:46:30 CDT CONTEXT: PL/pgSQL function "fn_load" line 271
at IF
2008-07-01 10:46:30 CDT STATEMENT: select * from gp_load.fn_load(757)
2008-07-01 10:46:30 CDT WARNING: terminating connection because of
crash of another server process
2008-07-01 10:46:30 CDT DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2008-07-01 10:46:30 CDT HINT: In a moment you should be able to
reconnect to the database and repeat your command.
2008-07-01 10:46:30 CDT CONTEXT: SQL statement "select
remote_execute.fn_get_max( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8
, $9 , $10 , $11 )"
PL/pgSQL function "fn_load" line 265 at SQL statement
2008-07-01 10:46:30 CDT LOG: could not send data to client: No
connection could be made because the target machine actively refused it.


2008-07-01 10:46:30 CDT CONTEXT: SQL statement "select
remote_execute.fn_get_max( $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8
, $9 , $10 , $11 )"
PL/pgSQL function "fn_load" line 265 at SQL statement
2008-07-01 10:46:30 CDT STATEMENT: select * from gp_load.fn_load(965)
2008-07-01 10:46:30 CDT LOG: all server processes terminated;
reinitializing
2008-07-01 10:46:31 CDT FATAL: pre-existing shared memory block is
still in use
2008-07-01 10:46:31 CDT HINT: Check if there are any old server
processes still running, and terminate them.

> But:
>
> > 2008-07-01 10:46:31 CDT FATAL: pre-existing shared memory block is
> > still in use
> > 2008-07-01 10:46:31 CDT HINT: Check if there are any old server
> > processes still running, and terminate them.
>
> Hmm ... the code in win32_shmem.c that generates this message seems
> mighty bogus to me --- it's just hoping that one-second delay is
> enough. Another problem is that postmaster children that do
> PGSharedMemoryDetach will still have valid inherited handles for
> the shmem segment --- does that factor into the behavior? It looks
> to me like the CloseHandle ought to be in PGSharedMemoryDetach.
>
> regards, tom lane

This wouldn't be a problem in Unix?

Jon

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Roberts, Jon 2008-07-02 12:52:42 Re: pg crashing
Previous Message Roberts, Jon 2008-07-02 12:35:08 Re: simple tool for building web forms

Browse pgsql-hackers by date

  From Date Subject
Next Message Roberts, Jon 2008-07-02 12:52:42 Re: pg crashing
Previous Message Gregory Stark 2008-07-02 12:39:02 Re: Please claim review items for commit fest!