From: Martin Weinberg <weinberg@osprey.astro.umass.edu>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Martin Weinberg <weinberg@osprey.astro.umass.edu>, pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] memory problems in copying large table to STDOUT
Date: 1999-10-10 13:38:43
Message-ID: 199910101438.KAA15476@osprey.astro.umass.edu
Lists: pgsql-hackers

Tom,

I am attaching the backtrace. This one simultaneously generated
this kernel message from the md driver:

raid0_map bug: hash->zone0==NULL for block 1132810879
Bad md_map in ll_rw_block

Definitely a problem, but I'm no longer sure it's the same one . . .
sigh.

I inserted a pause() at the beginning of elog() in elog.c so that I
could attach remotely; I'm keeping the sleeping process
around in case there's anything else you would like me to
check.
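
(In case it's useful, the hook amounts to a few lines at the top of
elog() -- a sketch from memory, not my exact patch:)

    /* debugging hook at the top of elog() in elog.c; pause() needs
     * <unistd.h>, which elog.c may already pull in */
    if (lev == ERROR)   /* ERROR is -1 here; don't pause on NOTICEs */
        pause();        /* backend sleeps until a signal arrives, so I
                         * can "gdb .../postgres", "attach NNNN", "bt";
                         * gdb's "return" gets back out of pause() */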

Guess I'm (foolishly?) pushing the envelope here with a 100GB
database on software RAID.

Thanks!!!

--Martin

P.S. After the fact, I realized that my source is Oliver Elphick's
Debian 6.5.1 source package rather than a pure vanilla source.
Hope this is not a problem . . .

----------------------------------------------------------------------

(gdb) bt
#0 0x4012eb77 in pause ()
#1 0x81160e9 in elog (lev=-1, fmt=0x8146a4b "cannot read block %d of %s")
at elog.c:81
#2 0x80e76ef in smgrread (which=0, reln=0x822af60, blocknum=2638753,
buffer=0x44a85040 "h") at smgr.c:235
#3 0x80dd7a2 in ReadBufferWithBufferLock (reln=0x822af60, blockNum=2638753,
bufferLockHeld=0) at bufmgr.c:302
#4 0x80dd682 in ReadBuffer (reln=0x822af60, blockNum=2638753) at bufmgr.c:180
#5 0x80ddf1d in ReleaseAndReadBuffer (buffer=9175, relation=0x822af60,
blockNum=2638753) at bufmgr.c:954
#6 0x806ad13 in heapgettup (relation=0x822af60, tuple=0x8235374, dir=1,
buffer=0x8235398, snapshot=0x8232af0, nkeys=0, key=0x0) at heapam.c:469
#7 0x806b6bf in heap_getnext (scandesc=0x8235360, backw=0) at heapam.c:912
#8 0x8084eb3 in CopyTo (rel=0x822af60, binary=0 '\000', oids=0 '\000',
fp=0x0, delim=0x813c829 "\t") at copy.c:405
#9 0x8084ce4 in DoCopy (relname=0x82350c0 "psc", binary=0 '\000',
oids=0 '\000', from=0 '\000', pipe=1 '\001', filename=0x0,
delim=0x813c829 "\t") at copy.c:323
#10 0x80ea8c6 in ProcessUtility (parsetree=0x82350d8, dest=Remote)
at utility.c:227
#11 0x80e8a36 in pg_exec_query_dest (
query_string=0xbfffaef4 "copy psc to STDOUT;", dest=Remote, aclOverride=0)
at postgres.c:727
#12 0x80e8944 in pg_exec_query (query_string=0xbfffaef4 "copy psc to STDOUT;")
at postgres.c:656
#13 0x80e9b88 in PostgresMain (argc=11, argv=0xbffff46c, real_argc=12,
real_argv=0xbffff984) at postgres.c:1647
#14 0x80d1adc in DoBackend (port=0x81ef748) at postmaster.c:1628
#15 0x80d1613 in BackendStartup (port=0x81ef748) at postmaster.c:1373
#16 0x80d0ca6 in ServerLoop () at postmaster.c:823
#17 0x80d080c in PostmasterMain (argc=12, argv=0xbffff984) at postmaster.c:616
#18 0x80a4597 in main (argc=12, argv=0xbffff984) at main.c:93
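
For whatever it's worth: assuming the stock 8K block size, block
2638753 is about 2638753 * 8192 =~ 20GB into the table, and if I have
the segment size right (1GB files, 131072 blocks apiece) that puts it
around 2638753 / 131072 =~ segment 20 -- so the read that fails is
indeed deep in multi-segment territory.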

Tom Lane wrote on Sat, 09 Oct 1999 14:42:56 EDT
>Martin Weinberg <weinberg@osprey.astro.umass.edu> writes:
>> Tom Lane wrote on Sat, 09 Oct 1999 11:54:34 EDT
>>>> FATAL 1: Memory exhausted in AllocSetAlloc()
>>>
>>> Hmm. What is the exact declaration of the table?
>>>
>>> The only explanation I can think of offhand is that the output
>>> conversion function for one of the column types is leaking memory...
>>> copy.c itself looks to be pretty careful not to.
>
>> The table def is:
>
>> CREATE TABLE psc (
>> hemis text,
>> date date,
>> scan int2,
>> id int4,
>> ra float4,
>> [ lots more fields of these same types ]
>
>Hmm, nothing unusual there. I made up a dummy table containing these
>column types, filled it with 16 meg of junk data, and copied in and
>out without observing any process memory usage growth at all, under
>both current sources and 6.5.2. I also used gdb to set a breakpoint
>at AllocSetAlloc, and checked that the inner loop of the copy wasn't
>allocating anything it didn't free. So there's no obvious memory
>leakage bug here. (It'd be pretty surprising if there was, really,
>for such commonly used data types.)
>
>I'm now thinking that there must be either a problem specific to your
>platform, or some heretofore unnoticed problem with copying from a
>multi-segment (ie, multi-gigabyte) table. I don't have enough disk
>space to check the latter theory here...
>
>Can you prepare a debugger backtrace showing what the backend is doing
>when it gets the error? If you're not familiar with gdb, it'd go
>something like this:
>
>1. Build & install postgres with debugging symbols enabled
> ("make CUSTOM_COPT=-g all").
>2. Start gdb on the postgres executable, eg
> "gdb /usr/local/pgsql/bin/postgres".
>3. Fire up the copy-out operation as usual. (I assume this takes long
> enough that you have plenty of time for the next two steps ;-))
>4. Use ps or top to find out the process number of the backend handling
> the session.
>5. Attach to that process number in gdb:
> (gdb) attach NNNN
>6. Set a breakpoint at elog, and let the backend continue running:
> (gdb) break elog
> (gdb) continue
>7. When the breakpoint is hit, get a backtrace:
> Breakpoint 1, elog ...
> (gdb) bt
> After copying & pasting the resulting printout, you can "quit" to
> get out of gdb.
>
> regards, tom lane
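
P.P.S. Rereading your note about the output functions: as I understand
the contract, each type's output routine pallocs a fresh string and
CopyTo() pfrees it right after writing it out, so per-tuple memory
should stay flat. Roughly (paraphrasing the 6.5 copy.c loop, with the
names simplified -- not the verbatim code):

    /* per-attribute inner loop of CopyTo(), text mode */
    value = heap_getattr(tuple, attnum, tupDesc, &isnull);
    if (!isnull)
    {
        string = out_func(value);             /* palloc'd result string */
        CopyAttributeOut(fp, string, delim);  /* write to the frontend */
        pfree(string);                        /* freed every attribute */
    }

So a leak would have to come from an output function allocating
something beyond its result string, which as you say would be
surprising for such common types.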
