Re: Performance problem with large insert statements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Barry Lind <barry(at)xythos(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance problem with large insert statements
Date: 2001-12-04 03:49:11
Message-ID: 25924.1007437751@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Barry Lind <barry(at)xythos(dot)com> writes:
> In looking at some performance issues (I was trying to look at the
> overhead of toast) I found that large insert statements were very slow.
> ...
> I tried to run gprof to see where the time was going, but for some
> reason the gprof output on my gmon.out file doesn't have any timing
> information (all times are reported as 0.0) and I haven't been able to
> figure out why yet.

That seems to be a common disease among Linuxen; dunno why. gprof
works fine for me on HPUX. I got around to reproducing this today,
and what I find is that the majority of the backend time is going into
simple scanning of the input statement:

Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
31.24 11.90 11.90 _mcount
19.51 19.33 7.43 10097 0.74 1.06 base_yylex
7.48 22.18 2.85 21953666 0.00 0.00 appendStringInfoChar
5.88 24.42 2.24 776 2.89 2.89 pglz_compress
4.36 26.08 1.66 21954441 0.00 0.00 pq_getbyte
3.57 27.44 1.36 7852141 0.00 0.00 addlit
3.26 28.68 1.24 1552 0.80 0.81 scanstr
2.84 29.76 1.08 779 1.39 7.18 pq_getstring
2.31 30.64 0.88 10171 0.09 0.09 _doprnt
2.26 31.50 0.86 776 1.11 1.11 byteain
2.07 32.29 0.79 msquadloop
1.60 32.90 0.61 7931430 0.00 0.00 memcpy
1.18 33.35 0.45 chunks
1.08 33.76 0.41 46160 0.01 0.01 strlen
1.08 34.17 0.41 encore
1.05 34.57 0.40 8541 0.05 0.05 XLogInsert
0.89 34.91 0.34 appendStringInfo

60% of the call graph time is accounted for by these two areas:

index % time self children called name
7.43 3.32 10097/10097 yylex [14]
[13] 41.0 7.43 3.32 10097 base_yylex [13]
1.36 0.61 7852141/7852141 addlit [28]
1.24 0.01 1552/1552 scanstr [30]
0.02 0.03 3108/3108 ScanKeywordLookup [99]
0.00 0.02 2335/2335 yy_get_next_buffer [144]
0.02 0.00 776/781 strtol [155]
0.00 0.01 777/3920 MemoryContextStrdup [108]
0.00 0.00 1/1 base_yy_create_buffer [560]
0.00 0.00 4675/17091 isupper [617]
0.00 0.00 1556/1556 yy_get_previous_state [671]
0.00 0.00 779/779 yywrap [706]
0.00 0.00 1/2337 base_yy_load_buffer_state [654]
-----------------------------------------------
1.08 4.51 779/779 pq_getstr [17]
[18] 21.4 1.08 4.51 779 pq_getstring [18]
2.85 0.00 21953662/21953666 appendStringInfoChar [20]
1.66 0.00 21954441/21954441 pq_getbyte [29]
-----------------------------------------------

While we could probably do a little bit to speed up pg_getstring and its
children, it's not clear that we can do anything about yylex, which is
flex output code not handmade code, and is probably well-tuned already.

Bottom line: feeding huge strings through the lexer is slow.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message mlw 2001-12-04 04:07:38 Re: FW: [CYGWIN] 7.2b3 postmaster doesn't start on Win98
Previous Message Tom Lane 2001-12-04 02:54:06 Re: [SQL] An easy question about creating a primary key