Re: Scanner performance (was Re: 7.3 schedule)

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashley Cambrell <ash(at)freaky-namuh(dot)com>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scanner performance (was Re: 7.3 schedule)
Date: 2002-04-16 17:24:12
Message-ID: Pine.LNX.4.30.0204161125280.689-100000@peter.localdomain
Lists: pgsql-hackers

Tom Lane writes:

> The regression tests contain no very-long literals. The results I was
> referring to concerned cases with string (BLOB) literals in the
> hundreds-of-K range; it seems that the per-character loop in the flex
> lexer starts to look like a bottleneck when you have tokens that much
> larger than the rest of the query.
>
> Solutions seem to be either (a) make that loop quicker, or (b) find a
> way to avoid passing BLOBs through the lexer. I was merely suggesting
> that (a) should be investigated before we invest the work implied
> by (b).

I've done the following test: Ten statements of the form

SELECT 1 FROM tab1 WHERE val = '...';

where ... are literals of 5 to 10 MB each (randomly chosen MP3 files,
base64-encoded). "tab1" was empty. The whole test took 3:40 min of
wall-clock time.

Top ten calls:

     % cumulative     self               self    total
  time    seconds  seconds     calls  ms/call  ms/call  name
 36.95       9.87     9.87  74882482     0.00     0.00  pq_getbyte
 22.80      15.96     6.09        11   553.64  1450.93  pq_getstring
 13.55      19.58     3.62        11   329.09   329.10  scanstr
 12.09      22.81     3.23       110    29.36    86.00  base_yylex
  4.27      23.95     1.14        34    33.53    33.53  yy_get_previous_state
  3.86      24.98     1.03        22    46.82    46.83  textin
  3.67      25.96     0.98        34    28.82    28.82  myinput
  1.83      26.45     0.49        45    10.89    32.67  yy_get_next_buffer
  0.11      26.48     0.03      3027     0.01     0.01  AllocSetAlloc
  0.11      26.51     0.03       129     0.23     0.23  fmgr_isbuiltin
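The profile rounds pq_getbyte's per-call times to 0.00 ms, so its cost is easier to see by dividing the columns directly. This is plain arithmetic on the numbers above, nothing more:

```latex
\begin{aligned}
\frac{9.87\ \text{s}}{74\,882\,482\ \text{calls}} &\approx 1.32 \times 10^{-7}\ \text{s} \approx 132\ \text{ns per call} \\
\frac{74\,882\,482\ \text{calls}}{11\ \texttt{pq\_getstring}\ \text{calls}} &\approx 6.8 \times 10^{6}\ \text{calls each} \\
553.64\ \text{ms} + \frac{9\,870\ \text{ms}}{11} &\approx 553.64\ \text{ms} + 897.27\ \text{ms} \approx 1450.9\ \text{ms}
\end{aligned}
```

The last line matches pq_getstring's total of 1450.93 ms/call, which is consistent with essentially all of pq_getbyte's time being spent under pq_getstring; roughly 6.8 million calls per message also fits literals of 5 to 10 MB if pq_getbyte runs once per byte.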

The string literals didn't contain any backslashes, so scanstr is
operating in the best-case scenario here. But for arbitrary binary data we
need some escape mechanism, so I don't see much room for improvement
there.

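It seems the real bottleneck is the excessive abstraction in the
communications layer: nearly 75 million pq_getbyte calls against only 11
pq_getstring calls, apparently one call per byte received. I haven't
looked at it closely, but it would seem better if pq_getstring did not
call pq_getbyte for every byte and instead read the receive buffer
directly.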

--
Peter Eisentraut peter_e(at)gmx(dot)net
