Scanner performance (was Re: 7.3 schedule)

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashley Cambrell <ash(at)freaky-namuh(dot)com>, Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Scanner performance (was Re: 7.3 schedule)
Date: 2002-04-13 06:17:06
Message-ID: Pine.LNX.4.30.0204121850140.847-100000@peter.localdomain
Lists: pgsql-hackers

Tom Lane writes:

> We do have some numbers suggesting that the per-character loop in the
> lexer is slow enough to be a problem with very long literals. That is
> the overhead that might be avoided with a special protocol.

Which loop is that? Doesn't the scanner use buffered input anyway?

> However, it should be noted that (AFAIK) no one has spent any effort at
> all on trying to make the lexer go faster. There is quite a bit of
> material in the flex documentation about performance considerations ---
> someone should take a look at it and see if we can get any wins by being
> smarter, without having to introduce protocol changes.

My profiles show that the time spent in the scanner is really minuscule
compared to everything else.

The data appears to support a suspicion I had many moons ago, namely that
the binary search for the key words takes quite a bit of time:

index % time    self  children    called         name
                0.22    0.06   66748/66748       yylex [125]
[129]    0.4    0.22    0.06   66748         base_yylex [129]
                0.01    0.02    9191/9191        yy_get_next_buffer [495]
                0.02    0.00   32808/34053       ScanKeywordLookup [579]
                0.00    0.01   16130/77100       MemoryContextStrdup [370]
                0.00    0.00    4000/4000        scanstr [1057]
                0.00    0.00    4637/4637        yy_get_previous_state [2158]
                0.00    0.00    4554/4554        base_yyrestart [2162]
                0.00    0.00    4554/4554        yywrap [2163]
                0.00    0.00       1/1           base_yy_create_buffer [2852]
                0.00    0.00       1/13695       base_yy_load_buffer_state [2107]

A while ago I experimented with hash functions for the key word lookup
and got a speedup by a factor of 2.5, but again, this is really minor in
the overall scheme of things.
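
To make the two lookup strategies concrete, here is a minimal sketch in C
that puts a binary search over a sorted key word array next to a hashed
lookup. It is an illustration only, not the code from that experiment and
not the actual ScanKeywordLookup: the abbreviated key word list, the
function names (keyword_bsearch, keyword_hash), the table size and the
choice of FNV-1a with linear probing are all assumptions made for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated key word list; must stay sorted for the
 * binary search below. */
static const char *const keywords[] = {
    "and", "as", "by", "from", "group", "insert", "into",
    "not", "or", "order", "select", "update", "values", "where"
};
#define NUM_KEYWORDS (sizeof(keywords) / sizeof(keywords[0]))

/* Binary search: about log2(N) strcmp calls per lookup.
 * Returns the index into keywords[], or -1 if not a key word. */
int keyword_bsearch(const char *word)
{
    size_t lo = 0, hi = NUM_KEYWORDS;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(word, keywords[mid]);

        if (cmp == 0)
            return (int) mid;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Hashed lookup: open addressing with linear probing.
 * HASH_SIZE is an assumed power of two larger than NUM_KEYWORDS. */
#define HASH_SIZE 32u
static int hash_slots[HASH_SIZE];   /* index into keywords[], -1 = empty */
static int hash_ready = 0;

/* 32-bit FNV-1a, picked here only because it is short. */
static unsigned int hash_word(const char *s)
{
    unsigned int h = 2166136261u;

    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

static void keyword_hash_init(void)
{
    size_t i;

    /* At least one slot must stay empty so that probing terminates. */
    assert(NUM_KEYWORDS < HASH_SIZE);
    for (i = 0; i < HASH_SIZE; i++)
        hash_slots[i] = -1;
    for (i = 0; i < NUM_KEYWORDS; i++) {
        unsigned int slot = hash_word(keywords[i]) & (HASH_SIZE - 1);

        while (hash_slots[slot] != -1)
            slot = (slot + 1) & (HASH_SIZE - 1);
        hash_slots[slot] = (int) i;
    }
    hash_ready = 1;
}

/* Typically one hash computation plus one strcmp per lookup.
 * Returns the index into keywords[], or -1 if not a key word. */
int keyword_hash(const char *word)
{
    unsigned int slot;

    if (!hash_ready)
        keyword_hash_init();
    slot = hash_word(word) & (HASH_SIZE - 1);
    while (hash_slots[slot] != -1) {
        if (strcmp(word, keywords[hash_slots[slot]]) == 0)
            return hash_slots[slot];
        slot = (slot + 1) & (HASH_SIZE - 1);
    }
    return -1;
}
```

Both functions return the same index for a given word, so one could be
swapped for the other behind a single lookup interface. Whether the hashed
variant is actually faster depends on the real key word list and the hash
function chosen, and would have to be measured.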

(The profile data is from a run of all the regression test files in order
in one session.)

--
Peter Eisentraut peter_e(at)gmx(dot)net
