Re: Parser Cruft in gram.y

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)mail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parser Cruft in gram.y
Date: 2012-12-20 15:04:49
Message-ID: 20121220150448.GG4303@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2012-12-20 15:51:37 +0100, Andres Freund wrote:
> On 2012-12-20 15:45:47 +0100, Andres Freund wrote:
> > On 2012-12-20 09:11:46 -0500, Robert Haas wrote:
> > > On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > > > On 18 December 2012 22:10, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > >> Well that would be nice, but the problem is that I see no way to
> > > >> implement it. If, with a unified parser, the parser is 14% of our
> > > >> source code, then splitting it in two will probably crank that number
> > > >> up well over 20%, because there will be duplication between the two.
> > > >> That seems double-plus un-good.
> > > >
> > > > I don't think the size of the parser binary is that relevant. What is
> > > > relevant is how much of that is regularly accessed.
> > > >
> > > > Increasing parser cache misses for DDL and increasing size of binary
> > > > overall are acceptable costs if we are able to swap out the unneeded
> > > > areas and significantly reduce the cache misses on the well travelled
> > > > portions of the parser.
> > >
> > > I generally agree. We don't want to bloat the size of the parser with
> > > wild abandon, but yeah if we can reduce the cache misses on the
> > > well-travelled portions that seems like it ought to help. My previous
> > > hacky attempt to quantify the potential benefit in this area was:
> > >
> > > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php
> > >
> > > On my machine there seemed to be a small but consistent win; on a very
> > > old box Jeff Janes tried, it didn't seem like there was any benefit at
> > > all. Somehow, I have a feeling we're missing a trick here.
> >
> > I don't think you will see too many cache misses on such a low number of
> > extremly simply statements, so its not too surprising not to see a that
> > big difference with that.
> >
> > Are we sure its really cache-misses and not just the actions performed
> > in the grammar that make bison code show up in profiles? I remember the
> > latter being the case...
>
> Hm. A very, very quick perf stat -dvvv of pgbench -S -c 20 -j 20 -T 20 later:
>
> 218350.885559 task-clock # 10.095 CPUs utilized
> 1,676,479 context-switches # 0.008 M/sec
> 2,396 cpu-migrations # 0.011 K/sec
> 796,038 page-faults # 0.004 M/sec
> 506,312,525,518 cycles # 2.319 GHz [20.00%]
> 405,944,435,754 stalled-cycles-frontend # 80.18% frontend cycles idle [30.32%]
> 236,712,872,641 stalled-cycles-backend # 46.75% backend cycles idle [40.51%]
> 193,459,120,458 instructions # 0.38 insns per cycle
> # 2.10 stalled cycles per insn [50.70%]
> 36,433,144,472 branches # 166.856 M/sec [51.12%]
> 3,623,778,087 branch-misses # 9.95% of all branches [50.87%]
> 50,344,123,581 L1-dcache-loads # 230.565 M/sec [50.33%]
> 5,548,192,686 L1-dcache-load-misses # 11.02% of all L1-dcache hits [49.69%]
> 2,666,461,361 LLC-loads # 12.212 M/sec [35.63%]
> 112,407,198 LLC-load-misses # 4.22% of all LL-cache hits [ 9.67%]
>
> 21.629396701 seconds time elapsed
>
> So there definitely a noticeable rate of cache misses...

L1 misses:
# Samples: 997K of event 'L1-dcache-load-misses'
# Overhead Command Shared Object Symbol
# ........ ........ ...............................................................
6.49% postgres postgres [.] SearchCatCache
3.65% postgres postgres [.] base_yyparse
3.48% postgres postgres [.] hash_search_with_hash_value
3.41% postgres postgres [.] AllocSetAlloc
1.84% postgres postgres [.] LWLockAcquire
1.40% postgres postgres [.] fmgr_info_cxt_security
1.36% postgres postgres [.] nocachegetattr
1.23% postgres libc-2.13.so [.] _int_malloc
1.20% postgres postgres [.] core_yylex
1.15% postgres postgres [.] MemoryContextAllocZeroAligned
0.94% postgres postgres [.] PostgresMain
0.94% postgres postgres [.] MemoryContextAlloc
0.91% postgres libc-2.13.so [.] __memcpy_ssse3_back
0.89% postgres postgres [.] CatalogCacheComputeHashValue
0.86% postgres postgres [.] PinBuffer
0.86% postgres [kernel.kallsyms] [k] __audit_syscall_entry
0.80% postgres libc-2.13.so [.] __strcmp_sse42
0.80% postgres postgres [.] _bt_compare
0.78% postgres postgres [.] FunctionCall2Coll
0.77% postgres libc-2.13.so [.] malloc
0.73% postgres libc-2.13.so [.] __memset_sse2
0.72% postgres postgres [.] GetSnapshotData
0.69% postgres [kernel.kallsyms] [k] fget_light
0.69% postgres postgres [.] DirectFunctionCall1Coll
0.67% postgres postgres [.] hash_search
0.67% postgres libc-2.13.so [.] 0x000000000011a3a5
0.66% postgres postgres [.] pgstat_initstats
0.66% postgres postgres [.] AllocSetFree
0.65% postgres libc-2.13.so [.] __strlen_sse42
0.60% postgres libc-2.13.so [.] _int_free
0.60% postgres [kernel.kallsyms] [k] cpuacct_charge
0.59% postgres postgres [.] heap_getsysattr
0.59% postgres postgres [.] MemoryContextAllocZero
0.58% postgres postgres [.] PopActiveSnapshot
0.53% postgres libc-2.13.so [.] __memcmp_sse4_1
0.51% postgres postgres [.] ReadBuffer_common
0.49% postgres postgres [.] ScanKeywordLookup
0.49% postgres postgres [.] LockAcquireExtended
0.47% postgres [kernel.kallsyms] [k] update_cfs_shares
0.45% postgres postgres [.] SearchCatCacheList
0.45% postgres postgres [.] new_list
0.44% postgres postgres [.] get_relation_info

LLC misses:
# Samples: 1M of event 'LLC-load-misses'
# Event count (approx.): 143379713
# Overhead Command Shared Object Symbol
# ........ ........ ...............................................................
25.08% postgres postgres [.] _bt_compare
12.84% postgres postgres [.] PinBuffer
9.18% postgres postgres [.] LWLockAcquire
6.31% postgres postgres [.] GetSnapshotData
6.08% postgres postgres [.] heap_hot_search_buffer
5.13% postgres postgres [.] hash_search_with_hash_value
4.85% postgres postgres [.] _bt_checkpage
3.95% postgres postgres [.] _bt_moveright
3.09% postgres postgres [.] heap_page_prune_opt
2.12% postgres postgres [.] slot_deform_tuple
1.98% postgres postgres [.] LWLockRelease
1.82% postgres libc-2.13.so [.] __memcmp_sse4_1
1.16% postgres postgres [.] ExecProject
1.16% postgres postgres [.] FunctionCall2Coll
0.94% postgres [kernel.kallsyms] [k] copy_user_generic_string
0.94% postgres [kernel.kallsyms] [k] tg_load_down
0.78% postgres [kernel.kallsyms] [k] find_get_page
0.73% postgres postgres [.] ProcArrayEndTransaction
0.73% postgres postgres [.] pfree
0.71% postgres postgres [.] pgstat_report_xact_timestamp
0.69% postgres postgres [.] index_fetch_heap
0.66% postgres postgres [.] LockAcquireExtended
0.56% postgres postgres [.] LockBuffer
0.45% postgres postgres [.] slot_getsomeattrs
0.40% postgres postgres [.] _bt_readpage

So it seems L1 misses are the interesting thing wrt to parsing.

When doing a source/assembly annotation in SearchCatCache about half of
the misses are attributed to the memcpy() directly at the beginning.
In base_yyparse the three biggest offenders (15%, 10.5%, 5.58%) really
seem to be various kinds of table lookups.

So it seems to confirm the various suspicious that the table size might
be rather relevant.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Kupershmidt 2012-12-20 15:29:03 Re: discarding duplicate indexes
Previous Message Amit Kapila 2012-12-20 14:56:56 Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]