|From:||Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>|
|Subject:||Patch: ResourceOwner optimization for tables with many partitions|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Current implementation of ResourceOwner uses arrays to store resources
like TupleDescs, Snapshots, etc. When we want to release one of these
resources ResourceOwner finds it with linear scan. Granted, resource
array are usually small so its not a problem most of the time. But it
appears to be a bottleneck when we are working with tables which have a
lot of partitions.
To reproduce this issue:
1. run `./gen.pl 10000 | psql my_database postgres`
2. run `pgbench -j 8 -c 8 -f q.sql -T 100 my_database`
3. in second terminal run `sudo perf top -u postgres`
Both gen.pl and q.sql are attached to this message.
You will see that postgres spends a lot of time in ResourceOwnerForget*
32.80% postgres [.] list_nth
20.29% postgres [.] ResourceOwnerForgetRelationRef
12.87% postgres [.] find_all_inheritors
7.90% postgres [.] get_tabstat_entry
6.68% postgres [.] ResourceOwnerForgetTupleDesc
1.17% postgres [.] hash_search_with_hash_value
... < 1% ...
I would like to suggest a patch (see attachment) witch fixes this
bottleneck. Also I discovered that there is a lot of code duplication in
ResourceOwner. Patch fixes this too. The idea is quite simple. I just
replaced arrays with something that could be considered hash tables,
but with some specific optimizations.
After applying this patch we can see that bottleneck is gone:
42.89% postgres [.] list_nth
18.30% postgres [.] find_all_inheritors
10.97% postgres [.] get_tabstat_entry
1.82% postgres [.] hash_search_with_hash_value
1.21% postgres [.] SearchCatCache
... < 1% ...
For tables with thousands partitions we gain in average 32.5% more TPS.
As far as I can see in the same time patch doesn't make anything worse.
`make check` passes with asserts enabled and disabled. There is no
performance degradation according to both standard pgbench benchmark
and benchmark described above for tables with 10 and 100 partitions.
|Next Message||Greg Stark||2015-12-04 12:35:24||Re: atomic reads & writes (with no barriers)|
|Previous Message||Andres Freund||2015-12-04 09:12:46||Re: proposal: add 'waiting for replication' to pg_stat_activity.state|