Re: plan time of MASSIVE partitioning ...

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: plan time of MASSIVE partitioning ...
Date: 2010-10-28 11:29:30
Message-ID: 4CC95E9A.5000605@cybertec.at
Lists: pgsql-hackers

Boszormenyi Zoltan wrote:
> Boszormenyi Zoltan wrote:
>> Boszormenyi Zoltan wrote:
>>> Heikki Linnakangas wrote:
>>>> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>>>>> thank you very much for pointing me to dynahash, here is the
>>>>> next version that finally seems to work.
>>>>>
>>>>> Two patches are attached. The first is the absolute minimum for
>>>>> making it work: it still has the Tree type for canon_pathkeys,
>>>>> and eq_classes gets the same treatment that join_rel_list/join_rel_hash
>>>>> has in the current sources: if the list grows larger than 32, a hash
>>>>> table is created. It seems to be enough to do this for
>>>>> get_eclass_for_sort_expr() only; the other users of eq_classes
>>>>> aren't bothered by this change.
>>>>>
>>>> That's better, but can't you use dynahash for canon_pathkeys as well?
>>>
>>> Here's a purely dynahash solution. It's somewhat slower than
>>> the tree version: 0.45 vs 0.41 seconds in the cached case for the
>>> previously posted test case.
>>
>> And now as a context diff, sorry for my affection towards unified diffs. :-)
>
> A slightly better version: there is no need for the heavy hash_any;
> hash_uint32 on the lower 32 bits of pk_eclass is enough. The profiling
> runtime is now 0.42 seconds vs the previous 0.41 seconds for the tree version.
>
> Best regards,
> Zoltán Böszörményi
>
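
The two ideas quoted above (switch from a plain list to a hash table once the list exceeds 32 entries, and hash only the lower 32 bits of a pointer key) can be sketched in standalone C roughly as follows. This is a minimal illustration under stated assumptions, not the attached patch: PtrSet, ptrset_add, ptrset_find, hash_ptr and the fixed capacities are hypothetical names and sizes, and mix32 is a generic integer mixer standing in for PostgreSQL's hash_uint32, which is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_THRESHOLD 32    /* cutoff mentioned in the thread */
#define MAX_ENTRIES    1024  /* hypothetical fixed capacity for this sketch */
#define NBUCKETS       2048  /* power of two, larger than MAX_ENTRIES */

/* Generic 32-bit mixer (murmur3 finalizer); a stand-in, NOT hash_uint32(). */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/* Hash only the lower 32 bits of the pointer value, not the pointed-to data. */
static uint32_t hash_ptr(const void *p)
{
    return mix32((uint32_t) (uintptr_t) p);
}

typedef struct PtrSet
{
    const void *items[MAX_ENTRIES]; /* insertion-ordered list */
    int         nitems;
    int         buckets[NBUCKETS];  /* index into items plus 1; 0 means empty */
    int         hashed;             /* nonzero once the table has been built */
} PtrSet;

/* Place items[idx] into the open-addressing table (linear probing). */
static void bucket_insert(PtrSet *s, int idx)
{
    uint32_t b = hash_ptr(s->items[idx]) & (NBUCKETS - 1);

    while (s->buckets[b] != 0)
        b = (b + 1) & (NBUCKETS - 1);
    s->buckets[b] = idx + 1;
}

/* Return the list index of key, or -1 if it is not present. */
static int ptrset_find(const PtrSet *s, const void *key)
{
    if (!s->hashed)
    {
        /* Short list: a linear scan is cheap enough. */
        for (int i = 0; i < s->nitems; i++)
            if (s->items[i] == key)
                return i;
        return -1;
    }

    uint32_t b = hash_ptr(key) & (NBUCKETS - 1);

    while (s->buckets[b] != 0)
    {
        if (s->items[s->buckets[b] - 1] == key)
            return s->buckets[b] - 1;
        b = (b + 1) & (NBUCKETS - 1);
    }
    return -1;
}

/* Add key if missing and return its list index. */
static int ptrset_add(PtrSet *s, const void *key)
{
    int idx = ptrset_find(s, key);

    if (idx >= 0)
        return idx;
    assert(s->nitems < MAX_ENTRIES);
    idx = s->nitems++;
    s->items[idx] = key;
    if (s->hashed)
        bucket_insert(s, idx);
    else if (s->nitems > HASH_THRESHOLD)
    {
        /* The list just grew past the threshold: build the table once. */
        for (int i = 0; i < s->nitems; i++)
            bucket_insert(s, i);
        s->hashed = 1;
    }
    return idx;
}
```

A PtrSet must start zero-initialized (for example `PtrSet s = {0};`). Keeping the list alongside the table means code that only iterates over the entries does not need to change, which matches the remark above that the other users of eq_classes are not affected.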

Btw, the top entries in the current gprof output are:

Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
19.05      0.08      0.08       482     0.17     0.29  add_child_rel_equivalences
11.90      0.13      0.05   1133447     0.00     0.00  bms_is_subset
 9.52      0.17      0.04    331162     0.00     0.00  hash_search_with_hash_value
 7.14      0.20      0.03    548971     0.00     0.00  AllocSetAlloc
 4.76      0.22      0.02      2858     0.01     0.01  get_tabstat_entry
 4.76      0.24      0.02      1136     0.02     0.02  tzload

This means add_child_rel_equivalences() still takes too much
time. The previously posted test case calls this function
482 times, i.e. it is called for almost every 10th entry
added to eq_classes. The elog() I put into this function says
that at the last call list_length(eq_classes) == 4754.
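
As a rough cross-check of why only 482 calls can still dominate the profile, the sketch below estimates how many list entries get visited in total. It rests on two assumptions that the thread does not state: every call walks the whole eq_classes list, and the list grows linearly until it reaches 4754 entries at the last call. total_entries_scanned is a hypothetical helper written for this illustration.

```c
#include <assert.h>

/*
 * Total list entries visited if a function is called `calls` times and
 * scans the whole list on every call, with the list assumed to grow
 * linearly until it holds `final_len` entries at the last call.
 */
static long total_entries_scanned(long calls, long final_len)
{
    long total = 0;

    assert(calls > 0 && final_len >= 0);
    for (long i = 1; i <= calls; i++)
        total += final_len * i / calls;  /* assumed list length at call i */
    return total;
}
```

Under those assumptions total_entries_scanned(482, 4754) comes to roughly 1.15 million visited entries. That is the same order of magnitude as the 1133447 bms_is_subset calls in the profile, which would be consistent with one such check per entry per call, although nothing in the thread confirms that this is where those calls come from.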

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
