Re: wip: functions median and percentile

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"David Fetter" <david(at)fetter(dot)org>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: wip: functions median and percentile
Date:	2010-10-01 15:15:03
Message-ID:	4CA5B4A70200002500036331@gw.wicourts.gov
Lists:	pgsql-hackers pgsql-rrreviewers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>> Another suggestion?
>
> The implementation I would've expected to see is to do the sort
> and then have two code paths for retrieving the median, depending
> on whether the sort result is all in memory or not.

Would it make sense to accumulate value/count pairs in a hash table,
along with a total count, as the tuples are encountered, and sort
the (potentially smaller) hash table at the end? (Not that this
helps with the memory management questions...) Large sets with any
significant degree of duplication in values (say the age in years of
residents of a state) would probably run significantly faster this
way.
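
A minimal sketch of the value/count idea, in Python rather than backend C, purely for illustration. The function name median_from_counts and the use of collections.Counter as the hash table are assumptions for this example, not anything from the patch under discussion. It counts duplicates as tuples arrive, sorts only the distinct values at the end, and walks the cumulative counts to find the middle rank(s):

```python
from collections import Counter


def median_from_counts(values):
    """Median via value/count pairs (hypothetical helper, not the patch's code)."""
    counts = Counter()   # hash table: value -> number of occurrences
    total = 0            # running total of tuples seen
    for v in values:
        counts[v] += 1
        total += 1
    if total == 0:
        return None

    # Zero-based ranks of the middle element(s); equal when total is odd.
    lo_rank = (total - 1) // 2
    hi_rank = total // 2

    seen = 0
    lo = hi = None
    # Only the distinct values are sorted, not every input tuple.
    for value, count in sorted(counts.items()):
        seen += count
        if lo is None and seen > lo_rank:
            lo = value
        if seen > hi_rank:
            hi = value
            break
    return (lo + hi) / 2
```

With heavily duplicated input such as ages in years, the sort touches roughly a hundred distinct keys no matter how many rows were read. With mostly distinct values the table is as large as the input, so nothing is saved.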

-Kevin

In response to

Re: wip: functions median and percentile at 2010-10-01 15:08:00 from Tom Lane

Responses

Re: wip: functions median and percentile at 2010-10-01 15:35:03 from Hitoshi Harada

