Re: [GSoC 2018] Proposal Draft

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Kefan Yang <starordust(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [GSoC 2018] Proposal Draft
Date: 2018-03-18 16:39:16
Message-ID: 4127961C-28C1-42D3-BBBC-A88312F80F49@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kefar!

> 18 марта 2018 г., в 5:34, Kefan Yang <starordust(at)gmail(dot)com> написал(а):
>
> I am Kefan Yang, a third-year Computing Science student from Simon Fraser University, Canada. I am very interested in the sorting algorithm benchmarking and implementation issue you mentioned on the idealist of Google Summer of Code 2018.
>
> I am currently working on my proposal, but I am a little bit worried about whether I am on the right track. I've encountered some problems in the implementation of the benchmark of those sorting algorithms. I briefly described my current ideas in the proposal draft, but they may change a bit as I read more research papers. I sincerely hope to get some feedback from you to improve my proposal, especially for the implementation part.
>
> I've attached a proposal draft to this email. It's a brief one but I guess it's enough to show my current ideas:)

Peter have already added some context, I'll comment a bit on contents of your proposal.

I expect implementation of both timsort (dual pivot qsort) and introsort (depth-limited qsort). MySQL recently switched to introsort (they just used STL version instead of his own qsort).
Sorting routines are important part of many DBMS parts. But, indeed, sorting of actual user data is related to qsort very distantly - often DBMS deals with external sorting.

There is built-in Postgres way to benchmark - pgbench performance. But I do not think that it's output will be statistically conclusive. I think that benchmarking will be at least as time consuming as implementation itself.

BTW I think it is good to add Postgres hash table implementation to this project too. Currently, we use Larson's linear hashing (HTAB struct, dynahash.c), which is somewhat seldom used nowadays. It would be good to tests cuckoo hashing and some open addressing against HTAB interface.

Please do not forget to add to your proposal review of other's patches. If you will produce some useful output in a patch form and you will want it committed you will have to review someone else patch. This is required by Postgres dev process.

Best regards, Andrey Borodin.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-18 16:48:47 Re: Recently-introduced segfault in initdb?
Previous Message Isaac Morland 2018-03-18 16:33:57 Re: Recently-introduced segfault in initdb?