Re: [GSoC] application ideas

From: pantilimonov misha <pantlimon(at)yandex(dot)ru>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GSoC] application ideas
Date: 2019-03-20 11:25:23
Message-ID: 5600191553081123@iva4-2ff4d94b5835.qloud-c.yandex.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Excuse me for the previous letter, should be fixed now by using simple html.

---

Greetings,

i am interested in databases and would like to make a contribution to the
PostgreSQL by participating in GSoC 2019. Currently i am studying in HSE[1],
doing last year of master's program that mostly build on top of collaboration
with ISP RAS[2].

In the previous year i have been working on llvm_jit extension for
PostgreSQL 9.6, that was developed in ISP RAS and presented at PGCON[3].
Specifically, my work consisted of adding support for several missing
nodes(bitmapscan, mergejoin, subqueryscan, etc)
by rewriting them with LLVM API, as well as other functionality(e.g. distinct in group by)
that is required to fully support TPC-H of SCALE 100.

Originally i wanted to pursue "TOAST" tasks from ideas list, but noticed
that couple of students have already mentioned them in mailing list. So, instead
of increasing the queue for single possible idea, i would like to offer other
ones, that sound interesting to me and can potentially be useful for PostgreSQL
and community:

1) The so-called Adaptive join, that exists in modern Oracle[4] and MSSQL[5]
versions. This type of node is designed to mitigate cardinality estimation
errors in queries that are somewhere inbetween NL(nested loop with indexscan)
and HJ(hashjoin).

One possible implementation of that is to start execution in HJ fasion, by accumulating
rows in hashtable with certain threshold. If threshold is not exceeded, then
continue with indexscan, otherwise switch to usual HJ.

2) Changing buffer manager strategy.
Somewhere in 2016 Andres Freund made a presention[6] of possible improvements
that can be done in buffer manager. I find the idea of changing hashtable to
trees of radix trees[7] promising. Most likely, taking into account program's
time constraints, this task won't be done as "ready to deploy" solution.
Instead, some kind of prototype can be implemented and benchmarked. 

3) Improvements in jit component.
Great progress has been made in this direction in 10 and 11 versions, but
still there's a lot to be done. Possible subareas: compiled code caching/sharing,
cost-based optimizer improvements, push-based execution with bytecode
transformation, compiling plpgsql, etc.

At this stage i would like to receive some feedback from the community,
which of those ideas are more useful for the near future of PostgreSQL and
more suitable for GSoC itself. With that information i can dive into particular
topic, extract additional information and prepare required proposal.

p.s. my preferred order: 2,1,3

--------------------------------------------------------------------------------
[1] https://www.hse.ru/en/ma/sp
[2] http://www.ispras.ru/en/
[3] http://www.pgcon.org/2017/schedule/events/1092.en.html
[4] https://www.oracle.com/technetwork/database/bi-datawarehousing/twp-optimizer-with-oracledb-12c-1963236.pdf
[5] https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/04/19/introducing-batch-mode-adaptive-joins/
[6] https://pgconf.ru/media/2016/05/13/1io.pdf
[7] http://events17.linuxfoundation.org/sites/events/files/slides/LinuxConNA2016%20-%20Radix%20Tree.pdf

Best regards,

Michael.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-20 11:43:26 Re: BUG #15572: Misleading message reported by "Drop function operation" on DB with functions having same name
Previous Message David Steele 2019-03-20 11:02:25 Re: Re: Psql patch to show access methods info