Custom tuplesorts for extensions

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Custom tuplesorts for extensions
Date: 2022-06-23 08:50:42
Message-ID: CAPpHfdvjix0Ahx-H3Jp1M2R+_74P-zKnGGygx4OWr=bUQ8BNdw@mail.gmail.com
Lists: pgsql-hackers

Hackers,

Some PostgreSQL extensions need to sort their own pieces of data, and it
is worth reusing our tuplesort for that. But although our tuplesort is
extensible, that extensibility is hidden inside tuplesort.c. There are at
least a couple of examples of how extensions deal with that.

1. RUM table access method: https://github.com/postgrespro/rum
The RUM repository contains a copy of tuplesort.c for each major
PostgreSQL release. A reliable solution, but this is not how things
are intended to work, right?
2. OrioleDB table access method: https://github.com/orioledb/orioledb
OrioleDB runs on a patched PostgreSQL. It contains a patch that just
exposes all the guts of tuplesort.c in tuplesort.h:
https://github.com/orioledb/postgres/commit/d42755f52c

I think we need a proper way to let extensions reuse our core
tuplesort facility. The attached patchset is intended to do this the
right way. The patches don't yet revise all the comments and lack code
beautification. The intention behind publishing this revision is to
verify the direction and get some feedback for further work.

0001-Remove-Tuplesortstate.copytup-v1.patch
It's unclear to me how functionality is split between the
Tuplesortstate.copytup() function and the tuplesort_put*() functions. For
instance, copytup_index() and copytup_datum() just raise an error, while
tuplesort_putindextuplevalues() and tuplesort_putdatum() do the actual
work. The patch removes Tuplesortstate.copytup() altogether, moving
its functionality into the tuplesort_put*() functions.

0002-Tuplesortstate.getdatum1-method-v1.patch
0003-Put-abbreviation-logic-into-puttuple_common-v1.patch
The tuplesort_put*() functions contain a common part that deals
with abbreviation. 0002 extracts the logic for getting the value of
SortTuple.datum1 into the Tuplesortstate.getdatum1() function. Thanks to
this new interface function, 0003 moves the abbreviation logic into
puttuple_common().
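
To illustrate the idea behind 0002 and 0003, here is a minimal standalone
sketch. It does not use the real PostgreSQL definitions: SortState,
puttuple_common_sketch(), abort_abbreviation_sketch(), the string tuple
format and the "first byte" abbreviation are all assumptions made up for
the example. The point is only that, once a format-specific getdatum1()
callback exists, the abbreviation handling can live in one common routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real PostgreSQL types. */
typedef uintptr_t Datum;

typedef struct SortTuple
{
    void   *tuple;      /* the stored tuple */
    Datum   datum1;     /* first sort key, possibly abbreviated */
    bool    isnull1;
} SortTuple;

typedef struct SortState SortState;
struct SortState
{
    bool      abbreviate;                       /* abbreviation in use? */
    Datum   (*abbrev_converter)(Datum original);
    /* format-specific: load the full (non-abbreviated) first key */
    void    (*getdatum1)(SortState *state, SortTuple *stup);
    SortTuple tuples[8];
    int       ntuples;
};

/* Common part: every put function funnels through here. */
static void
puttuple_common_sketch(SortState *state, SortTuple *stup)
{
    assert(state->ntuples < 8);
    state->getdatum1(state, stup);
    if (state->abbreviate)
        stup->datum1 = state->abbrev_converter(stup->datum1);
    state->tuples[state->ntuples++] = *stup;
}

/* Giving up on abbreviation: restore full keys via the same callback. */
static void
abort_abbreviation_sketch(SortState *state)
{
    state->abbreviate = false;
    for (int i = 0; i < state->ntuples; i++)
        state->getdatum1(state, &state->tuples[i]);
}

/* A made-up tuple format: C strings, abbreviated to their first byte. */
static void
getdatum1_str(SortState *state, SortTuple *stup)
{
    (void) state;
    stup->datum1 = (Datum) stup->tuple;
    stup->isnull1 = false;
}

static Datum
abbrev_first_byte(Datum original)
{
    return (Datum) (unsigned char) ((const char *) original)[0];
}

/* The format-specific put function no longer touches abbreviation. */
static void
put_string_sketch(SortState *state, const char *s)
{
    SortTuple stup = { .tuple = (void *) s };

    puttuple_common_sketch(state, &stup);
}
```

With this shape, a new tuple format only has to supply getdatum1() and a
thin put function; it never needs to know how abbreviation is applied or
abandoned.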

0004-Move-freeing-memory-away-from-writetup-v1.patch
Assuming that SortTuple.tuple is always just a single chunk of memory,
we can move the memory accounting logic out of Tuplesortstate.writetup().
This makes Tuplesortstate.getdatum1() easier to implement without
knowledge of tuplesort.c guts.
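
As a rough standalone sketch of what 0004 is after (again with made-up
names: Tape, writetuple_sketch(), make_string_tuple() and the explicit
size field are assumptions, and malloc()/free() stand in for palloc()
and pfree()), the format-specific writetup() callback only serializes,
while the generic wrapper frees the single chunk and returns its space
to the memory budget.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real PostgreSQL types. */
typedef struct SortTuple
{
    void   *tuple;      /* a single allocated chunk */
    size_t  size;       /* its size; tracked explicitly in this sketch */
} SortTuple;

typedef struct Tape
{
    char    buf[256];   /* in-memory stand-in for a logical tape */
    size_t  len;
} Tape;

typedef struct SortState SortState;
struct SortState
{
    long    availMem;   /* remaining memory budget */
    void  (*writetup)(SortState *state, Tape *tape, SortTuple *stup);
};

/* Format-specific callback: only writes, no memory accounting. */
static void
writetup_string(SortState *state, Tape *tape, SortTuple *stup)
{
    (void) state;
    assert(tape->len + stup->size <= sizeof(tape->buf));
    memcpy(tape->buf + tape->len, stup->tuple, stup->size);
    tape->len += stup->size;
}

/* Generic wrapper: owns freeing and accounting for every format. */
static void
writetuple_sketch(SortState *state, Tape *tape, SortTuple *stup)
{
    state->writetup(state, tape, stup);
    if (stup->tuple != NULL)
    {
        state->availMem += (long) stup->size;
        free(stup->tuple);
        stup->tuple = NULL;
    }
}

/* Helper for the example: allocate a tuple and charge the budget. */
static SortTuple
make_string_tuple(SortState *state, const char *s)
{
    SortTuple stup;

    stup.size = strlen(s) + 1;
    stup.tuple = malloc(stup.size);
    assert(stup.tuple != NULL);
    memcpy(stup.tuple, s, stup.size);
    state->availMem -= (long) stup.size;
    return stup;
}
```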

0005-Reorganize-data-structures-v1.patch
This patch separates the "public" part of Tuplesortstate into
TuplesortOps, which is intended to be exposed outside tuplesort.c.
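
The kind of split meant here could look roughly like the standalone
sketch below. The field names, TuplesortOpsSketch, SortStateSketch and
the insertion sort are assumptions for illustration only, not the
contents of the patch: the "public" struct carries the callbacks and
their argument, the private state embeds it, and the generic code calls
through it without knowing the tuple format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real PostgreSQL types. */
typedef struct SortTuple
{
    void   *tuple;
} SortTuple;

/* "Public" part: what a header could expose to extensions. */
typedef struct TuplesortOpsSketch
{
    int   (*comparetup)(const SortTuple *a, const SortTuple *b, void *arg);
    void   *arg;        /* opaque data owned by the tuple format */
} TuplesortOpsSketch;

/* "Private" part: stays inside the generic sort code. */
typedef struct SortStateSketch
{
    TuplesortOpsSketch ops;
    SortTuple   memtuples[16];
    int         memtupcount;
} SortStateSketch;

static void
put_sketch(SortStateSketch *state, void *tuple)
{
    assert(state->memtupcount < 16);
    state->memtuples[state->memtupcount++].tuple = tuple;
}

/* Generic routine: a plain insertion sort driven by ops.comparetup. */
static void
sort_sketch(SortStateSketch *state)
{
    for (int i = 1; i < state->memtupcount; i++)
    {
        SortTuple cur = state->memtuples[i];
        int       j = i - 1;

        while (j >= 0 &&
               state->ops.comparetup(&cur, &state->memtuples[j],
                                     state->ops.arg) < 0)
        {
            state->memtuples[j + 1] = state->memtuples[j];
            j--;
        }
        state->memtuples[j + 1] = cur;
    }
}

/* Extension side: ints, with the sort direction passed through arg. */
static int
compare_int_tuples(const SortTuple *a, const SortTuple *b, void *arg)
{
    int x = *(const int *) a->tuple;
    int y = *(const int *) b->tuple;
    int cmp = (x > y) - (x < y);

    return *(const bool *) arg ? -cmp : cmp;
}
```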

0006-Split-tuplesortops.c-v1.patch
This patch finally splits tuplesortops.c out of tuplesort.c. tuplesort.c
keeps the generic routines for tuple sorting, while tuplesortops.c
provides the implementations for particular tuple formats.

------
Regards,
Alexander Korotkov

Attachment                                                 Content-Type         Size
0001-Remove-Tuplesortstate.copytup-v1.patch                application/x-patch  16.4 KB
0002-Tuplesortstate.getdatum1-method-v1.patch              application/x-patch  7.6 KB
0003-Put-abbreviation-logic-into-puttuple_common-v1.patch  application/x-patch  9.5 KB
0004-Move-freeing-memory-away-from-writetup-v1.patch       application/x-patch  7.0 KB
0005-Reorganize-data-structures-v1.patch                   application/x-patch  66.1 KB
0006-Split-tuplesortops.c-v1.patch                         application/x-patch  112.9 KB
