Re: DRAFT: Pass sk_attno to consistent function

From: Michał Kłeczek <michal(at)kleczek(dot)org>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: DRAFT: Pass sk_attno to consistent function
Date: 2024-03-19 16:00:15
Message-ID: E17B60DE-42A3-4AAD-81C4-56CFF1D4CDD5@kleczek.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,

Since it looks like there is not much interest in the patch I will try to provide some background to explain why I think it is needed.

We are in the process of migration from an old db platform to PostgreSQL.
Our database is around 10TB big and contains around 10 billion financial transactions in a single table.
Each transaction is assigned to an account (column acc_number).

We have partitioned the table BY HASH (acc_number).

A client can query transactions belonging to his accounts using several criteria - among them is te xt search.
Queries are of type TOP N (ie ORDER BY … LIMIT ).

The list of accounts that we are querying is provided as a parameter to the query.

We have decided to use a single Gist index supporting all queries (reasons described in [1]).

There are several problems with Gist usage (but I still think we have no other choice) but the most important is
that we cannot use SAOP in our queries - since Gist does not support it the planner decides to perform Bitmap Scan
which in turn does not support ORDER BY … LIMIT well because requires Sort.

So when we use “= ANY (array of account numbers) … ORDER BY ...” the plan requires reading all records meeting
search criteria and then sort.

As a workaround we have to perform LATERAL joins:

unnest(list of account numbers) AS a(n) LATERAL JOIN (SELECT * FROM … WHERE account_number = a.n ORDER BY … LIMIT …) ORDER BY … LIMIT …

It is still bad because requires multiple scans of the same partition if account number hashes are the same.

What we really need is for Gist to support “= ANY (…)”.
As Gist index is extensible in terms of queries it supports it is quite easy to implement an operator class/family with operator:

||= (text, text[])

that has semantics the same as “= ANY (…)”

With this operator we can write our queries like:

account_number ||= [list of account numbers] AND
account_number = ANY ([list of account numbers]) — redundant for partition pruning as it does not understand ||=

and have optimal plans:

Limit
— Merge Append
—— Index scan of relevant partitions

The problem is that now each partition scan is for the same list of accounts.
The “consistent” function cannot assume anything about contents of the table so it has to check all elements of the list
and that in turn causes reading unnecessary index pages for accounts not in this partition.

What we need is a way for the “consistent” function to be able to pre-process input query array and remove elements
that are not relevant for this scan. To do that it is necessary to have enough information to read necessary metadata from the catalog.

The proposed patch addresses this need and seems (to me) largely uncontroversial as it does not break any existing extensions.

Attached is a patch with consolidated changes (I am pretty new to managing patches so previous two were partial and not something shareable, I am afraid).


Michal

Attachment Content-Type Size
0001-Pass-key-sk_attno-to-consistent-function-in-gistinde.patch application/octet-stream 1.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-03-19 16:05:20 Re: Possibility to disable `ALTER SYSTEM`
Previous Message Jelte Fennema-Nio 2024-03-19 15:36:43 Re: Possibility to disable `ALTER SYSTEM`