Re: Table AM modifications to accept column projection lists

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Soumyadeep Chakraborty <soumyadeep2007(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pchampion(at)vmware(dot)com, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Subject: Re: Table AM modifications to accept column projection lists
Date: 2020-12-28 10:29:21
Message-ID: CAD21AoDXW5PHkO0Bk7Gteze06Mo1ypMNOEnDL9Njk8zRJazOyA@mail.gmail.com
Lists: pgsql-hackers

Hi Soumyadeep,

On Sat, Nov 14, 2020 at 3:02 AM Soumyadeep Chakraborty
<soumyadeep2007(at)gmail(dot)com> wrote:
>
> Hello,
>
> This patch introduces a set of changes to the table AM APIs, making them
> accept a column projection list. That helps columnar table AMs, so that
> they don't need to fetch all columns from disk, but only the ones
> actually needed.
>
> The set of changes in this patch is not exhaustive -
> there are many more opportunities that are discussed in the TODO section
> below. Before digging deeper, we want to elicit early feedback on the
> API changes and the column extraction logic.
>
> TableAM APIs that have been modified are:
>
> 1. Sequential scan APIs
> 2. Index scan APIs
> 3. API to lock and return a row
> 4. API to fetch a single row
>
> We have seen performance benefits in Zedstore for many of the optimized
> operations [0]. This patch is extracted from the larger patch shared in
> [0].
>
> ------------------------------------------------------------------------
> Building the column projection set:
>
> In terms of building the column projection set necessary for each of
> these APIs, this patch builds off of the scanCols patch [1], which
> Ashwin and Melanie had started earlier. As noted in [1], there are cases
> where the scanCols set is not representative of the columns to be
> projected. For instance, in a DELETE .. RETURNING query, there is
> typically a sequential scan and a separate invocation of
> tuple_fetch_row_version() in order to satisfy the RETURNING clause (see
> ExecDelete()). So for a query such as:
>
> DELETE FROM foo WHERE i < 100 AND j < 1000 RETURNING k, l;
>
> We need to pass the set (i, j) to the scan and (k, l) to the
> tuple_fetch_row_version() invocation. This is why we had to introduce
> the returningCols field.
>
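
To make the two sets in the DELETE .. RETURNING example concrete, here is a
minimal sketch in plain C. It is not code from the patch. It assumes foo has
the columns (i, j, k, l) in that order, and it uses a uint64_t bitmask
(ColSet) as a hypothetical stand-in for a Bitmapset; all helper names are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset *: bit n set means attnum n is a member. */
typedef uint64_t ColSet;

/* Assumed column order for the example table: foo(i, j, k, l). */
enum { ATT_I = 1, ATT_J = 2, ATT_K = 3, ATT_L = 4 };

static ColSet
colset_add(ColSet s, int attnum)
{
    return s | ((ColSet) 1 << attnum);
}

/* Number of members in the set. */
static int
colset_count(ColSet s)
{
    int         n = 0;

    for (; s != 0; s &= s - 1)
        n++;
    return n;
}

/* Columns referenced by "WHERE i < 100 AND j < 1000": what the scan needs. */
static ColSet
delete_scan_cols(void)
{
    return colset_add(colset_add(0, ATT_I), ATT_J);
}

/* Columns named in "RETURNING k, l": what the later row fetch needs. */
static ColSet
delete_returning_cols(void)
{
    return colset_add(colset_add(0, ATT_K), ATT_L);
}
```

Passing the union of both sets to the scan would make it read four columns
for every scanned row, while only (i, j) are needed to evaluate the qual;
(k, l) are needed only for the rows that are actually deleted.
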
> In the same spirit, separate column projection sets are computed for
> operations that involve an EPQ check (INSERT, DELETE, UPDATE, row-level
> locking, etc.) and for the columns involved in ON CONFLICT UPDATE.
>
> Recognizing and collecting these sets of columns is done at various
> stages (analyze and rewrite, planner, and executor), depending on the
> type of operation for which the subset of columns is calculated. The
> column bitmaps are stored in different places as well: the ones for
> scans and RETURNING are stored in RangeTblEntry, whereas the set of
> columns for ON CONFLICT UPDATE is stored in OnConflictSetState.
>
> ------------------------------------------------------------------------
> Table AM API changes:
>
> The changes made to the table AM API, introducing the column projection
> set, come in different flavors. We would like feedback on what style
> we need to converge to or if we should use different styles depending
> on the situation.
>
> - A new function variant that takes a column projection list, such as:
>
> TableScanDesc (*scan_begin) (Relation rel,
>     Snapshot snapshot,
>     int nkeys, struct ScanKeyData *key,
>     ParallelTableScanDesc pscan,
>     uint32 flags);
>
> ->
>
> TableScanDesc (*scan_begin_with_column_projection) (Relation relation,
>     Snapshot snapshot,
>     int nkeys, struct ScanKeyData *key,
>     ParallelTableScanDesc parallel_scan,
>     uint32 flags,
>     Bitmapset *project_columns);
>
> - Modifying the existing function to take a column projection list, such
> as:
>
> TM_Result (*tuple_lock) (Relation rel,
>     ItemPointer tid,
>     Snapshot snapshot,
>     TupleTableSlot *slot,
>     CommandId cid,
>     LockTupleMode mode,
>     LockWaitPolicy wait_policy,
>     uint8 flags,
>     TM_FailureData *tmfd);
>
> ->
>
> TM_Result (*tuple_lock) (Relation rel,
>     ItemPointer tid,
>     Snapshot snapshot,
>     TupleTableSlot *slot,
>     CommandId cid,
>     LockTupleMode mode,
>     LockWaitPolicy wait_policy,
>     uint8 flags,
>     TM_FailureData *tmfd,
>     Bitmapset *project_cols);
>
> - A new function index_fetch_set_column_projection() to be called after
> index_beginscan() to set the column projection set, which will be used
> later by index_getnext_slot().
>
> void (*index_fetch_set_column_projection) (struct IndexFetchTableData *data,
>     Bitmapset *project_columns);
>
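
As one possible reading of the first style (a separate
scan_begin_with_column_projection callback), the sketch below shows a
caller-side wrapper that uses the projection variant when an AM provides it
and falls back to the plain scan_begin otherwise. The fallback behaviour is
an assumption for illustration, not something the patch description states.
The types ColSet, ScanSketch and AmRoutineSketch and all function names are
hypothetical, heavily reduced stand-ins for the real table AM structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset *: bit n set means attnum n is a member. */
typedef uint64_t ColSet;

/* Hypothetical scan descriptor: records which columns the AM will read. */
typedef struct ScanSketch
{
    ColSet      columns_read;
} ScanSketch;

/* Hypothetical, heavily reduced AM callback table. */
typedef struct AmRoutineSketch
{
    ScanSketch  (*scan_begin) (int natts);
    /* Optional variant; NULL when the AM does not support projection. */
    ScanSketch  (*scan_begin_with_column_projection) (int natts, ColSet proj);
} AmRoutineSketch;

/* Bits 1..natts set (valid for natts <= 62 in this sketch). */
static ColSet
all_columns(int natts)
{
    return (((ColSet) 1 << (natts + 1)) - 1) & ~(ColSet) 1;
}

/* Row-oriented AM: always reads every column. */
static ScanSketch
rowstore_scan_begin(int natts)
{
    ScanSketch  scan = {all_columns(natts)};

    return scan;
}

/* Columnar AM: reads only the requested columns; member 0 means "all". */
static ScanSketch
columnar_scan_begin_proj(int natts, ColSet proj)
{
    ScanSketch  scan = {(proj & 1) ? all_columns(natts) : proj};

    return scan;
}

/* Columnar AM without a projection argument: same as asking for all columns. */
static ScanSketch
columnar_scan_begin(int natts)
{
    return columnar_scan_begin_proj(natts, (ColSet) 1);
}

/* Caller-side wrapper: prefer the projection variant when the AM has one. */
static ScanSketch
begin_scan_sketch(const AmRoutineSketch *am, int natts, ColSet proj)
{
    if (am->scan_begin_with_column_projection != NULL)
        return am->scan_begin_with_column_projection(natts, proj);
    return am->scan_begin(natts);
}
```

With a separate variant, AMs that do not care about projection need no
changes. With the second style (an extra argument on the existing callback,
as for tuple_lock), every AM has to be updated, but there is only one code
path to maintain.
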
> The set of columns expected by the new/modified functions is represented
> as a Bitmapset of attnums for a specific base relation. An empty/NULL
> bitmap signals to the AM that no data columns are needed. A bitmap
> containing the single element 0 indicates that we want all data columns
> to be fetched.
>
> The bitmaps do not include system columns.
>
> Additionally, the TupleTableSlots populated by functions such as
> table_scan_getnextslot() need to be densely filled up to the highest
> numbered column in the projection list (any column not in the projection
> list should be populated with NULL). This is due to the implicit
> assumptions of the slot_get_***() APIs.
>
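
The conventions described above (empty set means no data columns, member 0
means all columns, and slots filled densely up to the highest projected
column) can be sketched in plain C as follows. This is only an illustration
under simplifying assumptions: ColSet is a hypothetical uint64_t stand-in
for a Bitmapset, SlotSketch is a hypothetical stand-in for a
TupleTableSlot, every stored value is a non-NULL long, and the function
names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset *: bit n set means attnum n is a member. */
typedef uint64_t ColSet;

#define MAX_COLS 63

/* Hypothetical stand-in for a TupleTableSlot. */
typedef struct SlotSketch
{
    int         nvalid;             /* number of leading entries filled */
    long        values[MAX_COLS];
    bool        isnull[MAX_COLS];
} SlotSketch;

static bool
colset_is_member(ColSet s, int attnum)
{
    return ((s >> attnum) & 1) != 0;
}

/*
 * How many leading columns must be filled: 0 for an empty set, natts when
 * member 0 ("all columns") is present, else the highest projected attnum.
 */
static int
projection_fill_limit(ColSet proj, int natts)
{
    int         highest = 0;

    if (proj == 0)
        return 0;
    if (colset_is_member(proj, 0))
        return natts;
    for (int attnum = 1; attnum <= natts; attnum++)
        if (colset_is_member(proj, attnum))
            highest = attnum;
    return highest;
}

/* Fill the slot densely; columns below the limit but not projected become NULL. */
static void
fill_slot(SlotSketch *slot, const long *row, int natts, ColSet proj)
{
    bool        all = colset_is_member(proj, 0);
    int         limit = projection_fill_limit(proj, natts);

    assert(natts <= MAX_COLS);
    for (int attnum = 1; attnum <= limit; attnum++)
    {
        bool        wanted = all || colset_is_member(proj, attnum);

        slot->isnull[attnum - 1] = !wanted;
        slot->values[attnum - 1] = wanted ? row[attnum - 1] : 0;
    }
    slot->nvalid = limit;
}
```

For a four-column row and the projection set {1, 3}, the slot ends up with
three valid entries: column 1 holds its value, column 2 is NULL, and column 3
holds its value. Column 4 is never touched.
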
> ------------------------------------------------------------------------
> TODOs:
>
> - Explore opportunities to push the column extraction logic to the
> planner or pre-planner stages from the executor stage (like scanCols and
> returningCols), or at least elevate the column extraction logic to be
> done once per executor run instead of once per tuple.
>
> - As was requested in [1], we should guard column projection set
> extraction logic with a table_scans_leverage_column_projection() call.
> We wouldn't want a non-columnar AM to incur the overhead.
>
> - Standardize the table AM API for passing columns.
>
> - The optimization for DELETE RETURNING does not currently work for
> views. We have to populate the list of columns for the base relation
> beneath the view properly.
>
> - Currently the benefit of passing in an empty projection set for ON
> CONFLICT DO UPDATE (UPSERT) and ON CONFLICT DO NOTHING (see
> ExecCheckTIDVisible()) is masked by a preceding call to
> check_exclusion_or_unique_constraint() which has not yet been modified
> to pass a column projection list to the index scan.
>
> - Compute scanCols earlier than set_base_rel_sizes() and use that
> information to produce better relation size estimates (relation size
> will depend on the number of columns projected) in the planner.
> Essentially, we need to absorb the work done by Pengzhou [2].
>
> - Right now, we do not extract a set of columns for the call to
> table_tuple_lock() within GetTupleForTrigger() as it may be hard to
> determine the list of columns used in a trigger body [3].
>
> - validateForeignKeyConstraint() should only need to fetch the
> foreign key column.
>
> - List of index scan callsites that will benefit from calling
> index_fetch_set_column_projection():
>
> -- table_index_fetch_tuple_check() does not need to fetch any
> columns (we have to pass an empty column bitmap); fetching the tid
> should be enough.
>
> -- unique_key_recheck() performs a liveness check for which we do
> not need to fetch any columns (we have to pass an empty column
> bitmap)
>
> -- check_exclusion_or_unique_constraint() needs to only fetch the
> columns that are part of the exclusion or unique constraint.
>
> -- IndexNextWithReorder() needs to only fetch columns being
> projected along with columns in the index qual and columns in the
> ORDER BY clause.
>
> -- get_actual_variable_endpoint() only performs visibility checks,
> so we don't need to fetch any columns (we have to pass an empty
> column projection bitmap)
>
> - BitmapHeapScans can benefit from a column projection list the same
> way as an IndexScan and SeqScan can. We can possibly pass down scanCols
> in ExecInitBitmapHeapScan(). We would have to modify the BitmapHeapScan
> table AM calls to take a column projection bitmap.
>
> - There may be more callsites where we can pass a column projection list.
>

You sent in your patch to pgsql-hackers on Nov 14, but you did not
post it to the next CommitFest[1]. If this was intentional, then you
need to take no action. However, if you want your patch to be
reviewed as part of the upcoming CommitFest, then you need to add it
yourself and may need to rebase the patch to the current HEAD before
2021-01-01 AOE[2]. Thanks for your contributions.

Regards,

[1] https://commitfest.postgresql.org/31/
[2] https://en.wikipedia.org/wiki/Anywhere_on_Earth

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
