PoC Refactor AM analyse API

From:	Смирнов Денис <sd(at)arenadata(dot)io>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	PoC Refactor AM analyse API
Date:	2020-12-07 13:23:42
Message-ID:	C7CFE16B-F192-4124-BEBB-7864285E0FF7@arenadata.io
Lists:	pgsql-hackers

Hello all!

I suggest refactoring the analyze AM API, as it is currently too heap-specific. The proposal was motivated by Greenplum’s analyze improvement for append-optimized row and column AMs with variable-size compressed blocks.
Currently, ANALYZE works in two steps:

1. Sample fixed-size blocks with Algorithm S from Knuth (the BlockSampler functions).
2. Collect tuples into a reservoir with Algorithm Z from Vitter.
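
As a rough illustration of these two steps, here is a small Python sketch (not PostgreSQL code; the function names and the toy table are made up for the example). Step 1 is Knuth's Algorithm S over block numbers. For step 2 the sketch uses the simpler Algorithm R instead of Vitter's Algorithm Z; both produce a uniform sample, Z just skips ahead instead of drawing a random number per tuple.

```python
import random


def sample_blocks(total_blocks, wanted, rng):
    """Knuth's Algorithm S: choose `wanted` of `total_blocks` block numbers, in order."""
    chosen = []
    for blockno in range(total_blocks):
        remaining = total_blocks - blockno   # blocks not yet considered
        needed = wanted - len(chosen)        # blocks still to be chosen
        if needed <= 0:
            break
        # Take this block with probability needed / remaining.
        if rng.random() * remaining < needed:
            chosen.append(blockno)
    return chosen


def reservoir_sample(tuples, target, rng):
    """Algorithm R: keep a uniform sample of `target` tuples from a stream."""
    reservoir = []
    for seen, tup in enumerate(tuples):
        if seen < target:
            reservoir.append(tup)            # fill the reservoir first
        else:
            slot = rng.randint(0, seen)      # inclusive on both ends
            if slot < target:
                reservoir[slot] = tup        # replace with probability target / (seen + 1)
    return reservoir


# Toy "table": 100 fixed-size blocks holding 20 tuples each (hypothetical data).
rng = random.Random(0)
table = {b: [(b, off) for off in range(20)] for b in range(100)}
blocks = sample_blocks(len(table), 10, rng)
rows = reservoir_sample((t for b in blocks for t in table[b]), 30, rng)
```

Note that step 1 assumes every block is an equally good sampling unit, which is exactly the assumption that breaks for variable-size blocks.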

So this doesn’t work for AMs that use variable-sized physical blocks, for example. They need either weighted random sampling (WRS) algorithms such as A-Chao, or logical blocks so that Knuth's Algorithm S still applies (and then they have a problem with RelationGetNumberOfBlocks(), which estimates the number of physical blocks). Another problem concerns columns: they are not passed to analyze begin scan, so a column-oriented AM can’t benefit from its column storage during ANALYZE TABLE (COL).
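
To make the WRS idea concrete, here is a minimal Python sketch of the basic A-Chao weighted reservoir scheme (not PostgreSQL or Greenplum code; the function name, the block ids and the use of block size as the weight are assumptions for the example). Each incoming block enters the reservoir with probability proportional to its weight relative to the total weight seen so far.

```python
import random


def a_chao_sample(weighted_items, k, rng):
    """Basic A-Chao: weighted reservoir sample of k items from (item, weight) pairs."""
    reservoir = []
    weight_sum = 0.0
    for item, weight in weighted_items:
        weight_sum += weight
        if len(reservoir) < k:
            reservoir.append(item)                  # fill the reservoir first
            continue
        # Inclusion probability of the new item; the basic scheme does not
        # handle "overweight" items for which this value exceeds 1.
        p = k * weight / weight_sum
        if rng.random() < p:
            reservoir[rng.randrange(k)] = item      # evict a uniformly chosen slot
    return reservoir


# Hypothetical variable-size blocks: (block id, compressed size in bytes).
rng = random.Random(0)
var_blocks = [("blk%d" % i, 1024.0 * (1 + i % 7)) for i in range(200)]
sampled = a_chao_sample(var_blocks, 10, rng)
```

With equal weights this degenerates to ordinary reservoir sampling, so a fixed-size-block AM loses nothing under such a scheme.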

The suggestion is to replace table_scan_analyze_next_block() and table_scan_analyze_next_tuple() with a single function: table_acquire_sample_rows(). The AM implementation of table_acquire_sample_rows() can use the BlockSampler functions if it wants to, but if the AM is not block-oriented, it could do something else. This suggestion also passes VacAttrStats to table_acquire_sample_rows() for column-oriented AMs and removes the PROGRESS_ANALYZE_BLOCKS_TOTAL and PROGRESS_ANALYZE_BLOCKS_DONE definitions, since not all AMs are block-oriented.
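
The shape of the proposed API can be sketched in Python as follows. This is only an analogy: the classes, the method signature and the `columns_read` bookkeeping are hypothetical and do not reflect the actual C signature of table_acquire_sample_rows(). The point is that the caller asks the AM for a sample of the requested columns and does not care how the AM walks its storage.

```python
import random


class BlockAM:
    """Block-oriented AM: rows are stored in blocks, whole rows are read."""

    def __init__(self, blocks):
        self.blocks = blocks                      # list of blocks, each a list of row dicts

    def acquire_sample_rows(self, target_rows, columns, rng):
        rows = [row for block in self.blocks for row in block]
        picked = rng.sample(rows, min(target_rows, len(rows)))
        return [{c: row[c] for c in columns} for row in picked]


class ColumnAM:
    """Column-oriented AM: touches only the columns being analyzed."""

    def __init__(self, columns):
        self.columns = columns                    # dict: column name -> list of values
        self.columns_read = set()                 # records which columns were accessed

    def acquire_sample_rows(self, target_rows, columns, rng):
        nrows = len(next(iter(self.columns.values())))
        positions = sorted(rng.sample(range(nrows), min(target_rows, nrows)))
        self.columns_read.update(columns)
        return [{c: self.columns[c][pos] for c in columns} for pos in positions]


def analyze(am, target_rows, columns, seed=0):
    """Caller side: one call, no knowledge of blocks or tuples."""
    return am.acquire_sample_rows(target_rows, columns, random.Random(seed))


heap_like = BlockAM([[{"a": i, "b": -i} for i in range(b * 4, b * 4 + 4)] for b in range(5)])
col_store = ColumnAM({"a": list(range(20)), "b": [-i for i in range(20)]})
heap_rows = analyze(heap_like, 3, ["a"])
col_rows = analyze(col_store, 3, ["a"])
```

In this toy version the column store never reads column "b" when only "a" is analyzed, which is the benefit of passing the column list down to the AM.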

Best regards,
Denis Smirnov | Developer
sd(at)arenadata(dot)io
Arenadata | Godovikova 9-17, Moscow 129085 Russia

Responses

Re: PoC Refactor AM analyse API at 2020-12-08 00:53:41 from Denis Smirnov
Re: PoC Refactor AM analyse API at 2020-12-08 08:42:12 from Andrey Borodin