| From: | Sergei Glukhov <s(dot)glukhov(at)postgrespro(dot)ru> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Partial hash index is not used for implied qual. |
| Date: | 2025-11-25 11:50:59 |
| Message-ID: | cb0b9982-ca1a-4f03-9d7f-f9ca678be35c@postgrespro.ru |
| Lists: | pgsql-hackers |
Hi,
On 11/25/25 6:01 AM, Tom Lane wrote:
> I wrote:
>> Wouldn't it be better to handle it more like the is_target_rel logic
>> a few lines further up?
> Actually, after thinking a bit longer, it'd be better to do something
> like the attached so that we don't keep redundant quals unless they'd
> *all* be excluded.
>
> There's definitely something fishy about the costing though.
> I experimented with this variant of Sergei's example:
>
> regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
> SELECT 1000
> regression=# ANALYZE hash_partial;
> ANALYZE
> regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
> CREATE INDEX
> regression=# set enable_seqscan TO 0; -- else we'll go for a seqscan
> SET
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Bitmap Heap Scan on hash_partial (cost=24.08..32.56 rows=10 width=4)
> Recheck Cond: (x = 1)
> -> Bitmap Index Scan on partial_idx (cost=0.00..24.07 rows=10 width=0)
> Index Cond: (x = 1)
> (4 rows)
>
> regression=# drop index partial_idx;
> DROP INDEX
> regression=# CREATE INDEX ON hash_partial USING hash(x);
> CREATE INDEX
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
> QUERY PLAN
> ----------------------------------------------------------------------------------
> Bitmap Heap Scan on hash_partial (cost=4.08..12.56 rows=10 width=4)
> Recheck Cond: (x = 1)
> -> Bitmap Index Scan on hash_partial_x_idx (cost=0.00..4.08 rows=10 width=0)
> Index Cond: (x = 1)
> (4 rows)
>
> Why are we thinking that a non-partial index would be substantially
> cheaper to scan? That seems surely wrong, and it runs counter to my
> intuition about why this fix is incomplete. (I expected an unfair
> bias towards the partial index, not against it.)
>
> regards, tom lane
>
Thanks for the fix. It seems there is another case for investigation:
DROP TABLE hash_partial;
CREATE TABLE hash_partial(x, y) AS
SELECT x, x + x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
SET enable_seqscan TO 0;
EXPLAIN SELECT x FROM hash_partial WHERE x = 1 and y < 0;
                                   QUERY PLAN
--------------------------------------------------------------------------------
Seq Scan on hash_partial (cost=0.00..23.00 rows=1 width=4)
Disabled: true
Filter: ((y < 0) AND (x = 1))
(3 rows)
Regarding the strangeness of the cost: the cost depends on numIndexPages,
which genericcostestimate() calculates as

numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);

For the non-partial index, index->pages = 6 and index->tuples = 1000,
while for the partial index, index->pages = 6 and index->tuples = 10.
The number of pages comes from the size of the index relation, and the
initial size is 6 * BLCKSZ for both the partial and the non-partial hash
index, because the initial size of a hash index relation is driven by the
total number of tuples in the table, not by how many tuples satisfy the
index predicate.
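If I am reading genericcostestimate() right, the index page component is
charged at random_page_cost per page, so with numIndexTuples = 10 in both
plans the arithmetic works out roughly like this (just a sketch, assuming
the default random_page_cost = 4.0 and ignoring the small CPU terms):

/* non-partial index: index->pages = 6, index->tuples = 1000 */
numIndexPages = ceil(10 * 6.0 / 1000);   /* = 1 page,  ~1 * 4.0 = 4  */

/* partial index: index->pages = 6, index->tuples = 10 */
numIndexPages = ceil(10 * 6.0 / 10);     /* = 6 pages, ~6 * 4.0 = 24 */

That matches the 4.08 vs. 24.07 index scan costs above: the partial index
is charged for reading all six of its initially allocated pages even though
it holds only ten tuples.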
Regards,
Sergei Glukhov