Quick Links

Re: Performance issue

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Nagaraj Raj <nagaraj(dot)sf(at)yahoo(dot)com>
Cc:	Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject:	Re: Performance issue
Date:	2020-06-15 00:05:16
Message-ID:	20200615000516.GP14879@telsasoft.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Sun, Jun 14, 2020 at 10:45:52PM +0000, Nagaraj Raj wrote:
> My PostgreSQL server 10.11 running on windows which are running very slow. DB has two tables with ~200Mil records in each. user queries are very slow even explain analyze also taking a longer.
>
> Could you please help me to tune this query and any suggestions to improve system performance?

> CREATE TABLE test1
> (
> individual_entity_proxy_id bigint NOT NULL,
...
> CONSTRAINT pk_i_entity_proxy_id PRIMARY KEY (individual_entity_proxy_id)
>
> );
> CREATE INDEX indx_prospect_indv_entty_id ON test1 USING btree (individual_entity_proxy_id )

This index is redundant with the primary key, which implicitly creates a unique
index.

The table structure seems strange: you have two tables with the same PK column,
which is how they're being joined. It seems like that's better expressed as a
single table with all the columns rather than separate tables (but see below).

> explain analyze select COUNT(*) as "DII_1"
> from ( select distinct table0."individual_entity_proxy_id" as "INDIVIDUAL_ENTITY_PROXY_ID"
> from test1 table0
> inner join test2 table1

I think this may be better written as something like:

| SELECT COUNT(id) FROM t0 WHERE EXISTS (SELECT 1 FROM t1 WHERE t1.id=t0.id AND ...) AND ...

It's guaranteed to be distinct since it's a PK column, so it doesn't need a
"Unique" node.

I think it might prefer an seq scan on t0, which might be good since it seems
to be returning over 10% of records.

> Explain Analyze :
>
> "Aggregate (cost=5345632.91..5345632.92 rows=1 width=8) (actual time=442688.462..442688.462 rows=1 loops=1)"
> " -> Unique (cost=150.13..4943749.39 rows=32150682 width=8) (actual time=0.022..439964.214 rows=32368180 loops=1)"
> " -> Merge Join (cost=150.13..4863372.68 rows=32150682 width=8) (actual time=0.021..435818.276 rows=32368180 loops=1)"
> " Merge Cond: (table0.individual_entity_proxy_id = table1.individual_entity_proxy_id)"
> " -> Index Scan using indx_prospect_indv_entty_id on test1 table0 (cost=0.56..2493461.92 rows=32233405 width=8) (actual time=0.011..63009.551 rows=32368180 loops=1)"
> " Filter: ((direct_mail_preference IS NULL) AND ((last_contacted_dm_datetime IS NULL) OR (last_contacted_dm_datetime < '2020-03-15 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind = 'N'::bpchar) AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address = 'Y'::bpchar) AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))"
> " Rows Removed by Filter: 7709177"
> " -> Index Scan using pk_entity_proxy_id on test2 table1 (cost=0.56..1867677.94 rows=40071417 width=8) (actual time=0.008..363534.437 rows=40077727 loops=1)"
> " Filter: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (govt_prison_ind = 'N'::bpchar) AND ((cstmr_prspct_ind)::text = 'Prospect'::text))"
> " Rows Removed by Filter: 94756"

It might help to show explain(ANALYZE,BUFFERS).

It looks like test2/table1 index scan is a lot slower than table0.
Maybe table1 gets lots of updates, so isn't clustered on its primary key, so
the index scan is highly random. You could check the "correlation" of its PK
ID column:
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

If true, that would be a good reason to have separate tables.

> vCore: 32

Possibly it would be advantageous to use parallel query.
A better query+plan might allow that.

--
Justin

In response to

Performance issue at 2020-06-14 22:45:52 from Nagaraj Raj

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Michael Lewis	2020-06-15 18:21:40	Re: view reading information_schema is slow in PostgreSQL 12
Previous Message	David Rowley	2020-06-14 23:55:30	Re: Performance issue