Interpreting postgres execution plan along with AND/OR precedence

From: "Dirschel, Steve" <steve(dot)dirschel(at)thomsonreuters(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Interpreting postgres execution plan along with AND/OR precedence
Date: 2023-01-18 17:03:21
Message-ID: DM6PR03MB4332C3DCEE3692D67D71DA4EFAC79@DM6PR03MB4332.namprd03.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Table definition:

Table "load.lm_queue"
Column | Type | Collation | Nullable | Default
------------------------+--------------------------------+-----------+----------+-------------------------
guid | character(33) | | not null |
host_name | character varying(40) | | |
priority | numeric(11,0) | | not null |
request_time | timestamp(6) without time zone | | not null |
collection_name | character varying(40) | | not null |
stage_id | numeric(11,0) | | |
source_file | character varying(250) | | |
lm_id | numeric(11,0) | | |
start_time | timestamp(6) without time zone | | |
status_text | character varying(225) | | not null | NULL::character varying
load_data_id | character varying(500) | | |
docs_in_load | numeric(11,0) | | |
client_name | character varying(50) | | |
status_code | numeric(11,0) | | |
email_address | character varying(2000) | | |
hold_flag | character(1) | | |
process_type | character varying(40) | | |
cancel_flag | character(1) | | |
file_type | character varying(6) | | |
lm_data | character varying(4000) | | |
ds_request_time | timestamp(6) without time zone | | |
ds_id | numeric(11,0) | | |
ds_start_time | timestamp(6) without time zone | | |
auto_promote_flag | character(1) | | |
extract_out_file | character varying(250) | | |
last_upd_time | timestamp(6) without time zone | | |
ds_fail_count | numeric(11,0) | | |
cc_collection | character varying(40) | | |
cc_environment | character varying(40) | | |
cc_fail_on_db_mismatch | character(1) | | |
cc_tracking_guid | character varying(33) | | |
cc_numrows | character varying(50) | | |
cc_owner | character varying(30) | | |
cc_password | character varying(30) | | |
parent_guid | character varying(33) | | |
Indexes:
"xpklm_queue" PRIMARY KEY, btree (guid)
"idx_hn_cn_dsid_sc_dst_dfc" btree (host_name, collection_name, ds_id, status_code, ds_start_time, ds_fail_count)
"ix_lm_cc" btree (collection_name, client_name)
"ix_lm_chl" btree (client_name, host_name, lm_id)

Query and plan:

explain
SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME, PROCESS_TYPE, PARENT_GUID
FROM LOAD.LM_QUEUE lmq1
WHERE CLIENT_NAME='WLCASES'
AND HOLD_FLAG='Y'
AND HOST_NAME='WLLOADB'
AND STATUS_CODE in (1)
AND NOT EXISTS
(SELECT COLLECTION_NAME
FROM LOAD.LM_QUEUE lmq2
WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME
AND LM_ID <> 0
AND PROCESS_TYPE NOT IN('EXTRACT'))
OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')
AND LM_ID = 0
AND CLIENT_NAME='WLCASES'
AND HOST_NAME= 'WLLOADB' )
ORDER BY PRIORITY DESC, REQUEST_TIME ASC;

Sort (cost=1578.99..1579.00 rows=1 width=120)
Sort Key: lmq1.priority DESC, lmq1.request_time
-> Index Scan using ix_lm_chl on lm_queue lmq1 (cost=0.40..1578.98 rows=1 width=120)
Index Cond: (((client_name)::text = 'WLCASES'::text) AND ((host_name)::text = 'WLLOADB'::text))
Filter: (((hold_flag = 'Y'::bpchar) AND (status_code = '1'::numeric) AND (NOT (alternatives: SubPlan 1 or hashed SubPlan 2))) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = '0'::numeric)))
SubPlan 1
-> Index Scan using ix_lm_cc on lm_queue lmq2 (cost=0.40..177.93 rows=1 width=0)
Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)
Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
SubPlan 2
-> Seq Scan on lm_queue lmq2_1 (cost=0.00..124999.06 rows=12 width=32)
Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

Question on the plan above:
I understand SubPlan 1 above- it is joining into the NOT EXISTS via the lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME and then applying the other filtering inside the NOT EXISTS. But I don't understand SubPlan 2. Given the filter conditions under SubPlan 2 it is also coming from the NOT EXISTS because that is where LM_ID <> 0 and PROCESS_TYPE <> EXTRACT exist but I don't understand the scenario where this query would need to use SubPlan 2. Would anyone be able to explain under what condition(s) SubPlan 2 would get executed?

I'm trying to understand the precedence of AND/OR operations when everything is not tied together with ()'s. I see the following filtering in the query:

1. CLIENT_NAME='WLCASES'
2. AND HOLD_FLAG='Y'
3. AND HOST_NAME='WLLOADB'
4. AND STATUS_CODE in (1)
5. AND NOT EXISTS
(SELECT COLLECTION_NAME
FROM LOAD.LM_QUEUE lmq2
WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME
AND LM_ID <> 0
AND PROCESS_TYPE NOT IN('EXTRACT'))

1. OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')
AND LM_ID = 0
AND CLIENT_NAME='WLCASES'
AND HOST_NAME= 'WLLOADB' )

All of the conditions in #6 above go together in that OR clause because of the ().

According to this link:

https://www.postgresql.org/docs/14/sql-syntax-lexical.html#SQL-PRECEDENCE

AND has higher precedence over OR. So in my mind that would imply the order of the filters in the WHERE clause shouldn't matter- if there are multiple AND filters and OR filters regardless of the order in the FROM clause the AND's are applied first.

But if I change the order of the filters- moved AND HOLD_FLAG='Y' AND HOST_NAME='WLLOADB' AND STATUS_CODE in (1) to the bottom of the query:

explain
SELECT GUID, COLLECTION_NAME, PRIORITY, START_TIME, REQUEST_TIME, CLIENT_NAME, PROCESS_TYPE, PARENT_GUID
FROM LOAD.LM_QUEUE lmq1
WHERE CLIENT_NAME='WLCASES'
AND NOT EXISTS
(SELECT COLLECTION_NAME
FROM LOAD.LM_QUEUE lmq2
WHERE lmq1.COLLECTION_NAME = lmq2.COLLECTION_NAME
AND LM_ID <> 0
AND PROCESS_TYPE NOT IN('EXTRACT'))
OR (PROCESS_TYPE in ('UNLOCK','UNLOCK RERUN')
AND LM_ID = 0
AND CLIENT_NAME='WLCASES'
AND HOST_NAME= 'WLLOADB' )
AND HOLD_FLAG='Y'
AND HOST_NAME='WLLOADB'
AND STATUS_CODE in (1)
ORDER BY PRIORITY DESC, REQUEST_TIME ASC;

The plan changes:

Sort (cost=9382.94..9382.97 rows=12 width=169)
Sort Key: lmq1.priority DESC, lmq1.request_time
-> Bitmap Heap Scan on lm_queue lmq1 (cost=4572.59..9382.73 rows=12 width=169)
Recheck Cond: ((client_name)::text = 'WLCASES'::text)
Filter: ((NOT (alternatives: SubPlan 1 or hashed SubPlan 2)) OR (((process_type)::text = ANY ('{UNLOCK,"UNLOCK RERUN"}'::text[])) AND (lm_id = '0'::numeric) AND ((host_name)::text = 'WLLOADB'::text) AND (hold_flag = 'Y'::bpchar) AND ((host_name)::text = 'WLLOADB'::text) AND (status_code = '1'::numeric)))
-> Bitmap Index Scan on ix_lm_chl (cost=0.00..4572.58 rows=25 width=0)
Index Cond: ((client_name)::text = 'WLCASES'::text)
SubPlan 1
-> Bitmap Heap Scan on lm_queue lmq2 (cost=164.44..188.42 rows=1 width=0)
Recheck Cond: ((lmq1.collection_name)::text = (collection_name)::text)
Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))
-> Bitmap Index Scan on ix_lm_cc (cost=0.00..164.44 rows=6 width=0)
Index Cond: ((collection_name)::text = (lmq1.collection_name)::text)
SubPlan 2
-> Seq Scan on lm_queue lmq2_1 (cost=0.00..124999.49 rows=25 width=32)
Filter: ((lm_id <> '0'::numeric) AND ((process_type)::text <> 'EXTRACT'::text))

In the original plan above I believe the query drives off index ix_lm_chl applying both client_name = WLCASES and host_name = WLLOADB to the index cond. In the plan directly above I believe it also drives off index ix_lm_chl but it is only applying client_name = WLCASES to the index cond.

If AND's are applied first then why wouldn't the modified query apply both client_name = WLCASES and host_name = WLLOADB to the index cond? Is it because those are moved below the OR condition?

Thanks in advance.
Steve
This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ron 2023-01-18 17:26:51 Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?
Previous Message Rob Sargent 2023-01-18 16:54:32 Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?