Bad plan for ltree predicate <@

From: Roman Konoval <rkonoval(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Bad plan for ltree predicate <@
Date: 2017-12-01 17:20:28
Message-ID: 43899CF8-9D4F-4B06-8460-81CC2E0B68E5@gmail.com
Lists: pgsql-performance

Hi,

I have a problem on 9.3.14 with a query that accesses table:

Size: (retrieved by query https://gist.github.com/romank0/74f9d1d807bd3f41c0729d0fc6126749)

schemaname | relname | size | toast | associated_idx_size | total
------------+---------------+--------+--------+---------------------+---------
public | document_head | 275 MB | 630 MB | 439 MB | 1345 MB

Definition:
Table "public.document_head"
Column | Type | Modifiers
-----------------------------+--------------------------+-------------------------------------
snapshot_id | character varying(36) | not null
id | character varying(36) | not null
base_type | character varying(50) | not null
is_cascade | boolean | not null default false
parent_id | character varying(36) |
fileplan_node_id | character varying(36) |
state | character varying(10) | default 'ACTIVE'::character varying
title | character varying(4096) | not null
properties | text | not null
properties_cache | hstore | not null
serial_number | integer | not null
major_version | integer | not null
minor_version | integer | not null
version_description | text |
sensitivity | integer | not null default 10
id_path | ltree |
path_name | character varying(4096) | collate C not null
ltx_id | bigint | not null
created_by | integer | not null
created_date | timestamp with time zone | not null
modified_by | integer | not null
modified_date | timestamp with time zone | not null
responsible_user_ids | integer[] |
origin_id | character varying(36) |
origin_snapshot_id | character varying(36) |
ssn | character varying(64) |
record_physical_location | text |
record_physical_location_id | text |
record_created_date | timestamp with time zone |
record_aggregated_date | timestamp with time zone |
record_last_review_comment | text |
record_last_review_date | timestamp with time zone |
record_next_review_date | timestamp with time zone |
record_originated_date | timestamp with time zone |
record_is_vital | boolean | not null default false
storage_plan_state | text | not null default 'New'::text
cut_off_date | timestamp with time zone |
dispose_date | timestamp with time zone |
archive_date | timestamp with time zone |
Indexes:
"document_head__id__uniq_key" PRIMARY KEY, btree (id)
"document_head__parent_id__path_name__unq_idx" UNIQUE, btree (parent_id, path_name) WHERE state::text = 'ACTIVE'::text
"document_head__snapshot_id__unq" UNIQUE, btree (snapshot_id)
"document_head__base_type__idx" btree (base_type) WHERE state::text <> 'DELETED'::text
"document_head__fileplan_node_id__idx" btree (fileplan_node_id)
"document_head__id__idx" btree (id) WHERE state::text <> 'DELETED'::text
"document_head__id_path__btree__idx" btree (id_path) WHERE state::text <> 'DELETED'::text
"document_head__id_path__gist__idx" gist (id_path)
"document_head__ltx_id__idx" btree (ltx_id)
"document_head__origin_id__hotfix__idx" btree (origin_id) WHERE origin_id IS NOT NULL
"document_head__origin_id__idx" btree (origin_id) WHERE state::text <> 'DELETED'::text AND origin_id IS NOT NULL
"document_head__parent_id__idx" btree (parent_id)
"document_head__properties_cache__contact_username_idx" btree ((properties_cache -> 'person_meta_info.username'::text)) WHERE base_type::text = 'Contact'::text AND exist(properties_cache, 'person_meta_info.username'::text)
"document_head__properties_cache__emailmeta_message_id__idx" btree ((properties_cache -> 'emailmeta.message_id'::text)) WHERE base_type::text = 'File'::text AND exist(properties_cache, 'emailmeta.message_id'::text)
"document_head__properties_cache__idx" gist (properties_cache) WHERE state::text <> 'DELETED'::text
"document_head__properties_cache__project_identifier__idx" btree ((properties_cache -> 'project.identifier'::text)) WHERE base_type::text = 'Project'::text AND exist(properties_cache, 'project.identifier'::text)
"document_head__properties_cache__published_origin__idx" btree ((properties_cache -> 'file_published_origin_id.origin_id'::text)) WHERE base_type::text = 'File'::text AND exist(properties_cache, 'file_published_origin_id.origin_id'::text)
"document_head__state__idx" btree (state)
"document_head__storage_plan_state__idx" btree (storage_plan_state) WHERE state::text <> 'DELETED'::text
Check constraints:
"document_base_storage_plan_state_check" CHECK (storage_plan_state = ANY (ARRAY['NEW'::text, 'READY_FOR_CUTOFF'::text, 'CUTOFF'::text, 'READY_FOR_DISPOSITION'::text, 'DISPOSED'::text]))
"document_head__sensitivity_check" CHECK (sensitivity = ANY (ARRAY[10, 20, 30]))
Foreign-key constraints:
"document_head__created_by__fk" FOREIGN KEY (created_by) REFERENCES auth_user(id)
"document_head__modified_by__fk" FOREIGN KEY (modified_by) REFERENCES auth_user(id)
"document_head__parent_id__fk" FOREIGN KEY (parent_id) REFERENCES document(id)

Some notes:
1. properties stores JSON that for some records can be as large as ~300 kB:

select count(*) from document_head where length(properties) > 100000;
count
-------
535
(1 row)

select count(*) from document_head where length(properties) > 20000;
count
-------
13917
(1 row)

select count(*) from document_head where length(properties) > 1000;
count
-------
51708
(1 row)

select count(*) from document_head where length(properties) > 300000;
count
-------
3
(1 row)

select max(length(properties)) from document_head;
max
--------
334976
(1 row)

2. properties_cache stores the parsed properties: the key is the JSON path of a key in the document and the value is its value.
3. all results here were retrieved after running `analyze document_head` and `vacuum document_head` manually.
4. I tried different work_mem settings up to 100MB and there was no effect on the main issue described below.
5. I haven't tested disk speed: I think it is irrelevant to the problem, and it is not easy to measure on a production system.

The function that is used in the query:

CREATE OR REPLACE FUNCTION public.get_doc_path(document_id character varying)
RETURNS ltree
LANGUAGE plpgsql
STABLE
AS $function$
DECLARE
path ltree;
BEGIN
select id_path into path from document_head where id = document_id;
RETURN path;
END $function$

The original query is a rather big one; a simplified version where the issue can still be demonstrated is:

explain (analyze, buffers)
with trees AS (
SELECT d.id, d.snapshot_id , NULL :: text[] AS permissions
FROM document_head AS d
WHERE (d.id_path <@ get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c')
AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c') AND d.state != 'DELETED'
)
SELECT COUNT(*) FROM trees;

I get this plan https://explain.depesz.com/s/UQX4h
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=86227.19..86227.20 rows=1 width=0) (actual time=3878.775..3878.776 rows=1 loops=1)
Buffers: shared hit=747698, temp written=1587
CTE trees
-> Seq Scan on document_head d (cost=0.00..82718.21 rows=155955 width=74) (actual time=0.211..3620.044 rows=154840 loops=1)
Filter: (((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <> 'DELETED'::text) AND (id_path <@ get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'::character varying)))
Rows Removed by Filter: 23357
Buffers: shared hit=747698
-> CTE Scan on trees (cost=0.00..3119.10 rows=155955 width=0) (actual time=0.215..3828.519 rows=154840 loops=1)
Buffers: shared hit=747698, temp written=1587
Total runtime: 3881.781 ms
(10 rows)
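My guess at the buffer difference (an assumption, not verified): get_doc_path() is STABLE, so a call with a constant argument cannot be constant-folded the way an IMMUTABLE one could, and the seq scan may re-evaluate it for every row, each call doing its own lookup of document_head. A rough way to gauge the per-call cost in isolation:

```sql
-- Measure the cost of repeated get_doc_path() calls by themselves.
-- ~178k rows pass through the filter in the slow plan; if each call
-- touches a few buffers, that alone would account for most of the
-- 747698 shared hits.
EXPLAIN (ANALYZE, BUFFERS)
SELECT get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c')
FROM generate_series(1, 100000);
```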

If I change the ltree predicate to use a subquery, the plan and execution time change:

explain (analyze, buffers)
with trees AS (
SELECT d.id, d.snapshot_id , NULL :: text[] AS permissions
FROM document_head AS d
WHERE (d.id_path <@ (select get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c'))
AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c') AND d.state != 'DELETED'
)
SELECT COUNT(*) FROM trees;

https://explain.depesz.com/s/eUR
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=763.19..763.20 rows=1 width=0) (actual time=430.519..430.519 rows=1 loops=1)
Buffers: shared hit=47768, temp written=1587
CTE trees
-> Bitmap Heap Scan on document_head d (cost=82.05..759.20 rows=177 width=74) (actual time=70.703..249.419 rows=154840 loops=1)
Recheck Cond: (id_path <@ $0)
Rows Removed by Index Recheck: 11698
Filter: (((id)::text <> '78157c60-45bc-42c1-9aad-c5651995db5c'::text) AND ((state)::text <> 'DELETED'::text))
Rows Removed by Filter: 23
Buffers: shared hit=47768
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.26 rows=1 width=0) (actual time=0.083..0.084 rows=1 loops=1)
Buffers: shared hit=4
-> Bitmap Index Scan on document_head__id_path__gist__idx (cost=0.00..81.74 rows=178 width=0) (actual time=68.326..68.326 rows=159979 loops=1)
Index Cond: (id_path <@ $0)
Buffers: shared hit=16238
-> CTE Scan on trees (cost=0.00..3.54 rows=177 width=0) (actual time=70.707..388.714 rows=154840 loops=1)
Buffers: shared hit=47768, temp written=1587
Total runtime: 433.410 ms
(18 rows)

I can see that:
1. both queries return exactly the same data and require the same underlying data to produce the result.
2. when I use a subquery, Postgres cannot estimate the number of records that will be matched by <@.
3. the number of buffers processed by the quick query is 15 times smaller than by the slow one. I assume this is the reason for the difference in execution time.

My understanding of how the planner works is that it evaluates some plans and calculates their cost based on the settings and the available statistics.
The settings define the relative cost of operations like random vs. sequential I/O vs. processing data in memory.
I assume it is better to write queries that allow the planner to use correct estimations, and to tune the settings so that the planner knows the correct cost of each operation.
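If the estimation is the problem, another rewrite I could try (untested on this data; a sketch only) is to replace the function call with a self-join, so the planner sees a plain ltree column on the inner side:

```sql
-- Sketch: fetch the root row's id_path via a self-join instead of the
-- plpgsql function; the <@ selectivity is still a guess, but the
-- function is no longer evaluated per candidate row.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM document_head AS d
JOIN document_head AS root
  ON root.id = '78157c60-45bc-42c1-9aad-c5651995db5c'
WHERE d.id_path <@ root.id_path
  AND d.id <> root.id
  AND d.state != 'DELETED';
```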

My questions:
1. How can I find out why the slow execution requires 15x more buffers?
2. In the slow execution the planner either does not consider a plan similar to the quick one, or estimates it as worse. How can I find out which of these cases (if any) is true? And what can I try to make the planner use the better plan?
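For question 2, one check I can think of (a sketch; I have not verified the resulting plan shape) is to temporarily disable seq scans in a session and re-run the original query, to see whether the planner can produce the bitmap plan at all and, if so, at what estimated cost:

```sql
BEGIN;
SET LOCAL enable_seqscan = off;  -- force consideration of index plans
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM document_head AS d
WHERE d.id_path <@ get_doc_path('78157c60-45bc-42c1-9aad-c5651995db5c')
  AND d.id != '78157c60-45bc-42c1-9aad-c5651995db5c'
  AND d.state != 'DELETED';
ROLLBACK;
```

If a bitmap plan appears with a higher estimated cost than the seq scan had, it is a costing problem; if no index plan appears at all, the plan is apparently not being generated for this form of the query.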

I understand that one option to try is to upgrade to a more recent version, and I'm going to test this, but it may take a while until the affected production system gets an upgrade.

More information:

OS: Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-125-generic x86_64)
PG version: PostgreSQL 9.3.14 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit
History: no history as this is a new query.
Hardware: aws m4.large (2 vCPUs, RAM 8GB) with gp2 (SSD) storage with throughput up to ~20MB/s.

Non default server settings:

name | current_setting | source
--------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------
application_name | psql | client
archive_command | /etc/rds/dbbin/pgscripts/rds_wal_archive %p | configuration file
archive_mode | on | configuration file
archive_timeout | 5min | configuration file
checkpoint_completion_target | 0.9 | configuration file
checkpoint_segments | 16 | configuration file
client_encoding | UTF8 | client
effective_cache_size | 3987336kB | configuration file
fsync | on | configuration file
hot_standby | off | configuration file
listen_addresses | * | command line
lo_compat_privileges | off | configuration file
log_checkpoints | on | configuration file
log_destination | stderr | configuration file
log_directory | /rdsdbdata/log/error | configuration file
log_file_mode | 0644 | configuration file
log_filename | postgresql.log.%Y-%m-%d-%H | configuration file
log_hostname | on | configuration file
log_line_prefix | %t:%r:%u(at)%d:[%p]: | configuration file
log_min_duration_statement | 1s | configuration file
log_rotation_age | 1h | configuration file
log_timezone | UTC | configuration file
log_truncate_on_rotation | off | configuration file
logging_collector | on | configuration file
maintenance_work_mem | 127MB | configuration file
max_connections | 1000 | configuration file
max_locks_per_transaction | 64 | configuration file
max_prepared_transactions | 0 | configuration file
max_stack_depth | 6MB | configuration file
max_wal_senders | 5 | configuration file
port | 5432 | configuration file
rds.extensions | btree_gin,btree_gist,chkpass,citext,cube,dblink,dict_int,dict_xsyn,earthdistance,fuzzystrmatch,hstore,intagg,intarray,isn,ltree,pgcrypto,pgrowlocks,pg_stat_statements,pg_trgm,plcoffee,plls,plperl,plpgsql,pltcl,plv8,postgis,postgis_tiger_geocoder,postgis_topology,postgres_fdw,sslinfo,tablefunc,test_parser,tsearch2,unaccent,uuid-ossp | configuration file
rds.internal_databases | rdsadmin,template0 | configuration file
rds.superuser_variables | session_replication_role | configuration file
shared_buffers | 1993664kB | configuration file
shared_preload_libraries | rdsutils | configuration file
ssl | on | configuration file
ssl_ca_file | /rdsdbdata/rds-metadata/ca-cert.pem | configuration file
ssl_cert_file | /rdsdbdata/rds-metadata/server-cert.pem | configuration file
ssl_key_file | /rdsdbdata/rds-metadata/server-key.pem | configuration file
ssl_renegotiation_limit | 0 | configuration file
stats_temp_directory | /rdsdbdata/db/pg_stat_tmp | configuration file
superuser_reserved_connections | 3 | configuration file
synchronous_commit | on | configuration file
TimeZone | UTC | configuration file
unix_socket_directories | /tmp | configuration file
unix_socket_group | rdsdb | configuration file
unix_socket_permissions | 0700 | configuration file
wal_keep_segments | 32 | configuration file
wal_level | hot_standby | configuration file
wal_receiver_timeout | 30s | configuration file
wal_sender_timeout | 30s | configuration file
(52 rows)
