Need help with optimising simple query

From: Nandakumar M <m(dot)nanda92(at)gmail(dot)com>
To: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Need help with optimising simple query
Date: 2018-07-09 11:47:23
Message-ID: CANcFUu5a9W_MORQa5Tv9vsofN_w_irU9-OBQtN34YmL7p4gP_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

I am having a query that has an order by and a limit clause. The
column on which I am doing order by is indexed (default b tree index).
However the index is not being used. On tweaking the query a bit I
found that when I use left join index is not used whereas when I use
inner join the index is used.

Unfortunately, the behaviour we expect is that of left join only. My
question is, is there any way to modify/improve the query to improve
the query speed or is this the best that is possible for this case.

Please find below a simplified version of the queries. I tried the
queries on 9.3 and 10 versions and both gave similar results.

Table structure

performance_test=# \d+ child
Table "public.child"
Column | Type | Collation | Nullable | Default
| Storage | Stats target | Description
--------+--------+-----------+----------+-----------------------------------+----------+--------------+-------------
id | bigint | | not null |
nextval('child_id_seq'::regclass) | plain | |
name | text | | not null |
| extended | |
Indexes:
"child_pkey" PRIMARY KEY, btree (id)
"child_name_unique" UNIQUE CONSTRAINT, btree (name)
Referenced by:
TABLE "parent" CONSTRAINT "parent_child_id_fkey" FOREIGN KEY
(child_id) REFERENCES child(id)

performance_test=# \d+ parent
Table "public.parent"
Column | Type | Collation | Nullable | Default
| Storage | Stats target | Description
----------+--------+-----------+----------+------------------------------------+----------+--------------+-------------
id | bigint | | not null |
nextval('parent_id_seq'::regclass) | plain | |
name | text | | not null |
| extended | |
child_id | bigint | | |
| plain | |
Indexes:
"parent_pkey" PRIMARY KEY, btree (id)
"parent_name_unique" UNIQUE CONSTRAINT, btree (name)
"parent_child_id_idx" btree (child_id)
Foreign-key constraints:
"parent_child_id_fkey" FOREIGN KEY (child_id) REFERENCES child(id)

Query used to populate data

performance_test=# insert into child(name) select concat('child ',
gen.id) as name from (select generate_series(1,100000) as id) as gen;

performance_test=# insert into parent(name, child_id) select
concat('parent ', gen.id) as name, (id%100000) + 1 from (select
generate_series(1,1000000) as id) as gen;

Left join with order by using child name

performance_test=# explain analyze select * from parent left join
child on parent.child_id = child.id order by child.name limit 10;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=69318.55..69318.58 rows=10 width=59) (actual
time=790.708..790.709 rows=10 loops=1)
-> Sort (cost=69318.55..71818.55 rows=1000000 width=59) (actual
time=790.705..790.706 rows=10 loops=1)
Sort Key: child.name
Sort Method: top-N heapsort Memory: 27kB
-> Hash Left Join (cost=3473.00..47708.91 rows=1000000
width=59) (actual time=51.066..401.028 rows=1000000 loops=1)
Hash Cond: (parent.child_id = child.id)
-> Seq Scan on parent (cost=0.00..17353.00
rows=1000000 width=29) (actual time=0.026..67.848 rows=1000000
loops=1)
-> Hash (cost=1637.00..1637.00 rows=100000 width=19)
(actual time=50.879..50.879 rows=100000 loops=1)
Buckets: 65536 Batches: 2 Memory Usage: 3053kB
-> Seq Scan on child (cost=0.00..1637.00
rows=100000 width=19) (actual time=0.018..17.281 rows=100000 loops=1)
Planning time: 1.191 ms
Execution time: 790.797 ms
(12 rows)

Inner join with sorting according to child name

performance_test=# explain analyze select * from parent inner join
child on parent.child_id = child.id order by child.name limit 10;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.84..2.03 rows=10 width=59) (actual time=0.156..0.193
rows=10 loops=1)
-> Nested Loop (cost=0.84..119132.56 rows=1000000 width=59)
(actual time=0.154..0.186 rows=10 loops=1)
-> Index Scan using child_name_unique on child
(cost=0.42..5448.56 rows=100000 width=19) (actual time=0.126..0.126
rows=1 loops=1)
-> Index Scan using parent_child_id_idx on parent
(cost=0.42..1.04 rows=10 width=29) (actual time=0.019..0.045 rows=10
loops=1)
Index Cond: (child_id = child.id)
Planning time: 0.941 ms
Execution time: 0.283 ms
(7 rows)

Version

performance_test=# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit
(1 row)

Any help from Postgres experts would be great.

Thanks,
Nanda

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2018-07-09 14:23:47 Re: Need help with optimising simple query
Previous Message Mariel Cherkassky 2018-07-09 06:18:28 Re: where can I download the binaries of plpython extension