Re: Problems pushing down WHERE-clause to underlying view

From: Nicklas Avén <nicklas(dot)aven(at)jordogskog(dot)no>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Problems pushing down WHERE-clause to underlying view
Date: 2019-02-15 17:27:58
Message-ID: 9770a1ec-c7be-7527-66d7-0c95b8277508@jordogskog.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 2/15/19 5:06 PM, Adrian Klaver wrote:
> On 2/15/19 7:28 AM, Nicklas Avén wrote:
>> Hi
>>
>> The problem is that it always calculates all those 22000 rows even
if the user id I use only gives 250 rows.
>>
>> So, the query uses 4 seconds instead of under 100 ms.
>
> https://www.postgresql.org/docs/10/sql-createview.html
>
> "CREATE VIEW defines a view of a query. The view is not physically
materialized. Instead, the query is run every time the view is
referenced in a query."

>

Sorry, I must have expressed what I mean bad. Of course a view is not
materialized.
I will explain without views what I mean here below

> Might want to look at materialized view:
> https://www.postgresql.org/docs/10/sql-creatematerializedview.html
>
> "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that
it also remembers the query used to initialize the view, so that it can
be refreshed later upon demand. A materialized view has many of the same
properties as a table, but there is no support for temporary
materialized views or automatic generation of OIDs."

>

No, materialized views is not an option. We get some data into those
tables daily. Recalculating the full dataset on a lot of views like this
doesn't make sense.
Instead we have tables maintained with processed new data. But I want
this last part of logic on top as views for flexibility,
to not need cached tables for each possible type of grouping that we need.
Started out with materialized views and it didn't work out well.

>
> I would also suggest running the EXPLAIN below with ANALYZE so actual
timings are returned. Also try:
>
> SELECT
>     *
> FROM
>     underlying_view AS b
> JOIN
>     contractor_access AS b
> ON
>     a.machine_key = b.machine_key
> WHERE
>     user_id = 'name(at)email(dot)address'
>>

Sorry again, I didn't mention. This I have tried this since this is what
the top level view do.
So first step when trying to understand this was (of course) to apply
the where-clause directly to the query

So, let's do that also on the underlying query (view) .

Here I have 2 queries, where I apply the where clause directly to the
query in the underlying view

(joining the contractor_access table directly on that query).

The first takes 30-40 ms ms and returns the same 250 rows as the second.
In the first I use the machine_key in the where clause.
In the second query that takes about 16 seconds to return the same 250
rows I use the user_id in the contractor_access table.

I have also cleaned up the contractor_access table. So there is only 1
row now, with my email as user_id and the same machine_key as used in
the first query.

Query 1:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM ( SELECT hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
           FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where l.machine_key = '887655635442600'
;

which results in this query plan

Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual
time=25.804..27.134 rows=250 loops=1)
  ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1
width=19) (actual time=0.009..0.010 rows=1 loops=1)
        Filter: (machine_key = '887655635442600'::text)
  ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122)
(actual time=25.793..26.959 rows=250 loops=1)
        Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text)
AND (hl.species_group_key = s.species_group_key))
        ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624
width=120) (actual time=25.755..26.763 rows=250 loops=1)
              Hash Cond: (((hl.machine_key)::text =
(p.machine_key)::text) AND (hl.product_key = p.product_key))
              ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624
width=118) (actual time=25.706..26.543 rows=250 loops=1)
                    Hash Cond: (((hl.machine_key)::text =
(o.machine_key)::text) AND (hl.object_key = o.object_key) AND
(hl.sub_object_key = o.sub_object_key))
                    ->  Hash Left Join  (cost=61799.78..62619.90
rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
                          Hash Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                          ->  HashAggregate (cost=61796.99..62079.79
rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
date(hl.harvest_date), hl.species_group_key, hl.product_key
                                ->  Bitmap Heap Scan on harvester_logs
hl  (cost=570.14..61224.14 rows=22914 width=61) (actual
time=0.909..11.573 rows=24151 loops=1)
                                      Recheck Cond:
((machine_key)::text = '887655635442600'::text)
                                      Heap Blocks: exact=538
                                      ->  Bitmap Index Scan on
version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41
rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
                                            Index Cond:
((machine_key)::text = '887655635442600'::text)
                          ->  Hash  (cost=2.77..2.77 rows=1 width=38)
(actual time=0.023..0.023 rows=1 loops=1)
                                Buckets: 1024  Batches: 1  Memory
Usage: 9kB
                                ->  Seq Scan on machine_info mi
(cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
                                      Filter: ((machine_key)::text =
'887655635442600'::text)
                                      Rows Removed by Filter: 61
                    ->  Hash  (cost=16.12..16.12 rows=4 width=84)
(actual time=0.025..0.026 rows=3 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Bitmap Heap Scan on objects o
(cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3 loops=1)
                                Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on
version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4
width=0) (actual time=0.015..0.015 rows=3 loops=1)
                                      Index Cond: ((machine_key)::text
= '887655635442600'::text)
              ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual
time=0.037..0.037 rows=26 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 10kB
                    ->  Bitmap Heap Scan on products p
(cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26
loops=1)
                          Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                          Heap Blocks: exact=1
                          ->  Bitmap Index Scan on
version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26
width=0) (actual time=0.009..0.009 rows=26 loops=1)
                                Index Cond: ((machine_key)::text =
'887655635442600'::text)
        ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual
time=0.028..0.029 rows=12 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              ->  Bitmap Heap Scan on species s (cost=4.37..10.52
rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
                    Recheck Cond: ((machine_key)::text =
'887655635442600'::text)
                    Heap Blocks: exact=1
                    ->  Bitmap Index Scan on
version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12
width=0) (actual time=0.008..0.008 rows=12 loops=1)
                          Index Cond: ((machine_key)::text =
'887655635442600'::text)
Planning time: 0.434 ms
Execution time: 27.370 ms

Next query, the slow one that calculates the whole dataset:

EXPLAIN ANALYZE
SELECT
    l.machine_key,
    o.object_name,
    o.sub_object_name,
    o.object_user_id,
    o.sub_object_user_id,
    o.start_date AS object_start_date,
    s.species_group_name,
    p.product_group_name,
    l.m3_sub AS volume_m3sub,
    l.number_of_logs,
    mi.basemachine_manufacturer,
    mi.basemachine_model
   FROM ( SELECT hl.contractor_id,
            hl.machine_key,
            hl.operator_key,
            hl.object_key,
            hl.sub_object_key,
            date(hl.harvest_date) AS harvest_date,
            hl.species_group_key,
            hl.product_key,
            sum(hl.m3_sub) AS m3_sub,
            count(*) AS number_of_logs
           FROM version_union_tables_r02.harvester_logs hl
          GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
hl.species_group_key, hl.product_key) l
     LEFT JOIN version_union_tables_r02.machine_info mi ON
l.machine_key::text = mi.machine_key::text
     LEFT JOIN version_union_tables_r02.objects o ON
l.machine_key::text = o.machine_key::text AND l.object_key =
o.object_key AND l.sub_object_key = o.sub_object_key
     LEFT JOIN version_union_tables_r02.products p ON
l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
     LEFT JOIN version_union_tables_r02.species s ON
l.machine_key::text = s.machine_key::text AND l.species_group_key =
s.species_group_key
join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
where t4e_contractor_id = 'nicklas(dot)aven(at)jordogskog(dot)no';

results in this query plan:

Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122)
(actual time=15860.900..15888.766 rows=250 loops=1)
  Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND
(hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
  ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69)
(actual time=15859.604..15887.287 rows=250 loops=1)
        Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
        Join Filter: (hl.species_group_key = s.species_group_key)
        Rows Removed by Join Filter: 2750
        ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044
width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
              Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
              Join Filter: (hl.product_key = p.product_key)
              Rows Removed by Join Filter: 6250
              ->  Merge Left Join  (cost=1779788.20..2022471.20
rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
                    Merge Cond: ((hl.machine_key)::text =
(mi.machine_key)::text)
                    ->  Merge Join  (cost=1779783.74..2022437.81
rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
                          Merge Cond: ((hl.machine_key)::text =
ci.machine_key)
                          ->  GroupAggregate
(cost=1779782.72..2012287.44 rows=808712 width=69) (actual
time=15088.353..15878.172 rows=2683 loops=1)
                                Group Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                ->  Sort (cost=1779782.72..1800000.52
rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552
loops=1)
                                      Sort Key: hl.machine_key,
hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
(date(hl.harvest_date)), hl.species_group_key, hl.product_key
                                      Sort Method: external merge Disk:
543456kB
                                      ->  Seq Scan on harvester_logs
hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual
time=0.007..3169.984 rows=8084464 loops=1)
                          ->  Sort  (cost=1.02..1.03 rows=1 width=19)
(actual time=0.019..0.020 rows=1 loops=1)
                                Sort Key: ci.machine_key
                                Sort Method: quicksort  Memory: 25kB
                                ->  Seq Scan on contractor_access ci 
(cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
                                      Filter: (t4e_contractor_id =
'nicklas(dot)aven(at)jordogskog(dot)no'::text)
                    ->  Sort  (cost=4.47..4.62 rows=62 width=38)
(actual time=0.102..0.173 rows=266 loops=1)
                          Sort Key: mi.machine_key
                          Sort Method: quicksort  Memory: 30kB
                          ->  Seq Scan on machine_info mi
(cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62
loops=1)
              ->  Sort  (cost=127.50..131.23 rows=1491 width=35)
(actual time=1.205..3.071 rows=7204 loops=1)
                    Sort Key: p.machine_key
                    Sort Method: quicksort  Memory: 175kB
                    ->  Seq Scan on products p  (cost=0.00..48.91
rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
        ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual
time=0.385..1.233 rows=3259 loops=1)
              Sort Key: s.machine_key
              Sort Method: quicksort  Memory: 65kB
              ->  Seq Scan on species s  (cost=0.00..10.60 rows=460
width=37) (actual time=0.004..0.146 rows=460 loops=1)
  ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual
time=1.286..1.287 rows=1690 loops=1)
        Buckets: 2048  Batches: 1  Memory Usage: 190kB
        ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725
width=84) (actual time=0.004..0.600 rows=1725 loops=1)
Planning time: 0.527 ms
Execution time: 15945.641 ms

Thanks

Nicklas

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andres Freund 2019-02-15 17:41:16 Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2
Previous Message Philippe Ebersohl 2019-02-15 17:21:14 Using NOTIFY from a java application.