Re: Problems pushing down WHERE-clause to underlying view

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Problems pushing down WHERE-clause to underlying view
Date: 2019-02-15 18:31:41
Message-ID: 473540ab-5933-04e8-ce55-074f4faaab82@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 2/15/19 9:27 AM, Nicklas Avén wrote:
>
> On 2/15/19 5:06 PM, Adrian Klaver wrote:
> > On 2/15/19 7:28 AM, Nicklas Avén wrote:
> >> Hi
> >>
> >> The problem is that it always calculates all those 22000 rows even
> if the user id I use only gives 250 rows.
> >>
> >> So, the query uses 4 seconds instead of under 100 ms.
> >
> > https://www.postgresql.org/docs/10/sql-createview.html
> >
> > "CREATE VIEW defines a view of a query. The view is not physically
> materialized. Instead, the query is run every time the view is
> referenced in a query."
>
> >
>
>
>
> Sorry, I must have expressed what I mean bad. Of course a view is not
> materialized.
> I will explain without views what I mean here below
>
>
>
> > Might want to look at materialized view:
> > https://www.postgresql.org/docs/10/sql-creatematerializedview.html
> >
> > "CREATE MATERIALIZED VIEW is similar to CREATE TABLE AS, except that
> it also remembers the query used to initialize the view, so that it can
> be refreshed later upon demand. A materialized view has many of the same
> properties as a table, but there is no support for temporary
> materialized views or automatic generation of OIDs."
>
> >
>
>
> No, materialized views is not an option. We get some data into those
> tables daily. Recalculating the full dataset on a lot of views like this
> doesn't make sense.
> Instead we have tables maintained with processed new data. But I want
> this last part of logic on top as views for flexibility,
> to not need cached tables for each possible type of grouping that we need.
> Started out with materialized views and it didn't work out well.
>
>
>
>
> >
> > I would also suggest running the EXPLAIN below with ANALYZE so actual
> timings are returned. Also try:
> >
> > SELECT
> >     *
> > FROM
> >     underlying_view AS b
> > JOIN
> >     contractor_access AS b
> > ON
> >     a.machine_key = b.machine_key
> > WHERE
> >     user_id = 'name(at)email(dot)address'
> >>
>
> Sorry again, I didn't mention. This I have tried this since this is what
> the top level view do.
> So first step when trying to understand this was (of course) to apply
> the where-clause directly to the query
>
>
>
> So, let's do that also on the underlying query (view) .
>
>
> Here I have 2 queries, where I apply the where clause directly to the
> query in the underlying view
>
> (joining the contractor_access table directly on that query).
>
> The first takes 30-40 ms ms and returns the same 250 rows as the second.
> In the first I use the machine_key in the where clause.
> In the second query that takes about 16 seconds to return the same 250
> rows I use the user_id in the contractor_access table.
>
> I have also cleaned up the contractor_access table. So there is only 1
> row now, with my email as user_id and the same machine_key as used in
> the first query.
>

I have not had chance to fully go through all of below. Some
questions/suggestions:

1) Thanks for the formatted queries. If I could make a suggestion, when
aliasing could you include AS. It would make finding what l.* refers to
easier for those of us with old eyes:)

2) t4e_contractor_id is in the shiny_adm.contractor_access table?
If not where?

3) What is the schema for shiny_adm.contractor_access?
In particular what indexes are on it?

>
>
> Query 1:
>
>
> EXPLAIN ANALYZE
> SELECT
>     l.machine_key,
>     o.object_name,
>     o.sub_object_name,
>     o.object_user_id,
>     o.sub_object_user_id,
>     o.start_date AS object_start_date,
>     s.species_group_name,
>     p.product_group_name,
>     l.m3_sub AS volume_m3sub,
>     l.number_of_logs,
>     mi.basemachine_manufacturer,
>     mi.basemachine_model
>    FROM ( SELECT hl.contractor_id,
>             hl.machine_key,
>             hl.operator_key,
>             hl.object_key,
>             hl.sub_object_key,
>             date(hl.harvest_date) AS harvest_date,
>             hl.species_group_key,
>             hl.product_key,
>             sum(hl.m3_sub) AS m3_sub,
>             count(*) AS number_of_logs
>            FROM version_union_tables_r02.harvester_logs hl
>           GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
> hl.species_group_key, hl.product_key) l
>      LEFT JOIN version_union_tables_r02.machine_info mi ON
> l.machine_key::text = mi.machine_key::text
>      LEFT JOIN version_union_tables_r02.objects o ON
> l.machine_key::text = o.machine_key::text AND l.object_key =
> o.object_key AND l.sub_object_key = o.sub_object_key
>      LEFT JOIN version_union_tables_r02.products p ON
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>      LEFT JOIN version_union_tables_r02.species s ON
> l.machine_key::text = s.machine_key::text AND l.species_group_key =
> s.species_group_key
> join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
> where l.machine_key = '887655635442600'
> ;
>
> which results in this query plan
>
> Nested Loop  (cost=61865.25..65302.20 rows=22624 width=122) (actual
> time=25.804..27.134 rows=250 loops=1)
>   ->  Seq Scan on contractor_access ci  (cost=0.00..1.01 rows=1
> width=19) (actual time=0.009..0.010 rows=1 loops=1)
>         Filter: (machine_key = '887655635442600'::text)
>   ->  Hash Left Join  (cost=61865.25..65074.95 rows=22624 width=122)
> (actual time=25.793..26.959 rows=250 loops=1)
>         Hash Cond: (((hl.machine_key)::text = (s.machine_key)::text)
> AND (hl.species_group_key = s.species_group_key))
>         ->  Hash Left Join  (cost=61854.55..64263.92 rows=22624
> width=120) (actual time=25.755..26.763 rows=250 loops=1)
>               Hash Cond: (((hl.machine_key)::text =
> (p.machine_key)::text) AND (hl.product_key = p.product_key))
>               ->  Hash Left Join  (cost=61815.97..63145.14 rows=22624
> width=118) (actual time=25.706..26.543 rows=250 loops=1)
>                     Hash Cond: (((hl.machine_key)::text =
> (o.machine_key)::text) AND (hl.object_key = o.object_key) AND
> (hl.sub_object_key = o.sub_object_key))
>                     ->  Hash Left Join  (cost=61799.78..62619.90
> rows=22624 width=65) (actual time=25.668..26.327 rows=250 loops=1)
>                           Hash Cond: ((hl.machine_key)::text =
> (mi.machine_key)::text)
>                           ->  HashAggregate (cost=61796.99..62079.79
> rows=22624 width=69) (actual time=25.627..26.132 rows=250 loops=1)
>                                 Group Key: hl.machine_key,
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
> date(hl.harvest_date), hl.species_group_key, hl.product_key
>                                 ->  Bitmap Heap Scan on harvester_logs
> hl  (cost=570.14..61224.14 rows=22914 width=61) (actual
> time=0.909..11.573 rows=24151 loops=1)
>                                       Recheck Cond:
> ((machine_key)::text = '887655635442600'::text)
>                                       Heap Blocks: exact=538
>                                       ->  Bitmap Index Scan on
> version_union_tables_r02_harvester_logs_machine_key (cost=0.00..564.41
> rows=22914 width=0) (actual time=0.870..0.870 rows=24151 loops=1)
>                                             Index Cond:
> ((machine_key)::text = '887655635442600'::text)
>                           ->  Hash  (cost=2.77..2.77 rows=1 width=38)
> (actual time=0.023..0.023 rows=1 loops=1)
>                                 Buckets: 1024  Batches: 1  Memory
> Usage: 9kB
>                                 ->  Seq Scan on machine_info mi
> (cost=0.00..2.77 rows=1 width=38) (actual time=0.018..0.019 rows=1 loops=1)
>                                       Filter: ((machine_key)::text =
> '887655635442600'::text)
>                                       Rows Removed by Filter: 61
>                     ->  Hash  (cost=16.12..16.12 rows=4 width=84)
> (actual time=0.025..0.026 rows=3 loops=1)
>                           Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                           ->  Bitmap Heap Scan on objects o
> (cost=4.31..16.12 rows=4 width=84) (actual time=0.020..0.021 rows=3
> loops=1)
>                                 Recheck Cond: ((machine_key)::text =
> '887655635442600'::text)
>                                 Heap Blocks: exact=1
>                                 ->  Bitmap Index Scan on
> version_union_tables_r02_objects_machine_key  (cost=0.00..4.31 rows=4
> width=0) (actual time=0.015..0.015 rows=3 loops=1)
>                                       Index Cond: ((machine_key)::text
> = '887655635442600'::text)
>               ->  Hash  (cost=38.19..38.19 rows=26 width=35) (actual
> time=0.037..0.037 rows=26 loops=1)
>                     Buckets: 1024  Batches: 1  Memory Usage: 10kB
>                     ->  Bitmap Heap Scan on products p
> (cost=4.48..38.19 rows=26 width=35) (actual time=0.015..0.026 rows=26
> loops=1)
>                           Recheck Cond: ((machine_key)::text =
> '887655635442600'::text)
>                           Heap Blocks: exact=1
>                           ->  Bitmap Index Scan on
> version_union_tables_r02_products_machine_key  (cost=0.00..4.47 rows=26
> width=0) (actual time=0.009..0.009 rows=26 loops=1)
>                                 Index Cond: ((machine_key)::text =
> '887655635442600'::text)
>         ->  Hash  (cost=10.52..10.52 rows=12 width=37) (actual
> time=0.028..0.029 rows=12 loops=1)
>               Buckets: 1024  Batches: 1  Memory Usage: 9kB
>               ->  Bitmap Heap Scan on species s (cost=4.37..10.52
> rows=12 width=37) (actual time=0.016..0.021 rows=12 loops=1)
>                     Recheck Cond: ((machine_key)::text =
> '887655635442600'::text)
>                     Heap Blocks: exact=1
>                     ->  Bitmap Index Scan on
> version_union_tables_r02_species_machine_key  (cost=0.00..4.36 rows=12
> width=0) (actual time=0.008..0.008 rows=12 loops=1)
>                           Index Cond: ((machine_key)::text =
> '887655635442600'::text)
> Planning time: 0.434 ms
> Execution time: 27.370 ms
>
>
>
> Next query, the slow one that calculates the whole dataset:
>
>
> EXPLAIN ANALYZE
> SELECT
>     l.machine_key,
>     o.object_name,
>     o.sub_object_name,
>     o.object_user_id,
>     o.sub_object_user_id,
>     o.start_date AS object_start_date,
>     s.species_group_name,
>     p.product_group_name,
>     l.m3_sub AS volume_m3sub,
>     l.number_of_logs,
>     mi.basemachine_manufacturer,
>     mi.basemachine_model
>    FROM ( SELECT hl.contractor_id,
>             hl.machine_key,
>             hl.operator_key,
>             hl.object_key,
>             hl.sub_object_key,
>             date(hl.harvest_date) AS harvest_date,
>             hl.species_group_key,
>             hl.product_key,
>             sum(hl.m3_sub) AS m3_sub,
>             count(*) AS number_of_logs
>            FROM version_union_tables_r02.harvester_logs hl
>           GROUP BY hl.machine_key, hl.contractor_id, hl.operator_key,
> hl.object_key, hl.sub_object_key, (date(hl.harvest_date)),
> hl.species_group_key, hl.product_key) l
>      LEFT JOIN version_union_tables_r02.machine_info mi ON
> l.machine_key::text = mi.machine_key::text
>      LEFT JOIN version_union_tables_r02.objects o ON
> l.machine_key::text = o.machine_key::text AND l.object_key =
> o.object_key AND l.sub_object_key = o.sub_object_key
>      LEFT JOIN version_union_tables_r02.products p ON
> l.machine_key::text = p.machine_key::text AND l.product_key = p.product_key
>      LEFT JOIN version_union_tables_r02.species s ON
> l.machine_key::text = s.machine_key::text AND l.species_group_key =
> s.species_group_key
> join shiny_adm.contractor_access ci on l.machine_key=ci.machine_key
> where t4e_contractor_id = 'nicklas(dot)aven(at)jordogskog(dot)no';
>
>
> results in this query plan:
>
> Hash Left Join  (cost=1780026.09..2023556.15 rows=4044 width=122)
> (actual time=15860.900..15888.766 rows=250 loops=1)
>   Hash Cond: (((hl.machine_key)::text = (o.machine_key)::text) AND
> (hl.object_key = o.object_key) AND (hl.sub_object_key = o.sub_object_key))
>   ->  Merge Left Join  (cost=1779946.65..2023340.22 rows=4044 width=69)
> (actual time=15859.604..15887.287 rows=250 loops=1)
>         Merge Cond: ((hl.machine_key)::text = (s.machine_key)::text)
>         Join Filter: (hl.species_group_key = s.species_group_key)
>         Rows Removed by Join Filter: 2750
>         ->  Merge Left Join  (cost=1779915.71..2023136.40 rows=4044
> width=67) (actual time=15859.072..15884.912 rows=250 loops=1)
>               Merge Cond: ((hl.machine_key)::text = (p.machine_key)::text)
>               Join Filter: (hl.product_key = p.product_key)
>               Rows Removed by Join Filter: 6250
>               ->  Merge Left Join  (cost=1779788.20..2022471.20
> rows=4044 width=65) (actual time=15857.473..15879.504 rows=250 loops=1)
>                     Merge Cond: ((hl.machine_key)::text =
> (mi.machine_key)::text)
>                     ->  Merge Join  (cost=1779783.74..2022437.81
> rows=4044 width=48) (actual time=15857.359..15879.102 rows=250 loops=1)
>                           Merge Cond: ((hl.machine_key)::text =
> ci.machine_key)
>                           ->  GroupAggregate
> (cost=1779782.72..2012287.44 rows=808712 width=69) (actual
> time=15088.353..15878.172 rows=2683 loops=1)
>                                 Group Key: hl.machine_key,
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
> (date(hl.harvest_date)), hl.species_group_key, hl.product_key
>                                 ->  Sort (cost=1779782.72..1800000.52
> rows=8087121 width=61) (actual time=15088.336..15488.144 rows=942552
> loops=1)
>                                       Sort Key: hl.machine_key,
> hl.contractor_id, hl.operator_key, hl.object_key, hl.sub_object_key,
> (date(hl.harvest_date)), hl.species_group_key, hl.product_key
>                                       Sort Method: external merge Disk:
> 543456kB
>                                       ->  Seq Scan on harvester_logs
> hl  (cost=0.00..243781.01 rows=8087121 width=61) (actual
> time=0.007..3169.984 rows=8084464 loops=1)
>                           ->  Sort  (cost=1.02..1.03 rows=1 width=19)
> (actual time=0.019..0.020 rows=1 loops=1)
>                                 Sort Key: ci.machine_key
>                                 Sort Method: quicksort  Memory: 25kB
>                                 ->  Seq Scan on contractor_access ci
> (cost=0.00..1.01 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=1)
>                                       Filter: (t4e_contractor_id =
> 'nicklas(dot)aven(at)jordogskog(dot)no'::text)
>                     ->  Sort  (cost=4.47..4.62 rows=62 width=38)
> (actual time=0.102..0.173 rows=266 loops=1)
>                           Sort Key: mi.machine_key
>                           Sort Method: quicksort  Memory: 30kB
>                           ->  Seq Scan on machine_info mi
> (cost=0.00..2.62 rows=62 width=38) (actual time=0.008..0.030 rows=62
> loops=1)
>               ->  Sort  (cost=127.50..131.23 rows=1491 width=35)
> (actual time=1.205..3.071 rows=7204 loops=1)
>                     Sort Key: p.machine_key
>                     Sort Method: quicksort  Memory: 175kB
>                     ->  Seq Scan on products p  (cost=0.00..48.91
> rows=1491 width=35) (actual time=0.004..0.497 rows=1491 loops=1)
>         ->  Sort  (cost=30.94..32.09 rows=460 width=37) (actual
> time=0.385..1.233 rows=3259 loops=1)
>               Sort Key: s.machine_key
>               Sort Method: quicksort  Memory: 65kB
>               ->  Seq Scan on species s  (cost=0.00..10.60 rows=460
> width=37) (actual time=0.004..0.146 rows=460 loops=1)
>   ->  Hash  (cost=49.25..49.25 rows=1725 width=84) (actual
> time=1.286..1.287 rows=1690 loops=1)
>         Buckets: 2048  Batches: 1  Memory Usage: 190kB
>         ->  Seq Scan on objects o  (cost=0.00..49.25 rows=1725
> width=84) (actual time=0.004..0.600 rows=1725 loops=1)
> Planning time: 0.527 ms
> Execution time: 15945.641 ms
>
>
>
>
> Thanks
>
> Nicklas
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jeremy Finzel 2019-02-15 19:01:50 Re: Size estimation of postgres core files
Previous Message Andrew Gierth 2019-02-15 18:27:02 Re: Size estimation of postgres core files