Re: AWS forcing PG upgrade from v9.6 a disaster

From: "Dean Gibson (DB Administrator)" <postgresql(at)mailpen(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: AWS forcing PG upgrade from v9.6 a disaster
Date: 2021-05-31 03:07:29
Message-ID: 92cc8563-195f-8727-d887-5f2bb63f231f@mailpen.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

*SOLVED !!!*  Below is the *new* EXPLAIN ANALYZE for *13.2* on AWS RDS
(with *no changes* to server parameters) along with the prior EXPLAIN
ANALYZE outputs for easy comparison.

While I didn't discount the significance & effect of optimizing the
server parameters, this problem always seemed to me like a fundamental
difference in how the PostgreSQL planner viewed the structure of the
query.  In particular, I had a usage pattern of writing VIEWS that
worked very well with v9.6 & prior versions, but which made me suspect a
route of attack:

Since the FCC tables contain lots of one-character codes for different
conditions, to simplify maintenance & displays to humans, I created over
twenty tiny lookup tables (a dozen or so entries in each table), to
render a human-readable field as a replacement for the original
one-character field in many of the VIEWs.  In some cases those
"humanized" fields were used as conditions in SELECT statements.  Of
course, fields that are not referenced or selected for output from a
particular query, never get looked up (an advantage over using a JOIN
for each lookup). In some cases, for ease of handling multiple or
complex lookups, I indeed used a JOIN.  All this worked fine until v10.

Here's the FROM clause that bit me:

   FROM lic_en
     JOIN govt_region   USING (territory_id, country_id)
     LEFT JOIN zip_code USING (territory_id, country_id, zip5)
     LEFT JOIN "County" USING (territory_id, country_id, fips_county);

The first two JOINs are not the problem, & are in fact retained in my
solution.  The problem is the third JOIN, where "fips_county" from
"County" is actually matched with the corresponding field from the
"zip_code" VIEW.  Works fine, if you don't mind the performance impact
in v10 & above.  It has now been rewritten, to be a sub-query for an
output field.  Voila ! Back to sub-second query times.

This also solved performance issues with other queries as well.  I also
now use lookup values as additional fields in the output, in addition to
the original fields, which should help some more (but means some changes
to some web pages that do queries).

-- Dean

ps: I wonder how many other RDS users of v9.6 are going to get a very
rude awakening *very soon*, as AWS is not allowing new instances of v9.6
after *August 2* (see https://forums.aws.amazon.com/ann.jspa?annID=8499
).  Whether that milestone affects restores from snapshots, remains to
be seen (by others, not by me).  In other words, users should plan to be
up & running on a newer version well before August.  Total cost to me? 
I"m in my *8th day* of dealing with this, & I still have a number of web
pages to update, due to changes in SQL field names to manage this mess. 
This was certainly not a obvious solution.

*Here's from 13.2 (new):*

=> EXPLAIN ANALYZE  SELECT club_count, extra_count, region_count,
callsign AS trustee_callsign, applicant_type, entity_name, licensee_id
AS _lid FROM genclub_multi_  WHERE club_count >= 5 ORDER BY extra_count
DESC, club_count DESC, entity_name;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=457.77..457.77 rows=1 width=64) (actual
time=48.737..48.742 rows=43 loops=1)
   Sort Key: "_Club".extra_count DESC, "_Club".club_count DESC,
"_EN".entity_name
   Sort Method: quicksort  Memory: 31kB
   ->  Nested Loop Left Join  (cost=1.57..457.76 rows=1 width=64)
(actual time=1.796..48.635 rows=43 loops=1)
         ->  Nested Loop  (cost=1.28..457.07 rows=1 width=71) (actual
time=1.736..48.239 rows=43 loops=1)
               Join Filter: (("_EN".country_id =
"_GovtRegion".country_id) AND ("_EN".state = "_GovtRegion".territory_id))
               Rows Removed by Join Filter: 1297
               ->  Nested Loop  (cost=1.28..453.75 rows=1 width=70)
(actual time=1.720..47.778 rows=43 loops=1)
                     Join Filter: (("_HD".unique_system_identifier =
"_EN".unique_system_identifier) AND ("_HD".callsign = "_EN".callsign))
                     ->  Nested Loop  (cost=0.85..450.98 rows=1
width=65) (actual time=1.207..34.912 rows=43 loops=1)
                           ->  Nested Loop (cost=0.43..376.57 rows=27
width=50) (actual time=0.620..20.956 rows=43 loops=1)
                                 ->  Seq Scan on "_Club"
(cost=0.00..4.44 rows=44 width=35) (actual time=0.037..0.067 rows=44
loops=1)
                                       Filter: (club_count >= 5)
                                       Rows Removed by Filter: 151
                                 ->  Index Scan using "_HD_callsign" on
"_HD"  (cost=0.43..8.45 rows=1 width=15) (actual time=0.474..0.474
rows=1 loops=44)
                                       Index Cond: (callsign =
"_Club".trustee_callsign)
                                       Filter: (license_status =
'A'::bpchar)
                                       Rows Removed by Filter: 0
                           ->  Index Scan using "_AM_pkey" on "_AM" 
(cost=0.43..2.75 rows=1 width=15) (actual time=0.323..0.323 rows=1 loops=43)
                                 Index Cond: (unique_system_identifier
= "_HD".unique_system_identifier)
                                 Filter: ("_HD".callsign = callsign)
                     ->  Index Scan using "_EN_pkey" on "_EN" 
(cost=0.43..2.75 rows=1 width=60) (actual time=0.298..0.298 rows=1 loops=43)
                           Index Cond: (unique_system_identifier =
"_AM".unique_system_identifier)
                           Filter: ("_AM".callsign = callsign)
               ->  Seq Scan on "_GovtRegion" (cost=0.00..1.93 rows=93
width=7) (actual time=0.002..0.004 rows=31 loops=43)
         ->  Nested Loop  (cost=0.29..0.68 rows=1 width=7) (actual
time=0.008..0.008 rows=1 loops=43)
               ->  Index Only Scan using "_IsoCountry_iso_alpha2_key"
on "_IsoCountry"  (cost=0.14..0.38 rows=1 width=3) (actual
time=0.004..0.004 rows=1 loops=43)
                     Index Cond: (iso_alpha2 = "_GovtRegion".country_id)
                     Heap Fetches: 43
               ->  Index Only Scan using "_Territory_pkey" on
"_Territory"  (cost=0.14..0.29 rows=1 width=7) (actual time=0.003..0.003
rows=1 loops=43)
                     Index Cond: ((country_id =
"_IsoCountry".iso_alpha2) AND (territory_id = "_GovtRegion".territory_id))
                     Heap Fetches: 43
 Planning Time: 4.017 ms
 Execution Time: 48.822 ms

On 2021-05-28 11:48, Dean Gibson (DB Administrator) wrote:
> ...
>
> *Here's from v9.6:*
>
> => EXPLAIN ANALYZE SELECT club_count, extra_count, region_count,
> callsign AS trustee_callsign, applicant_type, entity_name, licensee_id
> AS _lid FROM genclub_multi_  WHERE club_count >= 5 ORDER BY
> extra_count DESC, club_count DESC, entity_name;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=407.13..407.13 rows=1 width=94) (actual
> time=348.850..348.859 rows=43 loops=1)
>    Sort Key: "_Club".extra_count DESC, "_Club".club_count DESC,
> "_EN".entity_name
>    Sort Method: quicksort  Memory: 31kB
>    ->  Nested Loop  (cost=4.90..407.12 rows=1 width=94) (actual
> time=7.587..348.732 rows=43 loops=1)
>          ->  Nested Loop  (cost=4.47..394.66 rows=1 width=94) (actual
> time=5.740..248.149 rows=43 loops=1)
>                ->  Nested Loop Left Join  (cost=4.04..382.20 rows=1
> width=79) (actual time=2.458..107.908 rows=55 loops=1)
>                      ->  Hash Join  (cost=3.75..380.26 rows=1
> width=86) (actual time=2.398..106.990 rows=55 loops=1)
>                            Hash Cond: (("_EN".country_id =
> "_GovtRegion".country_id) AND ("_EN".state = "_GovtRegion".territory_id))
>                            ->  Nested Loop (cost=0.43..376.46 rows=47
> width=94) (actual time=2.294..106.736 rows=55 loops=1)
>                                  ->  Seq Scan on "_Club"
> (cost=0.00..4.44 rows=44 width=35) (actual time=0.024..0.101 rows=44
> loops=1)
>                                        Filter: (club_count >= 5)
>                                        Rows Removed by Filter: 151
>                                  ->  Index Scan using "_EN_callsign"
> on "_EN"  (cost=0.43..8.45 rows=1 width=69) (actual time=2.179..2.420
> rows=1 loops=44)
>                                        Index Cond: (callsign =
> "_Club".trustee_callsign)
>                            ->  Hash  (cost=1.93..1.93 rows=93 width=7)
> (actual time=0.071..0.071 rows=88 loops=1)
>                                  Buckets: 1024  Batches: 1 Memory
> Usage: 12kB
>                                  ->  Seq Scan on "_GovtRegion" 
> (cost=0.00..1.93 rows=93 width=7) (actual time=0.010..0.034 rows=93
> loops=1)
>                      ->  Nested Loop  (cost=0.29..1.93 rows=1 width=7)
> (actual time=0.012..0.014 rows=1 loops=55)
>                            Join Filter: ("_IsoCountry".iso_alpha2 =
> "_Territory".country_id)
>                            Rows Removed by Join Filter: 0
>                            ->  Index Only Scan using
> "_IsoCountry_iso_alpha2_key" on "_IsoCountry"  (cost=0.14..1.62 rows=1
> width=3) (actual time=0.006..0.006 rows=1 loops=55)
>                                  Index Cond: (iso_alpha2 =
> "_GovtRegion".country_id)
>                                  Heap Fetches: 55
>                            ->  Index Only Scan using "_Territory_pkey"
> on "_Territory"  (cost=0.14..0.29 rows=1 width=7)
>  (actual time=0.004..0.005 rows=1 loops=55)
>                                  Index Cond: (territory_id =
> "_GovtRegion".territory_id)
>                                  Heap Fetches: 59
>                ->  Index Scan using "_HD_pkey" on "_HD"
> (cost=0.43..12.45 rows=1 width=15) (actual time=2.548..2.548 rows=1
> loops=55)
>                      Index Cond: (unique_system_identifier =
> "_EN".unique_system_identifier)
>                      Filter: (("_EN".callsign = callsign) AND
> (((((license_status)::text || ' - '::text) || (COALESCE((SubPlan 2),
> '???'::character varying))::text))::character(1) = 'A'::bpchar))
>                      Rows Removed by Filter: 0
>                      SubPlan 2
>                        ->  Limit  (cost=0.15..8.17 rows=1 width=32)
> (actual time=0.006..0.007 rows=1 loops=55)
>                              ->  Index Scan using "_LicStatus_pkey" on
> "_LicStatus"  (cost=0.15..8.17 rows=1 width=32) (actual
> time=0.005..0.005 rows=1 loops=55)
>                                    Index Cond: ("_HD".license_status =
> status_id)
>          ->  Index Scan using "_AM_pkey" on "_AM" (cost=0.43..4.27
> rows=1 width=15) (actual time=2.325..2.325 rows=1 loops=43)
>                Index Cond: (unique_system_identifier =
> "_EN".unique_system_identifier)
>                Filter: ("_EN".callsign = callsign)
>          SubPlan 1
>            ->  Limit  (cost=0.15..8.17 rows=1 width=32) (actual
> time=0.007..0.007 rows=1 loops=43)
>                  ->  Index Scan using "_ApplicantType_pkey" on
> "_ApplicantType"  (cost=0.15..8.17 rows=1 width=32) (actual
> time=0.005..0.005 rows=1 loops=43)
>                        Index Cond: ("_EN".applicant_type_code =
> app_type_id)
>  Planning time: 13.490 ms
>  Execution time: 349.182 ms
> (43 rows)
>
>
> *Here's from v13.2:*
>
> => EXPLAIN ANALYZE SELECT club_count, extra_count, region_count,
> callsign AS trustee_callsign, applicant_type, entity_name, licensee_id
> AS _lid FROM genclub_multi_  WHERE club_count >= 5 ORDER BY
> extra_count DESC, club_count DESC, entity_name;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=144365.60..144365.60 rows=1 width=94) (actual
> time=31898.860..31901.922 rows=43 loops=1)
>    Sort Key: "_Club".extra_count DESC, "_Club".club_count DESC,
> "_EN".entity_name
>    Sort Method: quicksort  Memory: 31kB
>    ->  Nested Loop  (cost=58055.66..144365.59 rows=1 width=94) (actual
> time=6132.403..31894.233 rows=43 loops=1)
>          ->  Nested Loop  (cost=58055.51..144364.21 rows=1 width=62)
> (actual time=1226.085..30337.921 rows=837792 loops=1)
>                ->  Nested Loop Left Join (cost=58055.09..144360.38
> rows=1 width=59) (actual time=1062.414..12471.456 rows=1487153 loops=1)
>                      ->  Hash Join  (cost=58054.80..144359.69 rows=1
> width=66) (actual time=1061.330..6635.041 rows=1487153 loops=1)
>                            Hash Cond: (("_EN".unique_system_identifier
> = "_AM".unique_system_identifier) AND ("_EN".callsign = "_AM".callsign))
>                            ->  Hash Join (cost=3.33..53349.72
> rows=1033046 width=51) (actual time=2.151..3433.178 rows=1487153 loops=1)
>                                  Hash Cond: (("_EN".country_id =
> "_GovtRegion".country_id) AND ("_EN".state = "_GovtRegion".territory_id))
>                                  ->  Seq Scan on "_EN"
> (cost=0.00..45288.05 rows=1509005 width=60) (actual
> time=0.037..2737.054 rows=1508736 loops=1)
>                                  ->  Hash  (cost=1.93..1.93 rows=93
> width=7) (actual time=0.706..1.264 rows=88 loops=1)
>                                        Buckets: 1024  Batches: 1 
> Memory Usage: 12kB
>                                        ->  Seq Scan on "_GovtRegion" 
> (cost=0.00..1.93 rows=93 width=7) (actual time=0.013..0.577 rows=93
> loops=1)
>                            ->  Hash  (cost=28093.99..28093.99
> rows=1506699 width=15) (actual time=1055.587..1055.588 rows=1506474
> loops=1)
>                                  Buckets: 131072  Batches: 32 Memory
> Usage: 3175kB
>                                  ->  Seq Scan on "_AM"
> (cost=0.00..28093.99 rows=1506699 width=15) (actual
> time=0.009..742.774 rows=1506474 loops=1)
>                      ->  Nested Loop  (cost=0.29..0.68 rows=1 width=7)
> (actual time=0.003..0.004 rows=1 loops=1487153)
>                            Join Filter: ("_IsoCountry".iso_alpha2 =
> "_Territory".country_id)
>                            Rows Removed by Join Filter: 0
>                            ->  Index Only Scan using
> "_IsoCountry_iso_alpha2_key" on "_IsoCountry"  (cost=0.14..0.38 rows=1
> width=3) (actual time=0.001..0.002 rows=1 loops=1487153)
>                                  Index Cond: (iso_alpha2 =
> "_GovtRegion".country_id)
>                                  Heap Fetches: 1487153
>                            ->  Index Only Scan using "_Territory_pkey"
> on "_Territory"  (cost=0.14..0.29 rows=1 width=7) (actual
> time=0.001..0.001 rows=1 loops=1487153)
>                                  Index Cond: (territory_id =
> "_GovtRegion".territory_id)
>                                  Heap Fetches: 1550706
>                ->  Index Scan using "_HD_pkey" on "_HD"
> (cost=0.43..3.82 rows=1 width=15) (actual time=0.012..0.012 rows=1
> loops=1487153)
>                      Index Cond: (unique_system_identifier =
> "_EN".unique_system_identifier)
>                      Filter: (("_EN".callsign = callsign) AND
> (((((license_status)::text || ' - '::text) || (COALESCE((SubPlan 2),
> '???'::character varying))::text))::character(1) = 'A'::bpchar))
>                      Rows Removed by Filter: 0
>                      SubPlan 2
>                        ->  Limit  (cost=0.00..1.07 rows=1 width=13)
> (actual time=0.001..0.001 rows=1 loops=1487153)
>                              ->  Seq Scan on "_LicStatus"
> (cost=0.00..1.07 rows=1 width=13) (actual time=0.000..0.000 rows=1
> loops=1487153)
>                                    Filter: ("_HD".license_status =
> status_id)
>                                    Rows Removed by Filter: 1
>          ->  Index Scan using "_Club_pkey" on "_Club" (cost=0.14..0.17
> rows=1 width=35) (actual time=0.002..0.002 rows=0 loops=837792)
>                Index Cond: (trustee_callsign = "_EN".callsign)
>                Filter: (club_count >= 5)
>                Rows Removed by Filter: 0
>          SubPlan 1
>            ->  Limit  (cost=0.00..1.20 rows=1 width=15) (actual
> time=0.060..0.060 rows=1 loops=43)
>                  ->  Seq Scan on "_ApplicantType" (cost=0.00..1.20
> rows=1 width=15) (actual time=0.016..0.016 rows=1 loops=43)
>                        Filter: ("_EN".applicant_type_code = app_type_id)
>                        Rows Removed by Filter: 7
>  Planning Time: 173.753 ms
>  Execution Time: 31919.601 ms
> (46 rows)
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ajin Cherian 2021-05-31 03:09:28 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Tatsuro Yamada 2021-05-31 02:52:05 Re: Race condition in recovery?

Browse pgsql-performance by date

  From Date Subject
Next Message Christophe Pettus 2021-05-31 03:41:28 Re: AWS forcing PG upgrade from v9.6 a disaster
Previous Message Dean Gibson (DB Administrator) 2021-05-29 22:47:39 Re: AWS forcing PG upgrade from v9.6 a disaster