Re: Gather Merge

From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Gather Merge
Date: 2017-03-10 10:32:56
Message-ID: VisenaEmail.2f.17bd9d65d44ba7b9.15ab7c23b58@tc7-visena
Lists: pgsql-hackers

On Friday 10 March 2017 at 10:34:48, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
    On Fri, Mar 10, 2017 at 2:42 PM, Andreas Joseph Krogh <andreas(at)visena(dot)com> wrote:
        On Friday 10 March 2017 at 10:09:22, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
            On Fri, Mar 10, 2017 at 2:33 PM, Andreas Joseph Krogh <andreas(at)visena(dot)com> wrote: [...]

The execution-plan seems (unsurprisingly) to depend on data-distribution, so
is there a way I can force a GatherMerge?
 
Not directly. GatherMerge cost depends mainly on parallel_setup_cost,
parallel_tuple_cost and cpu_operator_cost. You may be able to force it
by setting those costs low enough, or by disabling the other plans.

What plan are you getting now? Are you not seeing the below error?
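For instance, the planner can be nudged toward a parallel Gather Merge plan in a session by lowering the parallel cost parameters (a sketch; the exact values are arbitrary, chosen only to make parallel plans look cheap):

```sql
-- Make starting workers and shipping tuples look essentially free,
-- so parallel plans win on cost.
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
-- Default is 0.0025; lowering it further favors parallel operator work.
SET cpu_operator_cost = 0.0001;
-- Allow enough workers for the plan shape you want to test.
SET max_parallel_workers_per_gather = 4;
```

These settings only affect the current session, so they are safe for experimentation.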
 
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge

 
I'm seeing the same error, it's just that for reproducing it I'd rather not
copy my whole dataset.
 
Can you share the schema information? I will try to reproduce it on my side.

 
The relevant schema is this:
 
drop table if exists temp_email_address_owner;
drop table if exists temp_email_delivery;
drop table if exists temp_email_message;

create table temp_email_message(
    entity_id BIGSERIAL PRIMARY KEY
);

create table temp_email_delivery(
    entity_id BIGSERIAL PRIMARY KEY,
    message_id bigint not null references temp_email_message(entity_id),
    from_entity_id bigint,
    received_timestamp timestamp not null
);

create table temp_email_address_owner(
    entity_id BIGSERIAL PRIMARY KEY,
    message_id bigint not null references temp_email_message(entity_id),
    recipient_id bigint
);

EXPLAIN ANALYSE
SELECT em.entity_id
FROM temp_email_delivery del
     JOIN temp_email_message em ON (del.message_id = em.entity_id)
WHERE del.from_entity_id = 279519
   OR em.entity_id IN (
        SELECT ea_owner.message_id
        FROM temp_email_address_owner ea_owner
        WHERE ea_owner.recipient_id = 279519
   )
ORDER BY del.received_timestamp DESC
LIMIT 101 OFFSET 0;
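As an alternative to copying the whole dataset, the tables could be filled with synthetic rows, for example along these lines (a hypothetical generator, not the original data; the row counts are chosen only to roughly match the estimates in the plan below):

```sql
-- Fill the parent table first so the FK references resolve.
INSERT INTO temp_email_message (entity_id)
SELECT g FROM generate_series(1, 730000) g;

-- One delivery per message, with random sender ids and timestamps.
INSERT INTO temp_email_delivery (message_id, from_entity_id, received_timestamp)
SELECT g,
       (random() * 300000)::bigint,
       now() - (random() * interval '365 days')
FROM generate_series(1, 730000) g;

-- One owner row per message, with random recipient ids.
INSERT INTO temp_email_address_owner (message_id, recipient_id)
SELECT g, (random() * 300000)::bigint
FROM generate_series(1, 730000) g;

ANALYZE;  -- refresh planner statistics after bulk loading
```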
 
.. But I'm having a hard time reproducing it.
I've tried copying data from the relevant tables into the test tables (temp_*),
adding indexes etc., but Gather Merge works just fine:
 
Limit  (cost=209378.96..209391.05 rows=101 width=16) (actual time=799.380..799.432 rows=101 loops=1)
  ->  Gather Merge  (cost=209378.96..262335.79 rows=442285 width=16) (actual time=799.379..799.420 rows=101 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Sort  (cost=208378.90..208655.33 rows=110571 width=16) (actual time=785.029..785.042 rows=81 loops=5)
              Sort Key: del.received_timestamp DESC
              Sort Method: quicksort  Memory: 29kB
              ->  Hash Join  (cost=52036.86..204145.01 rows=110571 width=16) (actual time=400.812..784.907 rows=95 loops=5)
                    Hash Cond: (del.message_id = em.entity_id)
                    Join Filter: ((del.from_entity_id = 279519) OR (hashed SubPlan 1))
                    Rows Removed by Join Filter: 176799
                    ->  Parallel Seq Scan on temp_email_delivery del  (cost=0.00..142515.18 rows=221118 width=24) (actual time=0.033..211.196 rows=176894 loops=5)
                    ->  Hash  (cost=39799.72..39799.72 rows=730772 width=8) (actual time=368.746..368.746 rows=730772 loops=5)
                          Buckets: 1048576  Batches: 2  Memory Usage: 22496kB
                          ->  Seq Scan on temp_email_message em  (cost=0.00..39799.72 rows=730772 width=8) (actual time=0.017..208.116 rows=730772 loops=5)
                    SubPlan 1
                      ->  Index Scan using temp_email_address_owner_recipient_id_idx on temp_email_address_owner ea_owner  (cost=0.43..247.32 rows=68 width=8) (actual time=0.072..0.759 rows=175 loops=5)
                            Index Cond: (recipient_id = 279519)
Planning time: 2.134 ms
Execution time: 830.313 ms

 
Could it be that the dataset was created with a PG version from yesterday,
before Gather Merge was committed, and then I just recompiled PG and re-installed
over the old installation without re-initdb'ing? I saw no catversion.h changes,
so I assumed this was fine.
 
--
Andreas Joseph Krogh
