Re: Parallel bitmap heap scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel bitmap heap scan
Date: 2016-11-23 01:54:13
Message-ID: CA+TgmoZ+QiGM9hCJPn1ihwADnbHH1Z5NSW6iiFK1gzR7Xf-4Bw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 19, 2016 at 11:53 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> I found one defect in v2 patch, that I induced during last rebasing.
> That is fixed in v3.

So, I had a brief look at this tonight. This is not a full review,
but just some things I noticed:

+ * Update snpashot info in heap scan descriptor.

Typo. Also, why should we have a function for this at all? And if we
do have a function for this, why should it have "bm" in the name when
it's stored in heapam.c?

+ * [PARALLEL BITMAP HEAP SCAN ALGORITHM]
+ *
+ * #1. Shared TIDBitmap creation and initialization
+ * a) First worker to see the state as parallel bitmap info as
+ * PBM_INITIAL become leader and set the state to PBM_INPROGRESS
+ * All other workers see the state as PBM_INPROGRESS will wait for
+ * leader to complete the TIDBitmap.
+ *
+ * Leader Worker Processing:
+ * (Leader is responsible for creating shared TIDBitmap and create
+ * shared page and chunk array from TIDBitmap.)
+ * 1) Create TIDBitmap using DHT.
+ * 2) Begin Iterate: convert hash table into shared page and chunk
+ * array.
+ * 3) Restore local TIDBitmap variable information into
+ * ParallelBitmapInfo so that other worker can see those.
+ * 4) set state to PBM_FINISHED.
+ * 5) Wake up other workers.
+ *
+ * Other Worker Processing:
+ * 1) Wait until leader create shared TIDBitmap and shared page
+ * and chunk array.
+ * 2) Attach to shared page table, copy TIDBitmap from
+ * ParallelBitmapInfo to local TIDBitmap, we copy this to local
+ * TIDBitmap so that next level processing can read information
+ * same as in non parallel case and we can avoid extra changes
+ * in code.
+ *
+ * # At this level TIDBitmap is ready and all workers are awake #
+ *
+ * #2. Bitmap processing (Iterate and process the pages).
+ * . In this phase each worker will iterate over page and
chunk array and
+ * select heap pages one by one. If prefetch is enable then there will
+ * be two iterator.
+ * . Since multiple worker are iterating over same page and chunk array
+ * we need to have a shared iterator, so we grab a spin lock and iterate
+ * within a lock.

The formatting of this comment is completely haphazard. "Leader
worker" is not a term that has any meaning. A given backend involved
in a parallel query is either a leader or a worker, not both.

+ /* reset parallel bitmap scan info, if present */
+ if (node->parallel_bitmap)
+ {
+ ParallelBitmapInfo *pbminfo = node->parallel_bitmap;
+
+ pbminfo->state = PBM_INITIAL;
+ pbminfo->tbmiterator.schunkbit = 0;
+ pbminfo->tbmiterator.spageptr = 0;
+ pbminfo->tbmiterator.schunkptr = 0;
+ pbminfo->prefetch_iterator.schunkbit = 0;
+ pbminfo->prefetch_iterator.spageptr = 0;
+ pbminfo->prefetch_iterator.schunkptr = 0;
+ pbminfo->prefetch_pages = 0;
+ pbminfo->prefetch_target = -1;
+ }

This is obviously not going to work in the face of concurrent
activity. When we did Parallel Seq Scan, we didn't make any changes
to the rescan code at all. I think we are assuming that there's no
way to cause a rescan of the driving table of a parallel query; if
that's wrong, we'll need some fix, but this isn't it. I would just
leave this out.

+static bool
+pbms_is_leader(ParallelBitmapInfo *pbminfo)

I think you should see if you can use Thomas Munro's barrier stuff for
this instead.

+ SerializeSnapshot(estate->es_snapshot, pbminfo->phs_snapshot_data);
+
+ shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pbminfo);
+
+ node->parallel_bitmap = pbminfo;
+ snapshot = RestoreSnapshot(pbminfo->phs_snapshot_data);
+
+ heap_bm_update_snapshot(node->ss.ss_currentScanDesc, snapshot);

This doesn't make any sense. You serialize the snapshot from the
estate, then restore it, then shove it into the scan descriptor. But
presumably that's already the snapshot the scan descriptor is using.
The workers need to do this, perhaps, but not the leader!

+ dht_parameters params = {0};

Not PostgreSQL style.

+#define TBM_IS_SHARED(tbm) (tbm)->shared

Seems pointless.

+ bool shared; /* need to build shared tbm if set*/

Style.

+ params.tranche_id = LWLockNewTrancheId();

You absolutely, positively cannot burn through tranche IDs like this.

+ if (tbm->shared_pagetable)
+ dht_detach(tbm->shared_pagetable);

Hmm, would we leak references if we errored out?

@@ -47,7 +47,6 @@ typedef enum

static List *translate_sub_tlist(List *tlist, int relid);

-
/*****************************************************************************
* MISC. PATH UTILITIES
*****************************************************************************/

Useless whitespace change.

@@ -23,7 +23,6 @@
#include "utils/relcache.h"
#include "utils/snapshot.h"

-
/* "options" flag bits for heap_insert */
#define HEAP_INSERT_SKIP_WAL 0x0001
#define HEAP_INSERT_SKIP_FSM 0x0002

Useless whitespace change.

WAIT_EVENT_MQ_RECEIVE,
WAIT_EVENT_MQ_SEND,
WAIT_EVENT_PARALLEL_FINISH,
+ WAIT_EVENT_PARALLEL_BITMAP_SCAN,
WAIT_EVENT_SAFE_SNAPSHOT,
WAIT_EVENT_SYNC_REP

Missing a documentation update.

In general, the amount of change in nodeBitmapHeapScan.c seems larger
than I would have expected. My copy of that file has 655 lines; this
patch adds 544 additional lines. I think/hope that some of that can
be simplified away.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-11-23 02:39:47 Re: [HACKERS] switching documentation build to XSLT
Previous Message Tom Lane 2016-11-23 01:53:02 Re: macaddr 64 bit (EUI-64) datatype support