Re: Are bitmap index scans slow to start?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Are bitmap index scans slow to start?
Date: 2013-02-27 18:03:50
Message-ID: CAMkU=1zRqJ6Bn08KiN8vLBoVj14W5xXM7aJLgz+QZwLC0ZRbJQ@mail.gmail.com
Lists: pgsql-performance

On Tue, Feb 26, 2013 at 4:33 PM, Carlo Stonebanks <
stonec(dot)register(at)sympatico(dot)ca> wrote:

> << Is each of these write operations just covering a single row? Does this
> description apply to just one of the many (how many?) databases, so that
> there are really 14*N concurrent sessions? >>
>
> All writes are single row. All DB’s have exactly the same structure, only
> the content is different. Currently the server is hosting five active DB’s
> – although there are 14 DB’s actually on the host, the balance are backups
> and/or testing environments.
>

I had thought you were saying that any one ETL procedure into one database
used 14 concurrent threads. But really, each ETL procedure is
single-threaded, and there can be up to 5 (or theoretically up to 14) of
them running at a time into different databases?

> When a feed comes in, it can be anything from dozens to millions of rows,
> and may take minutes or days to run. I had asked that PG bouncer be
> installed in front of the host to act as a traffic cop. Try as I may to
> convince the engineering team that fewer sessions running faster is
> optimal, they say that the 14 concurrent sessions are based on real-world
> experience of what imports the fastest.
>

pgbouncer is more for making connections line up single-file when the line
is moving at a very fast clip, say 0.01 second per turn. If I were trying
to make tasks that can each last for hours or days line up and take turns,
I don't think pgbouncer would be the way to go.
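
Just to illustrate what I mean, the setup pgbouncer is designed for looks
roughly like this (an illustrative sketch only -- the database name and the
numbers are made up, not a recommendation for your system):

    ; pgbouncer.ini -- illustrative sketch, not tuned for any real workload
    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_port = 6432
    auth_type = md5
    auth_file = userlist.txt
    ; transaction pooling: each short transaction borrows one of a small
    ; number of server connections, then immediately gives it back
    pool_mode = transaction
    default_pool_size = 10
    max_client_conn = 200

That works because each client only holds a server connection for a fraction
of a second. A session that keeps a transaction open for hours or days just
pins its pool slot for the whole run, so the pooler never gets a chance to do
anything useful for you.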

> << You really need to know whether those reads and writes are
> concentrated in a small region (relative to the amount of your RAM), or
> widely scattered. If you are reading and writing intensively (which you do
> seem to be doing) but only within a compact region, then it should not
> drive other data out of the cache. But, since you do seem to have IO
> problems from cache misses, and you do have a high level of activity, the
> easy conclusion is that you have too little RAM to hold the working size of
> your data.
> >>
>
> It won’t be a problem of physical RAM; I believe there is at least 32GB of
> RAM. What constitutes “a compact region”?
>

If you have 14 of them actively going on simultaneously, I'd say a compact
region would then be about 512 MB (32GB / 14 / a margin of safety of 4).
Again, assuming that is the problem.

> The ETL process takes the feed and distributes it to 85 core tables. I
> have been through many PG configuration cycles with the generous help of
> people in this forum. I think the big problem when getting help has been
> getting those offering assistance to understand that the whopping
> majority of the time, the system is performing single-row reads and writes.
> The assumption tends to be that the end point of an ETL should just be a
> series of COPY statements, and it should all happen very quickly in classic
> SQL bulk queries.
>

That is often a reasonable assumption, as ETL does end with L :)

Is the original query you posted part of the transform process, rather than
being the production query you run after the ETL is over?

If so, maybe you need an EL(S)TL process, where you first load the data to a
staging table in bulk, and then transform it in bulk rather than one row at a
time.
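
Something along these lines, for example (just a sketch -- the staging_feed
and core_contacts table names, their columns, and the file path are all made
up, and the real transform would be whatever your 85 core tables need):

    -- bulk-load the raw feed into an unindexed staging table
    CREATE TABLE staging_feed (LIKE core_contacts INCLUDING DEFAULTS);
    COPY staging_feed FROM '/path/to/feed.csv' WITH (FORMAT csv);

    -- then transform and insert in one set-based pass instead of row by row
    INSERT INTO core_contacts (id, name, updated_at)
    SELECT s.id, s.name, now()
    FROM staging_feed s
    WHERE NOT EXISTS
        (SELECT 1 FROM core_contacts c WHERE c.id = s.id);

(Or \copy from psql if the file lives on the client.) The point is that one
COPY plus one INSERT ... SELECT lets the planner chew on millions of rows at
once, rather than paying per-statement and per-index-lookup overhead millions
of times.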

Cheers,

Jeff
