Skip site navigation (1) Skip section navigation (2)

Re: How to best use 32 15k.7 300GB drives?

From: Robert Schnabel <schnabelr(at)missouri(dot)edu>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "david(at)lang(dot)hm" <david(at)lang(dot)hm>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: How to best use 32 15k.7 300GB drives?
Date: 2011-01-28 16:39:03
Message-ID: 4D42F127.9050307@missouri.edu (view raw or flat)
Thread:
Lists: pgsql-performance
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#ffffff" text="#000000">
    <br>
    On 1/28/2011 7:14 AM, Stephen Frost wrote:
    <blockquote cite="mid:20110128131410(dot)GC30352(at)tamriel(dot)snowman(dot)net"
      type="cite">
      <pre wrap="">Robert,

* Robert Schnabel (<a class="moz-txt-link-abbreviated" href="mailto:schnabelr(at)missouri(dot)edu">schnabelr(at)missouri(dot)edu</a>) wrote:
</pre>
      <blockquote type="cite">
        <pre wrap="">Once the bulk data is inserted into the tables I generally
do some updates on columns to set values which characterize the
data.  
</pre>
      </blockquote>
      <pre wrap="">
Please tell me you're not running actual full-table UPDATE statements...
You would be *much* better off either:
a) munging the data on the way in (if possible/reasonable)
b) loading the data into temp tables first, and then using INSERT
   statements to move the data into the 'final' tables WITH the new
   columns/info you want
c) considering if you can normalize the data into multiple tables and/or
   to cut down the columns to only what you need as you go through the
   above, too

A full-table UPDATE means you're basically making the table twice as big
as it needs to be, and it'll never get smaller..
</pre>
    </blockquote>
    Depends on what you mean by that.&nbsp; The tables that I'm concerned
    with look something like bigint x2, char var x13, int x24, real x8,
    smallint x4 by about 65M rows, each.&nbsp; I only do the updates on one
    table at a time.&nbsp; The real columns are actually null in the input
    csv file.&nbsp; I run an update which basically uses some of the integer
    columns and calculates frequencies which go into the real columns.&nbsp;
    Ditto with some of the other columns.&nbsp; I don't do this before I
    upload the data because 1) it's easier this way and 2) I can't
    because some of the updates involve joins to other tables to grab
    info that I can't do outside the database.&nbsp; So yes, once the upload
    is done I run queries that update every row for certain columns, not
    every column.&nbsp; After I'm done with a table I run a VACUUM ANALYZE.&nbsp;
    I'm really not worried about what my table looks like on disk.&nbsp; I
    actually take other steps also to avoid what you're talking about.<br>
    <br>
    <blockquote cite="mid:20110128131410(dot)GC30352(at)tamriel(dot)snowman(dot)net"
      type="cite">
      <pre wrap=""></pre>
      <blockquote type="cite">
        <pre wrap="">These columns then get indexed.  Basically once the initial
manipulation is done the table is then static and what I'm looking
for is query speed.
</pre>
      </blockquote>
      <pre wrap="">
Sadly, this is the same type of DW needs that I've got (though with
telecomm data and phone calls, not genetic stuffs ;), and PG ends up
being limited by the fact that it can only use one core/thread to go
through the data with.

You might consider investing some time trying to figure out how to
parallelize your queries.  My approach to this has been to partition the
data (probably something you're doing already) into multiple tables and
then have shell/perl scripts which will run a given query against all of
the tables, dumping the results of that aggregation/analysis into other
tables, and then having a final 'merge' query.
</pre>
    </blockquote>
    Thanks for the advise but parallelizing/automating doesn't really do
    anything for me.&nbsp; The data is already partitioned.&nbsp; Think of it this
    way, you just got 65M new records with about 30 data points per
    record on an individual sample.&nbsp; You put it in a new table of it's
    own and now you want to characterize those 65M data points.&nbsp; The
    first update flags about 60M of the rows as uninteresting so you
    move them to their own *uninteresting* table and basically never
    really touch them again (but you cant get rid of them).&nbsp; Now you're
    working with 5M that you're going to characterize into about 20
    categories based on what is in those 30 columns of data.&nbsp; Do all the
    querying/updating then index and you're done.&nbsp; Too long to describe
    but I cannot automate this.&nbsp; I only update one partition at a time
    and only about every couple weeks or so.<br>
    <br>
    <br>
    <blockquote cite="mid:20110128131410(dot)GC30352(at)tamriel(dot)snowman(dot)net"
      type="cite">
      <pre wrap=""></pre>
      <blockquote type="cite">
        <pre wrap="">The data is sorted by snp_number, sample_id.  So if I want the data
for a given sample_id it would be a block of ~58k rows.  The size of
the table depends on how many sample_id's there are.  My largest has
~30k sample_id by 58k snp_number per sample.  The other big table
(with children) is "mutations" and is set up similarly so that I can
access individual tables (samples) based on constraints.  Each of
these children have between 5-60M records.
</pre>
      </blockquote>
      <pre wrap="">
Understand that indexes are only going to be used/useful, typically, if
the amount of records being returned is small relative to the size of
the table (eg: 5%).
</pre>
    </blockquote>
    Yep, I understand that.&nbsp; Even though they occupy a lot of space, I
    keep them around because there are times when I need them.<br>
    <br>
    <br>
    <blockquote cite="mid:20110128131410(dot)GC30352(at)tamriel(dot)snowman(dot)net"
      type="cite">
      <pre wrap="">
</pre>
      <blockquote type="cite">
        <pre wrap="">This is all direct attach storage via SAS2 so I'm guessing it's
probably limited to the single port link between the controller and
the expander.  Again, geneticist here not computer scientist. ;-)
</pre>
      </blockquote>
      <pre wrap="">
That link certainly isn't going to help things..  You might consider how
or if you can improve that.
</pre>
    </blockquote>
    Suggestions???&nbsp; It was previously suggested to split the drives on
    each array across the two controller ports rather than have all the
    data drives on one port which makes sense.&nbsp; Maybe I'm getting my
    terminology wrong here but I'm talking about a single SFF-8088 link
    to each 16 drive enclosure.&nbsp; What about two controllers, one for
    each enclosure?&nbsp; Don't know if I have enough empty slots though.<br>
    <br>
    <blockquote cite="mid:20110128131410(dot)GC30352(at)tamriel(dot)snowman(dot)net"
      type="cite">
      <pre wrap=""></pre>
      <blockquote type="cite">
        <pre wrap="">All of the data could be reloaded.  Basically, once I get the data
into the database and I'm done manipulating it I create a backup
copy/dump which then gets stored at a couple different locations.
</pre>
      </blockquote>
      <pre wrap="">
You might consider turning fsync off while you're doing these massive
data loads..  and make sure that you issue your 'CREATE TABLE' and your
'COPY' statements in the same transaction, and again, I suggest loading
into temporary (CREATE TEMPORARY TABLE) tables first, then doing the
CREATE TABLE/INSERT statement for the 'real' table.  Make sure that you
create *both* your constraints *and* your indexes *after* the table is
populated.

If you turn fsync off, make sure you turn it back on. :)

</pre>
    </blockquote>
    I haven't messed with fsync but maybe I'll try.&nbsp; In general, I
    create my indexes and constraints after I'm done doing all the
    updating I need to do.&nbsp; I made the mistake <b>*once*</b> of copying
    millions of rows into a table that already had indexes.<br>
    <br>
    <blockquote cite="mid:20110128131410(dot)GC30352(at)tamriel(dot)snowman(dot)net"
      type="cite">
      <pre wrap=""></pre>
      <blockquote type="cite">
        <pre wrap="">My goal is to 1) have a fairly robust system so that I don't have to
spend my time rebuilding things and 2) be able to query the data
quickly.  Most of what I do are ad hoc queries.  I have an idea...
"how many X have Y in this set of Z samples" and write the query to
get the answer.  I can wait a couple minutes to get an answer but
waiting an hour is becoming tiresome.
</pre>
      </blockquote>
      <pre wrap="">
Have you done any analysis to see what the bottleneck actually is?  When
you run top, is your PG process constantly in 'D' state, or is it in 'R'
state, or what?  Might help figure some of that out.  Note that
parallelizing the query will help regardless of if it's disk bound or
CPU bound, when you're running on the kind of hardware you're talking
about (lots of spindles, multiple CPUs, etc).

	Thanks,

		Stephen
</pre>
    </blockquote>
    It got lost from the original post but my database (9.0.0) is
    currently on my Windows XP 64-bit workstation in my office on a 16
    drive Seagate 15k.5 RAID5, no comments needed, I know, I'm moving it
    :-).&nbsp; I'm moving it to my server which is Windows Ent Server 2008 R2
    64-bit 8 AMD cores &amp; 32G ram and these new drives/controller. So
    no top or lvm although I do keep an eye on things with Process
    Explorer.&nbsp; Also, I don't have any single query that is a problem.&nbsp; I
    have my canned queries which I run manually to
    update/manipulate/move data around every couple weeks when I get a
    new chunk of data.&nbsp; Other than that my queries are all ad hoc.&nbsp; I'm
    just trying to get opinions on the best way to set up these
    drives/controllers/enclosures for basically large sequential reads
    that quite often use indexes.<br>
    <br>
    So far I'd summarize the consensus as:<br>
    1) putting WAL on a separate array is worthless since I do very
    little writes.&nbsp; What about if I put my temp tablespace on the same
    array with WAL &amp; xlog?&nbsp; I've noticed a lot of the ad hoc queries
    I run create tmp files, sometimes tens of GB.&nbsp; I appreciate the fact
    that managing multiple tablespaces is not as easy as managing one
    but if it helps...<br>
    <br>
    2) Indexes on a separate array may not be all that useful since I'm
    not doing simultaneous reads/writes.<br>
    <br>
    3) Since I can very easily recreate the database in case of
    crash/corruption RAID10 may not be the best option.&nbsp; However, if I
    do go with RAID10 split the drives between the two enclosures (this
    assumes data &amp; index arrays).&nbsp; I've thought about RAID0 but
    quite frankly I really don't like having to rebuild things.&nbsp; At some
    point my time becomes valuable.&nbsp; RAID6 was suggested but rebuilding
    a 9TB RAID6 seems scary slow to me.<br>
    <br>
    I appreciate the comments thus far.<br>
    Bob<br>
    <br>
    <br>
    <br>
    <br>
    <br>
    <br>
  </body>
</html>


Attachment: unknown_filename
Description: text/html (10.5 KB)

In response to

Responses

pgsql-performance by date

Next:From: Scott MarloweDate: 2011-01-28 17:00:18
Subject: Re: How to best use 32 15k.7 300GB drives?
Previous:From: Shaun ThomasDate: 2011-01-28 15:31:10
Subject: Re: FW: Queries becoming slow under heavy load

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group