Re: How to best use 32 15k.7 300GB drives?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Schnabel <schnabelr(at)missouri(dot)edu>
Cc: "david(at)lang(dot)hm" <david(at)lang(dot)hm>, pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: How to best use 32 15k.7 300GB drives?
Date: 2011-01-28 17:14:11
Message-ID: 20110128171411.GD30352@tamriel.snowman.net
Lists: pgsql-performance

Robert,

* Robert Schnabel (schnabelr(at)missouri(dot)edu) wrote:
> Depends on what you mean by that.  The tables that I'm concerned with look
> something like bigint x2, char var x13, int x24, real x8, smallint x4 by
> about 65M rows, each.  I only do the updates on one table at a time.  The
> real columns are actually null in the input csv file.  I run an update
> which basically uses some of the integer columns and calculates
> frequencies which go into the real columns. 

Erm, I'm pretty sure you're still increasing the size of the resulting
tables by quite a bit with this process, which will slow down later
queries.

> Ditto with some of the other
> columns.  I don't do this before I upload the data because 1) it's easier
> this way and 2) I can't because some of the updates involve joins to other
> tables to grab info that I can't do outside the database. 

That's fine; just load the data into temporary tables first and then do
INSERT INTO new_table SELECT <your query>;

instead.
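
As a rough sketch of that flow (the table and column names here are
made up for illustration, not your actual schema):

BEGIN;
CREATE TEMP TABLE staging (marker_id bigint, a_count int, b_count int)
  ON COMMIT DROP;
COPY staging FROM '/path/to/chunk.csv' WITH (FORMAT csv);

-- write each row into the final table exactly once, frequencies included
INSERT INTO final_table (marker_id, a_count, b_count, freq)
SELECT s.marker_id, s.a_count, s.b_count,
       s.a_count::real / NULLIF(s.a_count + s.b_count, 0)
  FROM staging s;
COMMIT;

That way each row only gets written once, instead of being written by
the COPY and then rewritten by the later UPDATE.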

> So yes, once
> the upload is done I run queries that update every row for certain
> columns, not every column.  After I'm done with a table I run a VACUUM
> ANALYZE.  I'm really not worried about what my table looks like on disk. 

I thought you wanted it fast..? If not, I'm not sure why you're
bothering to post to this list. What it looks like on disk certainly
impacts how fast it is...

> I actually take other steps also to avoid what you're talking about.

If you really don't feel like changing your process, you could just run
'CLUSTER' on the table, on whatever index you use most frequently, and
PG will rewrite the entire table for you, dropping all the dead rows,
etc. You should then run VACUUM FREEZE on it.
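
For example (the index name here is just a placeholder):

CLUSTER your_table USING your_most_used_idx;
VACUUM FREEZE ANALYZE your_table;

Keep in mind that CLUSTER takes an exclusive lock and rewrites the whole
table, so run it during a quiet window.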

> These columns then get indexed. Basically once the initial
> manipulation is done the table is then static and what I'm looking
> for is query speed.

Yes, I gathered that; making the table smaller on disk will improve
query speed.

> Thanks for the advise but parallelizing/automating doesn't really do
> anything for me.  The data is already partitioned.  Think of it this way,
> you just got 65M new records with about 30 data points per record on an
> individual sample.  You put it in a new table of it's own and now you want
> to characterize those 65M data points.  The first update flags about 60M
> of the rows as uninteresting so you move them to their own *uninteresting*
> table and basically never really touch them again (but you cant get rid of
> them).  Now you're working with 5M that you're going to characterize into
> about 20 categories based on what is in those 30 columns of data.  Do all
> the querying/updating then index and you're done.  Too long to describe
> but I cannot automate this.  I only update one partition at a time and
> only about every couple weeks or so.

I was referring to parallelizing queries *after* the data is all loaded,
etc. I wasn't talking about the queries that you use during the load.

I presume that after the load you run some queries. You can probably
parallelize those queries (most DW queries can be, in my experience...).

> > That link certainly isn't going to help things.. You might consider how
> > or if you can improve that.
>
> Suggestions???  It was previously suggested to split the drives on each
> array across the two controller ports rather than have all the data drives
> on one port which makes sense.  Maybe I'm getting my terminology wrong
> here but I'm talking about a single SFF-8088 link to each 16 drive
> enclosure.  What about two controllers, one for each enclosure?  Don't
> know if I have enough empty slots though.

I don't know that you'd need a second controller (though it probably
wouldn't hurt if you could). If there's only one way to attach the
enclosure, then so be it. The issue is if the enclosures end up
multiplexing the individual drives into fewer channels than there are
actual drives, hence creating a bottleneck. You would need different
enclosures to deal with that, if that's the case.

> I haven't messed with fsync but maybe I'll try.  In general, I create my
> indexes and constraints after I'm done doing all the updating I need to
> do.  I made the mistake *once* of copying millions of rows into a table
> that already had indexes.

Yeah, I bet that took a while. As I said above, if you don't want to
change your process (which, tbh, I think would be faster if you were
doing INSERTs into a new table than full-table UPDATEs...), then you
should do a CLUSTER after you've created whatever is the most popular
INDEX, and then create your other indexes after that.

> It got lost from the original post but my database (9.0.0) is currently on
> my Windows XP 64-bit workstation in my office on a 16 drive Seagate 15k.5
> RAID5, no comments needed, I know, I'm moving it :-).  I'm moving it to my
> server which is Windows Ent Server 2008 R2 64-bit 8 AMD cores & 32G ram
> and these new drives/controller.

Ughh... No chance to get a Unix-based system (Linux, BSD, whatever) on
there instead? I really don't think Windows Server is going to help
your situation one bit.. :(

> 1) putting WAL on a separate array is worthless since I do very little
> writes.  What about if I put my temp tablespace on the same array with WAL
> & xlog?  I've noticed a lot of the ad hoc queries I run create tmp files,
> sometimes tens of GB.  I appreciate the fact that managing multiple
> tablespaces is not as easy as managing one but if it helps...

That's not a bad idea but I'm not sure it'd make as much difference as
you think it would. What would be better would be to *avoid*, at all
costs, letting those queries spill out to disk. The way to do that is
to make sure your work_mem is as high as PG will actually use (1GB), and
then to *parallelize* those queries using multiple PG connections, so
that each one will be able to use up that much memory.
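
work_mem can be raised for just those sessions rather than server-wide,
e.g.:

SET work_mem = '1GB';  -- only affects this connection

so the big aggregates get the memory without handing every backend a
1GB budget.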

For example, say you need to summarize the values for each of your
strands (or whatever) across 5 different "loads". Your query might
look like:

select load,strand,sum(value) from parent_table group by load,strand;

Ideally, PG will use a hash table, keyed on load+strand, to store the
resulting summations in. If it doesn't think the hash table will fit in
work_mem, it's going to SORT ALL OF YOUR DATA ON DISK first instead, and
then WALK THROUGH IT, summing each section, then spitting out the result
to the client, and moving on. This is *not* a fast process. If doing
the same query on an individual child will use a hash table, then it'd
be hugely faster to query each load first, storing the results into
temporary tables. What would be even *faster* would be to run all 5 of
those queries against the child tables in parallel (given that you have
over 5 CPUs and enough memory that you don't start swapping).
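
A rough sketch of that per-load approach (table names are made up):

-- run one of these from each of five connections, in parallel
CREATE TABLE summary_load1 AS
SELECT load, strand, sum(value) AS total
  FROM child_table_load1
 GROUP BY load, strand;

-- once they've all finished, combining the small results is cheap
SELECT load, strand, total FROM summary_load1
UNION ALL
SELECT load, strand, total FROM summary_load2;
-- ...and so on for the other loads

The summary tables here are ordinary (small) tables rather than temp
tables so the results from the separate connections can be combined
afterward.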

If it's still too big on a per-child basis, you might be able to use
conditionals to do the first 100 strands, then the next hundred, etc.
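
For instance (assuming strand is something you can range over):

SELECT load, strand, sum(value)
  FROM child_table_load1
 WHERE strand BETWEEN 1 AND 100
 GROUP BY load, strand;

and then repeat with BETWEEN 101 AND 200, and so on, so each pass's
hash table stays under work_mem.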

> I appreciate the comments thus far.

Let's hope you'll always appreciate them. :)

Thanks,

Stephen
