Rearchitecting for storage

From: Matthew Pounsett <matt(at)conundrum(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Rearchitecting for storage
Date: 2019-07-18 13:44:04
Message-ID: CAAiTEH-442wghJmt=Yw2bnUvxNW-nwTMoFLqgLrTZ1CmZxiZJA@mail.gmail.com
Lists: pgsql-general

I've recently inherited a database that is dangerously close to outgrowing
the available storage on its existing hardware. I'm looking for (pointers
to) advice on scaling the storage in a financially constrained
not-for-profit.

The current size of the DB's data directory is just shy of 23TB. When I
received the machine it's on, it was configured with 18x3TB drives in
RAID10 (9x 2-drive mirrors striped together) for about 27TB of available
storage. As a short term measure I've reconfigured them into RAID50 (3x
6-drive RAID5 arrays). This is obviously a poor choice for performance,
but it'll get us through until we figure out what to do about
upgrading/replacing the hardware. The host is constrained to 24x3TB
drives, so we can't get much of an upgrade by just adding/replacing disks.
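
As a rough check on the capacity figures above, here is a minimal sketch of the nominal usable space for each layout. It assumes decimal TB, every drive exactly 3TB, and no allowance for filesystem overhead, hot spares, or TB-vs-TiB differences; the function names are illustrative only.

```python
# Nominal usable capacity for the RAID layouts described above.
# Assumptions: decimal TB, every drive exactly 3 TB, and no allowance
# for filesystem overhead, hot spares, or TB-vs-TiB differences.

DRIVE_TB = 3


def raid10_usable(drives: int, drive_tb: float = DRIVE_TB) -> float:
    """RAID10 as 2-drive mirrors striped together: half the raw space."""
    if drives % 2:
        raise ValueError("RAID10 with 2-drive mirrors needs an even drive count")
    return (drives // 2) * drive_tb


def raid50_usable(groups: int, drives_per_group: int,
                  drive_tb: float = DRIVE_TB) -> float:
    """RAID50: each RAID5 group gives up one drive's worth of space to parity."""
    if drives_per_group < 3:
        raise ValueError("RAID5 needs at least 3 drives per group")
    return groups * (drives_per_group - 1) * drive_tb


if __name__ == "__main__":
    print("18 drives, RAID10:", raid10_usable(18), "TB")          # 27 TB (9 mirrors)
    print("18 drives, RAID50 (3x6):", raid50_usable(3, 6), "TB")  # 45 TB
    print("24 drives, RAID10:", raid10_usable(24), "TB")          # 36 TB (chassis maximum)
    print("24 drives, RAID50 (4x6):", raid50_usable(4, 6), "TB")  # 60 TB
```

Even with all 24 bays filled, RAID10 on 3TB drives tops out around 36TB nominal, which is why adding or replacing disks in this host doesn't buy much.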

One of my anticipated requirements for any replacement we design is that I
should be able to do upgrades of Postgres for up to five years without
needing major upgrades to the hardware. My understanding of the standard
upgrade process is that this requires that the data directory be smaller
than the free storage (so that there is room to hold two copies of the data
directory simultaneously). I haven't got detailed growth statistics yet,
but given that the DB has grown to 23TB in 5 years, I should assume that it
could double to about 46TB in the next five years. Holding two copies of
that during an upgrade means roughly 92TB, so I'd want about 100TB of
available storage, with some headroom, to be able to do upgrades.
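
The projection above can be written out as a small calculation. This is a sketch under stated assumptions, not measured data: the data directory doubles over the window (as assumed in the post), an upgrade needs room for two full copies at once, and the 10% headroom figure is purely illustrative. The helper names are hypothetical.

```python
# Rough storage projection for an upgrade that holds two copies of the data
# directory at once. All inputs are hypothetical planning assumptions:
#  - the data directory doubles over the planning window, as assumed in the post;
#  - HEADROOM is an illustrative safety margin, not a recommendation.

CURRENT_TB = 23.0
GROWTH_FACTOR = 2.0
COPIES_DURING_UPGRADE = 2
HEADROOM = 0.10
YEARS = 5


def required_storage_tb(current_tb: float, growth_factor: float,
                        copies: int, headroom: float) -> float:
    """Projected data size, times copies held during an upgrade, plus headroom."""
    projected = current_tb * growth_factor
    return projected * copies * (1 + headroom)


def implied_annual_growth(growth_factor: float, years: int) -> float:
    """Compound annual growth rate that yields growth_factor over `years`."""
    return growth_factor ** (1 / years) - 1


if __name__ == "__main__":
    no_margin = required_storage_tb(CURRENT_TB, GROWTH_FACTOR, COPIES_DURING_UPGRADE, 0.0)
    with_margin = required_storage_tb(CURRENT_TB, GROWTH_FACTOR, COPIES_DURING_UPGRADE, HEADROOM)
    print(f"two copies, no margin: {no_margin:.1f} TB")        # 92.0 TB
    print(f"two copies, 10% margin: {with_margin:.1f} TB")     # 101.2 TB
    print(f"implied annual growth: {implied_annual_growth(GROWTH_FACTOR, YEARS):.1%}")  # 14.9%
```

Doubling in five years works out to roughly 15% compound growth per year, so once real growth statistics are available they can be plugged in here in place of the doubling assumption.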

This seems to be right on the cusp of what is possible to fit in a single
chassis with a RAID10 configuration (at least, with commodity hardware),
which means we're looking at a pretty high cost:performance ratio. I'd like
to see if we can find designs that get that ratio down a bit, or a lot, but
I'm a general sysadmin, and the detailed effects of those choices are
outside of my limited DBA experience.

Are there good documents out there on sizing hardware for this sort of
mid-range storage requirement, one that is neither big data nor "small data"
that fits on a single host? I'm hoping for an overview of the tradeoffs
between single-head and dual-head setups with a JBOD array, or whatever else
is advisable to consider these days. Corrections of any poor assumptions
exposed above are also quite welcome. :)

Thanks in advance for any assistance!
