RE: Hardware for writing/updating 12,000,000 rows per hour

From: farjad(dot)farid <farjad(dot)farid(at)checknetworks(dot)com>
To: Arya F <arya6000(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ron <ronljohnsonjr(at)gmail(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: RE: Hardware for writing/updating 12,000,000 rows per hour
Date: 2019-07-27 22:06:30
Message-ID: LNXP265MB00593B8414BA4F8A2F05850E92C30@LNXP265MB0059.GBRP265.PROD.OUTLOOK.COM
Lists: pgsql-general

Hi Arya,

It is not clear what the budget is or why there is so much data. Is this a real-time system, e.g. a 24/7 operation? Even if each row takes up just 50 bytes, that is a lot of data in and out of your CPUs, memory, and hard disks, any one of which could fail.

Personally I would recommend analyzing the software for any pattern that might help you reduce hard disk use. Push as much of the data as possible into memory, letting only the overflow go to disk (if at all possible). I know it might be difficult, but it could save you a lot of downtime and maintenance. Also double-check the reliability of hard disks vs. SSDs; not all SSDs are server grade, and the same goes for motherboards and memory. Use a hardware RAID card, not software RAID, as the latter degrades overall performance.
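In PostgreSQL terms, pushing the work toward memory is largely a tuning exercise. A rough sketch only -- these values are placeholders to show the knobs involved, not settings validated for your workload:

    # postgresql.conf -- illustrative values, tune for the actual machine
    shared_buffers = 8GB          # keep more of the working set in RAM
    checkpoint_timeout = 15min    # spread checkpoint writes over time
    max_wal_size = 16GB           # fewer forced checkpoints under write bursts
    synchronous_commit = off      # faster commits, but the most recent
                                  # transactions can be lost on a crash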

As a minimum I would recommend a RAID configuration, bearing the budget in mind. Also check the card manufacturer's reliability figures; the reliability of every component matters. A short-term cost saving could cost a lot more in this kind of situation. At the very least, make the request: if it is rejected and things go wrong, you can point out that you had asked.

What about backing up the data?

Good luck.

From: Arya F <arya6000(at)gmail(dot)com>
Sent: 2019 July 27 17:56
To: farjad.farid <farjad(dot)farid(at)checknetworks(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>; Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Ron <ronljohnsonjr(at)gmail(dot)com>; pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Hardware for writing/updating 12,000,000 rows per hour

On Sat, Jul 27, 2019 at 11:49 AM farjad.farid <farjad(dot)farid(at)checknetworks(dot)com> wrote:
With this kind of design requirement, it is worth considering hardware "failure & recovery". Even SSDs can and do fail.

It is not just a matter of speed. RAID disks of some kind, depending on the budget, are worth the effort.

-----Original Message-----
From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Sent: 2019 July 26 22:39
To: Arya F <arya6000(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; Ron <ronljohnsonjr(at)gmail(dot)com>; pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Hardware for writing/updating 12,000,000 rows per hour

On 2019-Jul-26, Arya F wrote:

> I think I can modify my application to do a batch update. Right now
> the server has an HDD and it really can't handle a lot of updates and
> inserts per second. Would changing to a regular SSD be able to easily
> do 3000 updates per second?
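
A batched form of those updates might look like the following sketch; the table and column names -- positions, id, price -- are invented for illustration:

    -- One statement updates many rows in a single round trip,
    -- instead of thousands of separate single-row UPDATEs.
    UPDATE positions AS p
    SET    price = v.price
    FROM   (VALUES (1, 101.50),
                   (2,  99.00),
                   (3, 100.25)) AS v(id, price)
    WHERE  p.id = v.id;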

That's a pretty hard question in isolation -- you need to consider how many indexes there are to update, whether the updated columns are indexed or not, what the datatypes are, how much locality of access you'll have ... I'm probably missing some other important factors. (Of course, you'll have to tune various PG server settings to find your sweet spot.)
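
The index question, at least, is easy to check from the catalogs; 'positions' below again stands in for whichever table is being updated:

    -- Every index on the updated table adds write amplification,
    -- especially if the indexed columns themselves change.
    SELECT indexname, indexdef
    FROM   pg_indexes
    WHERE  tablename = 'positions';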

I suggest that you should be measuring instead of trying to guess. A reasonably cheap way is to rent a machine somewhere with the type of hardware you think you'll need, and run your workload there for long enough, making sure to carefully observe important metrics such as table size, accumulated bloat, checkpoint regime, overall I/O activity, and so on.
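
One way to do that with stock tooling is a custom pgbench script that approximates the update pattern. A minimal sketch, again assuming a hypothetical positions table with a million-row key space:

    -- update_test.sql; run with something like:
    --   pgbench -n -f update_test.sql -c 16 -j 4 -T 600 mydb
    \set id random(1, 1000000)
    UPDATE positions SET price = price + 0.01 WHERE id = :id;

The client, thread, and duration figures are arbitrary starting points; vary them while watching the metrics above.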

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Hi Farjad,

I was thinking of using physical or logical replication. Or is RAID a must if I don't want to lose data?
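
If physical replication is the route taken, the primary side typically starts with a replication slot so WAL is retained for the standby; the slot name here is a made-up example:

    -- On the primary: retain WAL for the standby until it has
    -- been streamed.
    SELECT pg_create_physical_replication_slot('standby_1');

Note that RAID and replication guard against different failures (a dead drive vs. a dead machine), so they complement rather than replace each other.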
