Re: DB design advice: lots of small tables?

From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: DB design advice: lots of small tables?
Date: 2013-03-16 06:30:13
Message-ID: ki13hl$c05$1@gonzo.reversiblemaps.ath.cx
Lists: pgsql-general

On 2013-03-15, lender <crlender(at)gmail(dot)com> wrote:
> Hello.
>
> We are currently redesigning a medium/large office management web
> application. There are 75 tables in our existing PostgreSQL database,
> but that number is artificially low, due to some unfortunate design choices.
>
> The main culprits are two tables named "catalog" and "catalog_entries".
> They contain all those data sets that the previous designer deemed too
> small for a separate table, so now they are all stored together. The
> values in catalog_entries are typically used to populate dropdown select
> fields.

> So, my first main question would be: is it "normal" or desirable to have
> that many tiny tables? And is it a problem that many of the tables have
> the same (or a similar) column definitions?

Dunno about "normal", but certainly "Normal" (as in "normal form").
No problem.
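One concrete thing the one-table-per-value-set design buys is referential integrity per value set. The sketch below assumes hypothetical tables invoice_status, country and invoices, and uses SQLite in memory as a stand-in for PostgreSQL so it runs without a server. With a shared catalog_entries table, a foreign key on invoices could only point at catalog_entries as a whole, so a country entry's id would be accepted as an invoice status. With a dedicated table the foreign key rejects it.

```python
import sqlite3

# Hypothetical schema: one small lookup table per value set instead of a
# shared catalog_entries table. SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE invoice_status (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE country (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE invoices (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES invoice_status(id)
);
INSERT INTO invoice_status (id, code) VALUES (1, 'open'), (2, 'paid');
INSERT INTO country (id, code) VALUES (10, 'nz');
""")

# A valid status id is accepted.
conn.execute("INSERT INTO invoices (id, status_id) VALUES (1, 1)")

# id 10 exists, but only in country; the FK on invoice_status rejects it.
try:
    conn.execute("INSERT INTO invoices (id, status_id) VALUES (2, 10)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

Identical column definitions across these small tables are harmless; each table still carries its own constraints.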

> The second point is that we have redundant unique identifiers in
> catalog_entries (id and code). The code value is used by the application
> whenever we need to find to one of the values. For example, for a query
> like "show all open invoices", we would either -
>
> 1) select the id from catalog_entries where catalog_id refers to the
> "invoice_status" catalog and the code is "open"
> 2) use that id to filter select * from invoices
>
> - or do the same in one query using joins. This pattern occurs hundreds
> of times in the application code. From a programming viewpoint, having
> all-text ids would make things a lot simpler and cleaner (i.e., keep
> only the "code" column).
>
> The "id" column was used (AFAIK) to reduce the storage size. Most of the
> data tables have less than 100k records, so the overhead wouldn't be too
> dramatic, but a few tables (~10) have more; one of them has 1.2m
> records. These tables can also refer to the old catalog_entries table
> from more than one column. Changing all these references from INT to
> VARCHAR would increase the DB size, and probably make scans less
> performant. I'm not sure how indexes on these columns would be
> affected.
>
> To summarize, the second question is whether we should ditch the
> artificial numeric IDs and just use the "code" column as primary key in
> the new tiny tables.

If they aren't hurting you, keep them.
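For comparison, here is a minimal sketch of both options for the "show all open invoices" query. The table names (invoice_status, invoices, and the *_b variants) are hypothetical, and SQLite in memory stands in for PostgreSQL. Variant A keeps the integer id and looks up by code through a join in a single query; variant B makes the text code the primary key, so the referencing table stores the code and no join is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: surrogate integer id plus a unique text code.
CREATE TABLE invoice_status (
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE invoices (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES invoice_status(id)
);
-- Variant B: the text code itself is the primary key.
CREATE TABLE invoice_status_b (code TEXT PRIMARY KEY);
CREATE TABLE invoices_b (
    id     INTEGER PRIMARY KEY,
    status TEXT NOT NULL REFERENCES invoice_status_b(code)
);
INSERT INTO invoice_status (id, code) VALUES (1, 'open'), (2, 'paid');
INSERT INTO invoices (id, status_id) VALUES (100, 1), (101, 2), (102, 1);
INSERT INTO invoice_status_b (code) VALUES ('open'), ('paid');
INSERT INTO invoices_b (id, status) VALUES (100, 'open'), (101, 'paid'), (102, 'open');
""")

# Variant A: one query, joining on the integer key and filtering by code.
open_a = [row[0] for row in conn.execute("""
    SELECT i.id
    FROM invoices i
    JOIN invoice_status s ON s.id = i.status_id
    WHERE s.code = ?
    ORDER BY i.id
""", ("open",))]

# Variant B: no join; the code is stored directly in invoices_b.
open_b = [row[0] for row in conn.execute(
    "SELECT id FROM invoices_b WHERE status = ? ORDER BY id", ("open",))]

print(open_a, open_b)
```

Both return the same rows. The trade-off is that variant A keeps the referencing columns in the large tables as fixed-size integers at the cost of a join, while variant B drops the join but stores the text code in every referencing row and its indexes.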

> Thanks in advance for your advice.

If you're worried about clutter, it may make sense to put all the small tables
in a separate schema.

--
⚂⚃ 100% natural
