Re: Uber migrated from Postgres to MySQL

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Guyren Howe <guyren(at)gmail(dot)com>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Uber migrated from Postgres to MySQL
Date: 2016-07-27 07:17:58
Message-ID: CAKt_ZfsN83+pgqeHgStFJs4T_zggzYxRdxBaJKa6iy=jc1Utcg@mail.gmail.com
Lists: pgsql-general

Just a few points on reading this.

First, the timeline bugs in replication were real (particularly, iirc, in
the 9.1 days). I remember accidentally corrupting a database cluster
(fortunately only a demonstration one!) at least once while demonstrating
promotion. Iirc, the last time I tried to reproduce these problems they
had been fixed (by 9.3?).

The replication section made me wonder, though, whether they were using the
right replication solution for the job. If you don't want an on-disk copy,
don't use physical replication. That said, there is one serious issue here
that is worth mentioning: since autovacuum on the master has no knowledge
of what is running on the slave, it is easy for a longer-running query on
a slave to have rows it still needs to see removed by autovacuum on the
master and then by replication. This can of course be worked around (if
your query takes 30 sec to run, open a minute-long transaction on the
master every 30 sec, which means that autovacuum can never clean up rows
that became dead less than 30 sec ago), but that is not a very robust
solution and may cause more problems than it is worth. The real solution
is going to a logical replication system, where this is not a problem. As
I usually put it, streaming replication is for cases where you need to
guarantee an exact replica of everything, while logical replication is for
cases where you need a copy of the data for use.
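
To make the arithmetic of that workaround concrete, here is a rough
sketch in Python. It is only a model, not anything PostgreSQL ships: the
function names are made up for illustration, and it assumes the cleanup
horizon is simply the start of the oldest open keepalive transaction on
the master (in reality the transaction has to hold a snapshot, and other
sessions matter too).

```python
def open_keepalive_ages(t, period, duration):
    """Ages (seconds) of the keepalive transactions still open at time t.

    Hypothetical model: one transaction is opened at every multiple of
    `period` (starting at 0) and each one stays open for `duration`.
    """
    ages = []
    k = t // period  # index of the most recently opened transaction
    while k >= 0 and t - k * period < duration:
        ages.append(t - k * period)
        k -= 1
    return ages


def guaranteed_window(period, duration):
    """Worst-case age of the oldest open keepalive transaction.

    Rows that became dead more recently than this are always protected
    from cleanup in this model. Zero means there are gaps with no open
    transaction at all.
    """
    return max(0, duration - period)


if __name__ == "__main__":
    # The case from the text: a minute-long transaction every 30 sec.
    print(guaranteed_window(period=30, duration=60))
    # Cross-check by brute force over a few minutes of steady state.
    print(min(max(open_keepalive_ages(t, 30, 60)) for t in range(60, 600)))
```

Under these assumptions the oldest open transaction is always between 30
and 60 sec old, so a 30 sec query on the slave is covered, but only just.
Any jitter in the schedule or any slower query breaks the guarantee,
which is part of why I would not call this robust.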

Finally, if I were trying to create something like schemaless, there is one
major limitation of PostgreSQL that is not mentioned here, which is TOAST
overhead. I have seen people try to do things like this, and TOAST overhead
can be a real problem in these cases. If the data for a row won't easily
fit in significantly less than a page, then every read and every write of
that data can effectively do an implicit nested loop join. And if you
want to talk about write amplification... But this is also very well
hidden and not easy to measure unless you know to look for it
specifically, so it is possible that they ran into it and didn't know it.
I don't have any knowledge of what they did or tried, though, so I could
be totally off base here. I will say I have seen more than one project run
into this without noticing, because explain analyze select * does not
detoast.
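
A back-of-the-envelope sketch of the "implicit nested loop join" point,
again in Python. The two constants are assumptions: both are compile-time
values in PostgreSQL, and roughly 2000 bytes is the usual figure for a
stock 8 kB-page build. The model ignores compression, the tuple header
and the small pointer left in the main row, and the function name is
invented for illustration.

```python
import math

# Assumed, approximate defaults for a stock build with 8 kB pages.
TOAST_THRESHOLD_BYTES = 2000  # rows larger than this get toasted
TOAST_CHUNK_BYTES = 2000      # rough upper bound on one TOAST chunk row


def out_of_line_chunks(value_sizes,
                       threshold=TOAST_THRESHOLD_BYTES,
                       chunk=TOAST_CHUNK_BYTES):
    """Estimate the TOAST chunk rows behind one main-table row.

    Simplified model: while the row is over the threshold, the largest
    remaining value is moved out of line, uncompressed, and split into
    chunk-sized rows in the TOAST table.
    """
    remaining = sum(value_sizes)
    chunks = 0
    for size in sorted(value_sizes, reverse=True):
        if remaining <= threshold:
            break
        chunks += math.ceil(size / chunk)
        remaining -= size
    return chunks


if __name__ == "__main__":
    # A small key column plus a 10 kB document in the same row.
    print(out_of_line_chunks([100, 10_000]))
```

In this model a row carrying a 10 kB document sits on top of about five
chunk rows, each of which has to be found through the TOAST table's index
whenever the full value is read and rewritten whenever the value changes.
None of that shows up in a plan that never detoasts the column.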

All of the above being said, there are solutions to all the major
problems. But you have to know about them, where to look, and what to do.
And with higher scale, one very important aspect is that attention to
detail starts to matter a whole lot. I agree that there are some good
points raised but I wonder what the solutions are. There is room for some
improvement in the backend (it would really be nice to instrument and
measure toasting/detoasting overhead in explain analyze) but for a lot of
these I wonder if that is secondary. PostgreSQL is very well optimized
for a certain series of tasks, and one can build well optimized solutions
well outside that. At a certain point (including a certain scale)
there will be no substitute for a team of people who really know the db
backend inside and out and who can design around its limitations, and I
think that is true for all databases I have worked with.

On Tue, Jul 26, 2016 at 7:39 PM, Guyren Howe <guyren(at)gmail(dot)com> wrote:

> Honestly, I've never heard of anyone doing that. But it sounds like they
> had good reasons.
>
> https://eng.uber.com/mysql-migration/
>
> Thoughts?

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more
