Re: Cluster seems broken after pg_basebackup

From: Guillaume Drolet <droletguillaume(at)gmail(dot)com>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>, pgsql-general(at)postgresql(dot)org
Subject: Re: Cluster seems broken after pg_basebackup
Date: 2015-02-09 16:37:30
Message-ID: CAOkiyv6ctLdwk7B8AL2npv28TWR=3=siqWvunBZ5Ustv5+pOCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2015-02-07 1:24 GMT-05:00 Guillaume Lelarge <guillaume(at)lelarge(dot)info>:

> Le 6 févr. 2015 17:31, "Adrian Klaver" <adrian(dot)klaver(at)aklaver(dot)com> a
> écrit :
> >
> > On 02/06/2015 05:03 AM, Guillaume Drolet wrote:
> >>
> >> Hi,
> >>
> >> Yesterday I ran a pg_basebackup of my cluster. Since it has completed,
> >> my cluster doesn't work properly. I tried restarting the computer (or
> >> service) a few times but I always get the same messages in my logs (it's
> >> in French. If someone is willing to help me I can try to translate the
> >> logs. Just ask):
> >
> >
> > Enter Google Translate:)
> >
>
> But first, Guillaume, do yourself and everyone else a favor: turn the dam
> log into English. Set lc_messages to 'C' in postgresql.conf.
>
Thanks for this! I didn't know about this great feature.

> > First some questions:
> >
> > 1) What Postgres version?
> >
> > 2) What OS(s)? I am assuming Windows from the log info below, but we all
> know what assuming gets you.
> >
> > 3) Where were you backing up from and to?
> >
> > 4) Which cluster does not start, the master or the child you created
> with pg_basebackup?
> >
> >
> >>
> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données a été
> >> interrompu ; dernier lancement connu à 2015-02-06 07:05:05 EST
> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données n'a pas été
> >> arrêté proprement ; restauration
> >> automatique en cours
> >> 2015-02-06 07:11:38 EST LOG: record with zero length at 24B/2C000160
> >> 2015-02-06 07:11:38 EST LOG: la ré-exécution n'est pas nécessaire
> >> 2015-02-06 07:11:38 EST LOG: le système de bases de données est prêt
> >> pour accepter les connexions
> >> 2015-02-06 07:11:38 EST LOG: lancement du processus autovacuum
> >> 2015-02-06 07:11:38 EST FATAL: le rôle « 208375PT$ » n'existe pas
> >
> >
> > So where is role 208375PT$ supposed to come from?
> >
> >
> >>
> >> Then if I start pgAdmin I get a series of pop-ups I have to click OK to
> >> to continue:
> >>
> >> An error has ocurred: Column not found in pgSet: "datlastsysoid"
> >> An error has ocurred: Column not found in pgSet: datlastsysoid
> >> An error has ocurred: Column not found in pgSet: oid
> >> An error has ocurred: Column not found in pgSet: encoding
> >> An error has ocurred: Column not found in pgSet: Connection to database
> >> broken
> >
> >
> > Not sure about that this, someone more versed in pgAdmin will have to
> answer.
> >
>
> Usually you see these messages when you're using a pgadmin major release
> older than a PostgreSQL make release. For a 9.3 release, that would mean a
> pgadmin older than 1.18.
>

I'm running pgadmin 1.18.1

> >
> >>
> >> And after that, I went back to the log file and there's new information
> >> added:
> >>
> >> 2015-02-06 07:51:05 EST LOG: processus serveur (PID 184) a été arrêté
> >> par l'exception 0x80000004
> >> 2015-02-06 07:51:05 EST DÉTAIL: Le processus qui a échoué exécutait :
> >> SELECT version();
> >> 2015-02-06 07:51:05 EST ASTUCE : Voir le fichier d'en-tête C «
> >> ntstatus.h » pour une description de la valeur
> >> hexadécimale.
> >
> >
> > Well according to here:
> >
> > https://msdn.microsoft.com/en-us/library/cc704588.aspx
> >
> > 0x80000004
> > STATUS_SINGLE_STEP
> >
> >
> > {EXCEPTION} Single Step A single step or trace operation has just been
> completed.
> >
> > A developer is going to have explain what that means.
> >
> >
> >
> >> 2015-02-06 07:51:05 EST LOG: arrêt des autres processus serveur actifs
> >> 2015-02-06 07:51:05 EST ATTENTION: arrêt de la connexion à cause de
> >> l'arrêt brutal d'un autre processus serveur
> >> 2015-02-06 07:51:05 EST DÉTAIL: Le postmaster a commandé à ce processus
> >> serveur d'annuler la transaction
> >> courante et de quitter car un autre processus serveur a quitté
> >> anormalement
> >> et qu'il existe probablement de la mémoire partagée corrompue.
> >> 2015-02-06 07:51:05 EST ASTUCE : Dans un moment, vous devriez être
> >> capable de vous reconnecter à la base de
> >> données et de relancer votre commande.
> >> 2015-02-06 07:51:05 EST LOG: processus d'archivage (PID 692) quitte
> >> avec le code de sortie 1
> >> 2015-02-06 07:51:05 EST LOG: tous les processus serveur se sont
> >> arrêtés, réinitialisation
> >> 2015-02-06 07:51:15 EST FATAL: le bloc de mémoire partagé pré-existant
> >> est toujours en cours d'utilisation
> >> 2015-02-06 07:51:15 EST ASTUCE : Vérifier s'il n'y a pas de vieux
> >> processus serveur en cours d'exécution. Si c'est le
> >> cas, fermez-les.
> >>
> >> I was about to try restarting postgresql using the base backup I made
> >> yesterday but since this means I'll have to copy my database again (700
> >> GB takes a while...) I am looking for a better solution from more
> >> experienced people.
> >
> >
> >
> > My suspicion is you copied at least partly over a running server.
> >
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Melvin Davidson 2015-02-09 17:01:30 Re: Partioning with overlapping and non overlapping constraints
Previous Message Vladislav Tchernev 2015-02-09 16:26:50 Mult-standby streaming replication master failover