BUG #14356: "FATAL: the database system is starting up" error occurs to queries after PostgreSQL server start

From: xrensis(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14356: "FATAL: the database system is starting up" error occurs to queries after PostgreSQL server start
Date: 2016-10-05 14:25:45
Message-ID: 20161005142545.27238.26442@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 14356
Logged by: Márton Kamrás
Email address: xrensis(at)gmail(dot)com
PostgreSQL version: 9.3.14
Operating system: Ubuntu 14.04.5 LTS, Trusty Tahr
Description:

After the startup of the PostgreSQL server there is a state when it accepts
incoming connections but generates "FATAL: the database system is starting
up" as results for run queries.

I have written a postgresql_9.3.14_test.sh script that installs the required
environment and reproduces the bug.

postgresql_9.3.14_test.sh and its required files can be downloaded from:
https://github.com/1forintos/postgresql_jdbc_test/tree/master/postgres_bug_report

The script was tested on:
- Host OS: Ubuntu 16.04.1 LTS (Xenial Xerus) 64bit
- Docker version at the time of testing: 1.12.1, build 23cf638.
- lshw output:
https://github.com/1forintos/postgresql_jdbc_test/blob/master/postgres_bug_report/lshw_output.txt

Information on the test environment it creates:
- container OS: "Ubuntu 14.04.5 LTS, Trusty Tahr"
- kernel: Linux 4.4.0-38-generic
- postgresql server version (psql -c "select version();"): PostgreSQL
9.3.14 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
4.8.4-2ubuntu1~14.04.3) 4.8.4, 64-bit
- psql client version (psql --version): psql (PostgreSQL) 9.5.4
- PostgreSQL configuration file used in the container:
https://github.com/1forintos/postgresql_jdbc_test/blob/master/postgres_bug_report/postgresql.conf

Required files in the same folder as the run script:
- install_docker.sh
- Dockerfile
- postgresql.conf
- generator_script.py
- test_db.sh
- jars/postgres.dbclient.jar
- jars/postgresql-jdbc4-9.2.jar
- jars/commons-csv-1.4.jar

Overview of this script:
- Checks docker and java installation and offers their
update/installtion.
- Creates and runs docker container for the postgresql database server
using the required docker image file.
- Once the container and the database is set up it inserts dummy data
into the database.
- Then it creates a loop:
- stops the database
- starts the database and runs queries using a JDBC application at
the same time
- At the end it copies database logs into the logs/pg_log folder
- The JDBC application creates logs

Note:
- logging of postgresql is set to level "debug5" which (in this current
setup) generates around 1GB of logs within the container.
- this can be lowered by modifying the postgresql.conf file in the same
folder as the script is then running postgresql_9.3.14_test.sh
- postgresql_9.3.14_test.sh most likely requires superuser privileges to
work
- total run of the script may take up to 30 minutes because of the low
hardware resources the docker container gets (so as to reproduce best the
bug that occurs on "low resource" hardwares)
- this time can be reduced redefining "cpuQuota" in the script by
modifying the following line:
- "let cpuQuota=50000000/$maxClockSpeed" to "let cpuQuota=Y"
where Y is Y=X*1000 where X is the percentage of max CPU speed we want to
give the docker container
- the higher this percentage is the less likely it will be to
reproduce the bug

Details of this script:
1. installs/updates Docker containerization platform, Java 7 (after user
confirmation)
2. stops ALL previously started containers in case they exists
3. removes generated content of previous runs of the script
4. builds a Docker image using this Dockerfile [github link] with
"ubuntu_14.04/pg_9.3.14:latest" tag
5. (using lscpu) calculates the CPU quota to be granted to the Docker
container to achieve ~400MHz clock speed
6. creates and starts a container with the name "postgres_9.3.14" that
has 1 CPU core and the calculated CPU quota
7. starts the PostgreSQL server in the container
8. rune the "generator_script.py" script which:
8.1. generates the SQL files for inserting dummy data into the
database ([5 tables] X [id + 10 columns (varchar(100))] X [180k rows],
(example: [github link]))
8.2. generates CSV files for the JDBC application (see 9.2.b.2)
8.3. executes all SQL scripts one-by-one which were generated in
8.1, using the following command: psql -h <container_IP> -U postgres -f
<sql_script_path> -1
9. runs the test_db.sh script for 300 seconds which:
9.1. stops the PostgreSQL server in the container
9.2. a) and b) are executed paralleled:
a) starts the PostgreSQL server in the container
b) runs the JDBC application which:
9.2.b.1 runs "SELECT count(*) AS row_count FROM tableN"
N=1..5 until all 5 tables return a 0> row_count
9.2.b.2 checks and compares one random row from all tables
in the database to the corresponding values contained by the generated CSV
files (in 8.2) to verify their correctness
9.3. checks the exit code of the JDBC application, if the exit code
equals 1 then the data compared in 9.2.b.2 do not match
9.4. GOTO 9.1.
10. copies the generated logs by the PostgreSQL server from the
container to the <directory where the script is run>/logs/pg_log folder
11. restarts the PostgreSQL server in the container

After running this script there are 2 folders in the logs folder
- jdbc: containing generated logs of the JDBC application
- pg_log: containing logs copied from the PostgreSQL server

Among the JDBC logs there is a high chance of seeing "FATAL: the
database system is starting up" as an exception during the runs of
queries.

The JDBC application:
- project can be found via this link:
https://github.com/1forintos/postgresql_jdbc_test/tree/master/java/postgres.dbclient
- project was created with Eclipse Luna 32bit
- build with Apache Maven 3.3.9 (command in project root: mvn clean
install)
- requires Java 1.7

2 connecting questions:
- Is there a known interface to check if the database is up and fully
functional and ready for queries?
- Is it possible theoretically that the result of a query is incorrect
and does not match the actual data in the database?

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Rader, David 2016-10-05 14:34:56 Re: Wrong results of function age(timestamp, timestamp)
Previous Message Vitaly Burovoy 2016-10-05 14:08:55 Re: BUG #14354: Wrong interpretation of JSON 'null'