From 7cfea69d3f5df4c15681f4902a86b366f0ed4292 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun, 28 Jul 2024 16:40:35 -0400
Subject: [PATCH v1 3/3] Count individual SQL commands in pg_restore's
 --transaction-size mode.

The initial implementation counted one action per TOC entry (except
for some special cases for multi-blob BLOBS entries).  This assumes
that TOC entries are all about equally complex, but it turns out
that that assumption doesn't hold up very well in binary-upgrade
mode.  For example, even after the previous patch I was able to
cause backend bloat with tables having many inherited constraints.

To fix, count multi-command TOC entries as N actions, allowing the
transaction size to be scaled down when we hit a complex TOC entry.
Rather than add a SQL parser to pg_restore, approximate "multi
command" by counting semicolons in the TOC entry's defn string.
This will be fooled by semicolons appearing in string literals ---
but the error is in the conservative direction, so it doesn't seem
worth working harder.  The biggest risk is with function/procedure
TOC entries, but we can just explicitly skip those.

(This is undoubtedly a hack, and maybe someday we'll be able to
revert it after fixing the backend's bloat issues or rethinking
what pg_dump emits in binary upgrade mode.  But that surely isn't
a project for v17.)

Thanks to Alexander Korotkov for the let's-count-semicolons idea.

Per report from Justin Pryzby.  Back-patch to v17 where txn_size mode
was introduced.

Discussion: https://postgr.es/m/ZqEND4ZcTDBmcv31@pryzbyj2023
---
 src/bin/pg_dump/pg_backup_archiver.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 68e321212d..8c20c263c4 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -3827,10 +3827,32 @@ _printTocEntry(ArchiveHandle *AH, TocEntry *te, bool isData)
 	{
 		IssueACLPerBlob(AH, te);
 	}
-	else
+	else if (te->defn && strlen(te->defn) > 0)
 	{
-		if (te->defn && strlen(te->defn) > 0)
-			ahprintf(AH, "%s\n\n", te->defn);
+		ahprintf(AH, "%s\n\n", te->defn);
+
+		/*
+		 * If the defn string contains multiple SQL commands, txn_size mode
+		 * should count it as N actions not one.  But rather than build a full
+		 * SQL parser, approximate this by counting semicolons.  One case
+		 * where that tends to be badly fooled is function definitions, so
+		 * ignore them.  (restore_toc_entry will count one action anyway.)
+		 */
+		if (ropt->txn_size > 0 &&
+			strcmp(te->desc, "FUNCTION") != 0 &&
+			strcmp(te->desc, "PROCEDURE") != 0)
+		{
+			const char *p = te->defn;
+			int			nsemis = 0;
+
+			while ((p = strchr(p, ';')) != NULL)
+			{
+				nsemis++;
+				p++;
+			}
+			if (nsemis > 1)
+				AH->txnCount += nsemis - 1;
+		}
 	}
 
 	/*
-- 
2.43.5

