Обсуждение: BUG #15734: Walsender process crashing when executing SHOW ALL;
The following bug has been logged on the website: Bug reference: 15734 Logged by: Alexander Kukushkin Email address: cyberdemn@gmail.com PostgreSQL version: 11.2 Operating system: Ubuntu 18.04.2 LTS Description: According to the documentation [1], replication protocol supports SHOW command, which is similar to the SQL command SHOW [2], so I tried to execute "SHOW ALL;" and walsender crashed with the following stack-trace: Program received signal SIGSEGV, Segmentation fault. MemoryContextAlloc (context=0x0, size=104) at ./build/../src/backend/utils/mmgr/mcxt.c:783 783 ./build/../src/backend/utils/mmgr/mcxt.c: No such file or directory. (gdb) bt #0 MemoryContextAlloc (context=0x0, size=104) at ./build/../src/backend/utils/mmgr/mcxt.c:783 #1 0x0000558b292beb97 in CopySnapshot (snapshot=0x558b29715060 <CatalogSnapshotData>) at ./build/../src/backend/utils/time/snapmgr.c:674 #2 0x0000558b292bf855 in RegisterSnapshotOnOwner (snapshot=0x558b29715060 <CatalogSnapshotData>, owner=0x558b2ae178c8) at ./build/../src/backend/utils/time/snapmgr.c:884 #3 0x0000558b292bf896 in RegisterSnapshot (snapshot=<optimized out>) at ./build/../src/backend/utils/time/snapmgr.c:868 #4 0x0000558b28ece60f in systable_beginscan (heapRelation=heapRelation@entry=0x7f96bedb2cc8, indexId=<optimized out>, indexOK=<optimized out>, snapshot=snapshot@entry=0x0, nkeys=nkeys@entry=1, key=key@entry=0x7ffe14269ec0) at ./build/../src/backend/access/index/genam.c:356 #5 0x0000558b2926bfd3 in SearchCatCacheList (cache=0x558b2ae2f700, nkeys=nkeys@entry=1, v1=16384, v2=v2@entry=0, v3=v3@entry=0) at ./build/../src/backend/utils/cache/catcache.c:1650 #6 0x0000558b2927e2ee in SearchSysCacheList (cacheId=cacheId@entry=8, nkeys=nkeys@entry=1, key1=<optimized out>, key2=key2@entry=0, key3=key3@entry=0) at ./build/../src/backend/utils/cache/syscache.c:1427 #7 0x0000558b2917ef50 in roles_is_member_of (roleid=roleid@entry=16384) at ./build/../src/backend/utils/adt/acl.c:4862 #8 0x0000558b29182717 in is_member_of_role (member=16384, role=role@entry=3374) at ./build/../src/backend/utils/adt/acl.c:4946 #9 0x0000558b2929fe0a in ShowAllGUCConfig (dest=<optimized out>) at ./build/../src/backend/utils/misc/guc.c:8280 #10 GetPGVariable (name=<optimized out>, dest=<optimized out>) at ./build/../src/backend/utils/misc/guc.c:8184 #11 0x0000558b2911b7aa in exec_replication_command (cmd_string=cmd_string@entry=0x558b2ade36b0 "SHOW ALL;") at ./build/../src/backend/replication/walsender.c:1555 #12 0x0000558b291696a6 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x558b2ae125d8, dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4178 #13 0x0000558b290f4a7d in BackendRun (port=0x558b2ae0c690) at ./build/../src/backend/postmaster/postmaster.c:4361 #14 BackendStartup (port=0x558b2ae0c690) at ./build/../src/backend/postmaster/postmaster.c:4033 #15 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1706 #16 0x0000558b290f5abf in PostmasterMain (argc=17, argv=0x558b2adde020) at ./build/../src/backend/postmaster/postmaster.c:1379 #17 0x0000558b28e824c2 in main (argc=17, argv=0x558b2adde020) at ./build/../src/backend/main/main.c:228 I don't think that such a crash is expected behavior. Either it should return the table with all settings or report simply error out with message 'unrecognized configuration parameter "ALL"'. [1] https://www.postgresql.org/docs/11/protocol-replication.html [2] https://www.postgresql.org/docs/11/sql-show.html
On Thu, Apr 04, 2019 at 07:54:19AM +0000, PG Bug reporting form wrote: > I don't think that such a crash is expected behavior. Either it should > return the table with all settings or report simply error out with message > 'unrecognized configuration parameter "ALL"'. No, it's not correct to just disallow ALL as we have a similar check in GetConfigOptionByName() for individual parameters. Problem is origically from 25fff40, and got worse as of 0c8910a0. A superuser is able to get the list of parameters, but I can easily see the failure when using a plain replication user. A replication user is one which can take a full base backup, so he can easily retrieve the full list of parameters anyway. What about bypassing the problem and just allow WAL sender contexts to always have access to parameter values? I don't think that this is much a security issue if one thinks about the access to BASE_BACKUP. -- Michael
Вложения
On Mon, Apr 08, 2019 at 04:39:03PM +0900, Michael Paquier wrote: > Problem is origically from 25fff40, and got worse as of 0c8910a0. A > superuser is able to get the list of parameters, but I can easily see > the failure when using a plain replication user. A replication user > is one which can take a full base backup, so he can easily retrieve > the full list of parameters anyway. What about bypassing the problem > and just allow WAL sender contexts to always have access to parameter > values? I don't think that this is much a security issue if one > thinks about the access to BASE_BACKUP. After pondering more about this one, allowing replication to have the same rights as a superuser in this case does not feel completely right either as this is just a shortcut to bypass the syscache lookups happening through is_member_of_role(). So attached is a much better and simple idea: let's just use a transaction context when issuing the SHOW command so as it is possible to perform cache lookups correctly. This way, even a replication role is not able to see some parameters except if the role is a member of pg_read_all_settings, which is more consistent. This needs a backpatch down to v10. Any objections to that? -- Michael
Вложения
On 2019-Apr-10, Michael Paquier wrote: > After pondering more about this one, allowing replication to have the > same rights as a superuser in this case does not feel completely right > either as this is just a shortcut to bypass the syscache lookups > happening through is_member_of_role(). So attached is a much better > and simple idea: let's just use a transaction context when issuing the > SHOW command so as it is possible to perform cache lookups correctly. > This way, even a replication role is not able to see some parameters > except if the role is a member of pg_read_all_settings, which is more > consistent. > > This needs a backpatch down to v10. Thanks for tracking this down. I think we should have a few tests issuing SHOW ALL in a replication connection with various levels of privilege; it's annoying that this bug took two years to find. With that, special-purpose buildfarm members would tell us if we've made some mistake in transaction handling or whatever. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 10, 2019 at 11:01:21AM -0400, Alvaro Herrera wrote: > I think we should have a few tests issuing SHOW ALL in a replication > connection with various levels of privilege; it's annoying that this bug > took two years to find. With that, special-purpose buildfarm members > would tell us if we've made some mistake in transaction handling or > whatever. What do you think about the set attached? I prefer that we use psql(on_error_die => 1) so as in the case of a crash or an error the test dies hard and fast. In consequence, I have granted pg_read_all_settings to the replication user to ensure that no errors happen, still the syscache lookup is done. We could have negative tests but I don't think that's worth the addition as we already have tests for pg_read_all_settings in src/test/regress/. If I remove the patched patch from walsender.c, the test fails immediately. -- Michael
Вложения
On Thu, Apr 11, 2019 at 12:07:12PM +0900, Michael Paquier wrote: > What do you think about the set attached? I prefer that we use > psql(on_error_die => 1) so as in the case of a crash or an error the > test dies hard and fast. In consequence, I have granted > pg_read_all_settings to the replication user to ensure that no errors > happen, still the syscache lookup is done. We could have negative > tests but I don't think that's worth the addition as we already have > tests for pg_read_all_settings in src/test/regress/. If I remove the > patched patch from walsender.c, the test fails immediately. I have done another round of review of this patch and committed it down to v10. When testing on Windows, I have noticed that these new regression tests suffer the same problem with SSPI authentication as the recent tweaks I have done for pg_rewind. So I have backpatched the extension for PostgresNode::init done in d9f543e9 to be able to add extra options for the authentication configuration so as the new tests are able to work. Thanks Alexander for the report! -- Michael