[SERVER-48516] at startup, confirm replica set node with auth can connect to itself Created: 01/Jun/20  Updated: 29/Oct/23  Resolved: 13/Nov/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.0-rc7
Fix Version/s: 4.9.0, 4.4.4

Type: Bug Priority: Major - P3
Reporter: Mark Callaghan (Inactive) Assignee: Mark Benvenuto
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File ixbug.tar    
Issue Links:
Backports
Duplicate
duplicates SERVER-50665 Index creation stuck with "Index Buil... Closed
Related
is related to SERVER-48235 The primary node should use the Async... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Execution Team 2020-06-15, Execution Team 2020-06-29, Security 2020-11-16
Participants:
Case:

 Description   

I have a workload that uses a single-node replicaset and does a create index. It looks like I now must start the single-node with the --keyFile option for create index to work – see SERVER-48344. So my questions are:

  • The create index statement hung for days in my test before I noticed it.
  • If create index requires this to succeed then the create index command should fail immediately. It doesn't fail fast in rc7.
  • Have the docs been updated for this new requirement?
  • Is this change OK? I get that implementation artifacts might require this, but a single-node replica shouldn't need a keyfile. Users will be slightly unhappy that their workflow has to be changed for this.

This is new in 4.4.0-rc7 and I assume is from the fix for SERVER-48235.

I will attach files, but below some info has been inlined

From mongo.log

{"t":{"$date":"2020-05-29T16:46:09.004-07:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"progress meter","attr":{"name":"Index Build: scanning collection","done":12592300,"total":13026808,"percent":96}}
{"t":{"$date":"2020-05-29T16:46:10.028-07:00"},"s":"I",  "c":"INDEX",    "id":20391,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: collection scan done","attr":{"totalRecords":13026808,"durationMillis":42000}}
{"t":{"$date":"2020-05-29T16:46:13.004-07:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":3528500,"total":13026808,"percent":27}}
{"t":{"$date":"2020-05-29T16:46:16.004-07:00"},"s":"I",  "c":"-",        "id":51773,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"progress meter","attr":{"name":"Index Build: inserting keys from external sorter into index","done":8745400,"total":13026808,"percent":67}}
{"t":{"$date":"2020-05-29T16:46:18.436-07:00"},"s":"I",  "c":"INDEX",    "id":20685,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"index build: inserted {bulk_getKeysInserted} keys from external sorter into index in {timer_seconds} seconds","attr":{"bulk_getKeysInserted":13026808,"timer_seconds":8}}
{"t":{"$date":"2020-05-29T16:46:18.595-07:00"},"s":"I",  "c":"CONNPOOL", "id":22576,   "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"localhost:27017"}}
{"t":{"$date":"2020-05-29T16:46:18.596-07:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"connection accepted","attr":{"remote":"127.0.0.1:46208","sessionId":23,"connectionCount":2}}
{"t":{"$date":"2020-05-29T16:46:18.596-07:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn23","msg":"client metadata","attr":{"remote":"127.0.0.1:46208","client":"conn23","doc":{"driver":{"name":"NetworkInterfaceTL","version":"4.4.0-rc7"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"18.04"}}}}
{"t":{"$date":"2020-05-29T16:46:18.597-07:00"},"s":"I",  "c":"STORAGE",  "id":3856202, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"'voteCommitIndexBuild' command failed.","attr":{"indexBuildUUID":{"uuid":{"$uuid":"92295fe8-4718-4bf6-92d1-9cbcb0ab91ed"}},"responseStatus":{"operationTime":{"$timestamp":{"t":1590795976,"i":1}},"ok":0.0,"errmsg":"command voteCommitIndexBuild requires authentication","code":13,"codeName":"Unauthorized","$clusterTime":{"clusterTime":{"$timestamp":{"t":1590795976,"i":1}},"signature":{"hash":{"$binary":{"base64":"uoSbFqScxXJd3ZBioUf0Nbn1+YE=","subType":"0"}},"keyId":6832415531887230980}}}}}

From currentOp()

                {
                        "type" : "op",
                        "host" : "nuc82:27017",
                        "desc" : "conn21",
                        "connectionId" : 21,
                        "client" : "127.0.0.1:46206",
                        "appName" : "MongoDB Shell",
                        "clientMetadata" : {
                                "application" : {
                                        "name" : "MongoDB Shell"
                                },
                                "driver" : {
                                        "name" : "MongoDB Internal Client",
                                        "version" : "4.4.0-rc7"
                                },
                                "os" : {
                                        "type" : "Linux",
                                        "name" : "Ubuntu",
                                        "architecture" : "x86_64",
                                        "version" : "18.04"
                                }
                        },
                        "active" : true,
                        "currentOpTime" : "2020-05-31T08:45:31.023-07:00",
                        "effectiveUsers" : [
                                {
                                        "user" : "root",
                                        "db" : "admin"
                                }
                        ],
                        "opid" : 22475,
                        "lsid" : {
                                "id" : UUID("395d379b-d8ba-43c0-826e-704005748de5"),
                                "uid" : BinData(0,"Y5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fg=")
                        },
                        "secs_running" : NumberLong(144003),
                        "microsecs_running" : NumberLong("144003512238"),
                        "op" : "command",
                        "ns" : "linkdb0.linktable",
                        "command" : {
                                "createIndexes" : "linktable",
                                "indexes" : [
                                        {
                                                "key" : {
                                                        "id1" : 1,
                                                        "link_type" : 1,
                                                        "visibility" : 1,
                                                        "time" : 1,
                                                        "id2" : 1,
                                                        "version" : 1,
                                                        "data" : 1
                                                },
                                                "name" : "id1_1_link_type_1_visibility_1_time_1_id2_1_version_1_data_1"
                                        }
                                ],
                                "lsid" : {
                                        "id" : UUID("395d379b-d8ba-43c0-826e-704005748de5")
                                },
                                "$clusterTime" : {
                                        "clusterTime" : Timestamp(1590795926, 13072),
                                        "signature" : {
                                                "hash" : BinData(0,"NYvnaGW5IYSRa7wm2vn0cXAByWQ="),
                                                "keyId" : NumberLong("6832415531887230980")
                                        }
                                },
                                "$db" : "linkdb0"
                        },
                        "numYields" : 0,
                        "waitingForLatch" : {
                                "timestamp" : ISODate("2020-05-29T23:45:27.659Z"),
                                "captureName" : "FutureResolution"
                        },
                        "locks" : {
 
                        },
                        "waitingForLock" : false,
                        "lockStats" : {
                                "ParallelBatchWriterMode" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(3)
                                        }
                                },
                                "ReplicationStateTransition" : {
                                        "acquireCount" : {
                                                "w" : NumberLong(5)
                                        }
                                },
                                "Global" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(1),
                                                "w" : NumberLong(4)
                                        }
                                },
                                "Database" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(1),
                                                "w" : NumberLong(3)
                                        }
                                },
                                "Collection" : {
                                        "acquireCount" : {
                                                "w" : NumberLong(1),
                                                "W" : NumberLong(1)
                                        }
                                },
                                "Mutex" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(4)
                                        }
                                }
                        },
                        "waitingForFlowControl" : false,
                        "flowControlStats" : {
                                "acquireCount" : NumberLong(3)
                        }
                },

The create index thread stack

      1 __GI___nanosleep,mongo::sleepmicros(long,mongo::IndexBuildsCoordinatorMongod::_signalPrimaryForCommitReadiness(mongo::OperationContext*,,mongo::IndexBuildsCoordinator::_buildIndex(mongo::OperationContext*,,mongo::IndexBuildsCoordinator::_runIndexBuildInner(mongo::OperationContext*,,mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*,,??,??,mongo::ThreadPool::_doOneTask(std::unique_lock<mongo::latch_detail::Latch>*),mongo::ThreadPool::_consumeTasks(),mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*,,??,??,start_thread,clone



 Comments   
Comment by Githook User [ 11/Jan/21 ]

Author:

{'name': 'Mark Benvenuto', 'email': 'mark.benvenuto@mongodb.com', 'username': 'markbenvenuto'}

Message: SERVER-48516 Verify at startup replica set with auth has a keyFile

(cherry picked from commit c38ec5727899cd563791d5ea4ec054cf6322498c)
Branch: v4.4
https://github.com/mongodb/mongo/commit/c4e02cfb947578316e661b97ef2342b81629d1f0

Comment by Louis Williams [ 04/Jan/21 ]

michael.gorodilov@gmail.com, can you confirm whether the workaround in my previous comment solves your problem? I have requested a backport to 4.4 so that this problem may be fixed in the next release.

Comment by Mchl Grdlv [ 04/Jan/21 ]

I have installed in synology docker Rocket.chat and mongo. Everything works well.
But the container generates messages every second.
This puts a strain on the disk system.
I ask for advice and correct this error. As a last resort, how can I disable writing these messages to disk?

"2021-01-04 08:49:57,stdout,"{\"t\":

{\"$date\":\"2021-01-04T11:49:57.752+03:00\"}

,\"s\":\"I\", \"c\":\"STORAGE\", \"id\":3856202, \"ctx\":\"IndexBuildsCoordinatorMongod-2\",\"msg\":\"'voteCommitIndexBuild' command failed.\",\"attr\":{\"indexBuildUUID\":{\"uuid\":{\"$uuid\":\"d90def3b-4dc1-4cce-aa19-53c3f7eb094e\"}},\"responseStatus\":{\"operationTime\":{\"$timestamp\":{\"t\":1609750196,\"i\":1}},\"ok\":0.0,\"errmsg\":\"command voteCommitIndexBuild requires authentication\",\"code\":13,\"codeName\":\"Unauthorized\",\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1609750196,\"i\":1}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"RtULaapbdm/5ooRgGl6jfYeyu4k=\",\"subType\":\"0\"}},\"keyId\":6890507829308817412}}}}}
"2021-01-04 08:49:57,stdout,"{\"t\":

{\"$date\":\"2021-01-04T11:49:57.649+03:00\"}

,\"s\":\"I\", \"c\":\"STORAGE\", \"id\":3856202, \"ctx\":\"IndexBuildsCoordinatorMongod-1\",\"msg\":\"'voteCommitIndexBuild' command failed.\",\"attr\":{\"indexBuildUUID\":{\"uuid\":{\"$uuid\":\"5038dec8-d17f-49a7-afe8-896d37876686\"}},\"responseStatus\":{\"operationTime\":{\"$timestamp\":{\"t\":1609750196,\"i\":1}},\"ok\":0.0,\"errmsg\":\"command voteCommitIndexBuild requires authentication\",\"code\":13,\"codeName\":\"Unauthorized\",\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1609750196,\"i\":1}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"RtULaapbdm/5ooRgGl6jfYeyu4k=\",\"subType\":\"0\"}},\"keyId\":6890507829308817412}}}}}
"2021-01-04 08:49:57,stdout,"{\"t\":

{\"$date\":\"2021-01-04T11:49:57.648+03:00\"}

,\"s\":\"I\", \"c\":\"STORAGE\", \"id\":3856202, \"ctx\":\"IndexBuildsCoordinatorMongod-0\",\"msg\":\"'voteCommitIndexBuild' command failed.\",\"attr\":{\"indexBuildUUID\":{\"uuid\":{\"$uuid\":\"7b61f52d-9904-4dc8-9b06-aedb8fcadfad\"}},\"responseStatus\":{\"operationTime\":{\"$timestamp\":{\"t\":1609750196,\"i\":1}},\"ok\":0.0,\"errmsg\":\"command voteCommitIndexBuild requires authentication\",\"code\":13,\"codeName\":\"Unauthorized\",\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1609750196,\"i\":1}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"RtULaapbdm/5ooRgGl6jfYeyu4k=\",\"subType\":\"0\"}},\"keyId\":6890507829308817412}}}}}
"2021-01-04 08:49:56,stdout,"{\"t\":

{\"$date\":\"2021-01-04T11:49:56.751+03:00\"}

,\"s\":\"I\", \"c\":\"STORAGE\", \"id\":3856202, \"ctx\":\"IndexBuildsCoordinatorMongod-2\",\"msg\":\"'voteCommitIndexBuild' command failed.\",\"attr\":{\"indexBuildUUID\":{\"uuid\":{\"$uuid\":\"d90def3b-4dc1-4cce-aa19-53c3f7eb094e\"}},\"responseStatus\":{\"operationTime\":{\"$timestamp\":{\"t\":1609750186,\"i\":2}},\"ok\":0.0,\"errmsg\":\"command voteCommitIndexBuild requires authentication\",\"code\":13,\"codeName\":\"Unauthorized\",\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1609750186,\"i\":2}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"2TFFt43KcKYAPhR+rXwog/vqyOI=\",\"subType\":\"0\"}},\"keyId\":6890507829308817412}}}}}
"2021-01-04 08:49:56,stdout,"{\"t\":

{\"$date\":\"2021-01-04T11:49:56.647+03:00\"}

,\"s\":\"I\", \"c\":\"STORAGE\", \"id\":3856202, \"ctx\":\"IndexBuildsCoordinatorMongod-1\",\"msg\":\"'voteCommitIndexBuild' command failed.\",\"attr\":{\"indexBuildUUID\":{\"uuid\":{\"$uuid\":\"5038dec8-d17f-49a7-afe8-896d37876686\"}},\"responseStatus\":{\"operationTime\":{\"$timestamp\":{\"t\":1609750186,\"i\":2}},\"ok\":0.0,\"errmsg\":\"command voteCommitIndexBuild requires authentication\",\"code\":13,\"codeName\":\"Unauthorized\",\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1609750186,\"i\":2}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"2TFFt43KcKYAPhR+rXwog/vqyOI=\",\"subType\":\"0\"}},\"keyId\":6890507829308817412}}}}}
"2021-01-04 08:49:56,stdout,"{\"t\":

{\"$date\":\"2021-01-04T11:49:56.645+03:00\"}

,\"s\":\"I\", \"c\":\"STORAGE\", \"id\":3856202, \"ctx\":\"IndexBuildsCoordinatorMongod-0\",\"msg\":\"'voteCommitIndexBuild' command failed.\",\"attr\":{\"indexBuildUUID\":{\"uuid\":{\"$uuid\":\"7b61f52d-9904-4dc8-9b06-aedb8fcadfad\"}},\"responseStatus\":{\"operationTime\":{\"$timestamp\":{\"t\":1609750186,\"i\":2}},\"ok\":0.0,\"errmsg\":\"command voteCommitIndexBuild requires authentication\",\"code\":13,\"codeName\":\"Unauthorized\",\"$clusterTime\":{\"clusterTime\":{\"$timestamp\":{\"t\":1609750186,\"i\":2}},\"signature\":{\"hash\":{\"$binary\":{\"base64\":\"2TFFt43KcKYAPhR+rXwog/vqyOI=\",\"subType\":\"0\"}},\"keyId\":6890507829308817412}}}}}

Comment by Githook User [ 13/Nov/20 ]

Author:

{'name': 'Mark Benvenuto', 'email': 'mark.benvenuto@mongodb.com', 'username': 'markbenvenuto'}

Message: SERVER-48516 Verify at startup replica set with auth has a keyFile
Branch: master
https://github.com/mongodb/mongo/commit/c38ec5727899cd563791d5ea4ec054cf6322498c

Comment by Louis Williams [ 27/Oct/20 ]

Hi earthquake_90@mail.ru, if you are running into this issue, you will need to follow the documentation here that explains how to set up keyfile authentication within a replica set.

The problem is that the mongod is unable to authenticate with itself. For a quick workaround, pass a --keyFile to the mongod, just like in Mark's previous comment:

openssl rand -base64 756 > mongo.keyfile
chmod 400 mongo.keyfile
path/to/mongod --keyFile mongo.keyfile

Or in your mongod.conf:

security:
  keyFile: <path-to-keyfile>

Comment by Ivan Strelnikov [ 27/Oct/20 ]

Hi! any updates?

Comment by Eric Milkie [ 09/Jul/20 ]

I changed the description to match the suggested work to solve this ticket. Alternatively, we can modify AsyncDBClient to know when it is talking to itself and to switch to a different internal DBClient that doesn't use the network.

Comment by Eric Milkie [ 09/Jul/20 ]

Technical details: Using a replica set with auth enabled but without a way for nodes to authenticate to one another is not a supported configuration, but there is currently nothing in the code that warns you about this situation if you only have one node up in your replica set. Note that other operations besides createIndexes would identify this problem; for example, adding a non-voting node to a one-node replica set would seem to succeed, and yet nothing would replicate to the new node. There is no active indication anything is wrong in this situation other than error messages appearing in the system log (and the lack of new data on the newly added node). This problem is not new to 4.4 and has existed since probably 2.2 and prior. The change in behavior for createIndexes, however, is new in 4.4.

To make this problem more noticeable for createIndexes, we cannot fix the problem when the node goes to voteCommitIndexBuild, as at that point it is too late to abort the index build. Instead, what we can do is add a check on startup that a replicaset node can talk to itself, and abort initialization if this check fails, with an explanatory message. This will prevent surprise behaviors like this in the future, since there exists a wide variety of failure cases when running a single-voting-node replica set with an incorrect authentication configuration, and it would be difficult to document all of them and keep such a list up to date for each release.

Comment by Mark Callaghan (Inactive) [ 01/Jun/20 ]

ixbug.tar has PMP stack traces, mongo.log, db.currentOp() output and the output from "ps" for mongod

I use this mongo.conf

processManagement:
  fork: true
systemLog:
  destination: file
  path: /home/mdcallag/d/mo44/mongo.log
  logAppend: true
storage:
  syncPeriodSecs: 60
  dbPath: /data/m/mo
  journal:
    enabled: true
operationProfiling:
  slowOpThresholdMs: 2000
replication:
  oplogSizeMB: 32000
  replSetName: foo
security:
  authorization: enabled
 
storage.wiredTiger.collectionConfig.blockCompressor: none
storage.wiredTiger.engineConfig.journalCompressor: snappy
storage.wiredTiger.engineConfig.cacheSizeGB: 12

Previously I started mongod via bin/mongod --config mongo.conf

Now I use

rm -f mongo.keyfile
openssl rand -base64 756 > mongo.keyfile
chmod 400 mongo.keyfile
 
bin/mongod --config mongo.conf --keyFile mongo.keyfile

Generated at Thu Feb 08 05:17:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.