[SERVER-24630] Mongos erroneously advances config optime for writes that fail write concern Created: 16/Jun/16  Updated: 25/Jan/17  Resolved: 21/Jun/16

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: 3.2.9, 3.3.9

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Siyuan Zhou
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-27005 Write error revalidate logic needs to... Closed
related to SERVER-24574 Tag internal connections KeepOpen to ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Repl 16 (06/24/16)
Participants:

 Description   

Mongos advances the config optime using the last visible optime returned by the config servers. If a write command fails due to a write concern timeout or NotMaster, the last visible optime is ahead of the last committed optime and may later be rolled back.
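Conceptually, the failure mode looks like the following standalone sketch (OpTime here is a two-field stand-in for illustration, not the server's class; the optime values are taken from the logs below):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <tuple>

struct OpTime {
    int64_t ts;    // timestamp component
    int64_t term;  // election term
    bool operator<(const OpTime& other) const {
        return std::tie(term, ts) < std::tie(other.term, other.ts);
    }
};

int main() {
    // A write on the config primary times out waiting for write concern: it
    // was applied locally (visible) but never reached a majority (committed).
    OpTime lastOpCommitted{1466115787, 1};  // majority-committed; never rolls back
    OpTime lastOpVisible{1466115968, 1};    // applied on the primary; may roll back

    OpTime configOpTime{0, 0};  // mongos' cached config optime

    // Buggy behavior: advance from lastOpVisible even though the write failed.
    configOpTime = std::max(configOpTime, lastOpVisible);

    // Every subsequent config read uses readConcern
    // { level: "majority", afterOpTime: configOpTime }. Because configOpTime
    // is ahead of anything majority-committed, the read blocks until
    // maxTimeMS expires -> ExceededTimeLimit, as in the logs below.
    if (lastOpCommitted < configOpTime) {
        std::cout << "majority read waits for an optime that may never commit\n";
    }

    // Safe alternative (what the eventual fix does): advance from
    // lastOpCommitted, which is guaranteed to survive rollback.
    configOpTime = lastOpCommitted;
    return 0;
}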

With the following patch applied, a majority read fails with a timeout, which should never happen.

diff --git a/jstests/sharding/auth_no_config_primary.js b/jstests/sharding/auth_no_config_primary.js
index 3bb1ea1..d2c4b79 100644
--- a/jstests/sharding/auth_no_config_primary.js
+++ b/jstests/sharding/auth_no_config_primary.js
@@ -8,6 +8,12 @@
 (function() {
     "use strict";
 
+    function enableFailPoint(node) {
+        assert.commandWorked(node.getDB('admin').runCommand(
+                                 {configureFailPoint: 'rsSyncApplyStop', mode: 'alwaysOn'}),
+                             'Failed to configure rsSyncApplyStop failpoint.');
+    }
+
     var st = new ShardingTest({shards: 1, keyFile: 'jstests/libs/key1'});
 
     st.s.getDB('admin').createUser({user: 'root', pwd: 'pass', roles: ['root']});
@@ -15,9 +21,9 @@
     var testDB = st.s.getDB('test');
     testDB.user.insert({hello: 'world'});
 
-    // Kill all secondaries, forcing the current primary to step down.
+    // Pause all secondaries.
     st.configRS.getSecondaries().forEach(function(secondaryConn) {
-        MongoRunner.stopMongod(secondaryConn.port);
+        enableFailPoint(secondaryConn);
     });
 
     // Test authenticate through a fresh connection.
diff --git a/src/mongo/s/sharding_egress_metadata_hook.cpp b/src/mongo/s/sharding_egress_metadata_hook.cpp
index 2040955..4565a52 100644
--- a/src/mongo/s/sharding_egress_metadata_hook.cpp
+++ b/src/mongo/s/sharding_egress_metadata_hook.cpp
@@ -43,6 +43,8 @@
 #include "mongo/s/sharding_egress_metadata_hook.h"
 #include "mongo/util/net/hostandport.h"
 
+#include "mongo/util/log.h"
+
 namespace mongo {
 
 namespace rpc {
@@ -114,6 +116,7 @@ Status ShardingEgressMetadataHook::_advanceConfigOptimeFromShard(ShardId shardId
 
                 const auto& replMetadata = parseStatus.getValue();
                 auto opTime = replMetadata.getLastOpVisible();
+                log() << "XXXSZ advance config optime " << metadataObj;
                 grid.advanceConfigOpTime(opTime);
             }
         } else {

// Mongos advances the config optime, which is never committed.
s20015| 2016-06-16T18:26:23.818-0400 I SHARDING [NetworkInterfaceASIO-ShardRegistry-0] XXXSZ advance config optime { $replData: { term: 1, lastOpCommitted: { ts: Timestamp 1466115787000|11, t: 1 }, lastOpVisible: { ts: Timestamp 1466115968000|1, t: 1 }, configVersion: 1, replicaSetId: ObjectId('576326bd769d59b46d7a042b'), primaryIndex: 0, syncSourceIndex: -1 }, $gleStats: { lastOpTime: { ts: Timestamp 1466115968000|1, t: 1 }, electionId: ObjectId('7fffffff0000000000000001') } }
s20015| 2016-06-16T18:27:38.614-0400 I SHARDING [Balancer] Operation timed out with status ExceededTimeLimit: Operation timed out, request was RemoteCommand 85 -- target:siyuan-mbp:20013 db:config expDate:2016-06-16T18:27:38.610-0400 cmd:{ find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, maxTimeMS: 30000 }
 
// Then a majority read fails.
s20015| 2016-06-16T18:27:38.862-0400 I SHARDING [conn1] Operation timed out with status ExceededTimeLimit: Operation timed out, request was RemoteCommand 104 -- target:siyuan-mbp:20012 db:config expDate:2016-06-16T18:27:38.857-0400 cmd:{ find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 }
	"errmsg" : "Operation timed out, request was RemoteCommand 104 -- target:siyuan-mbp:20012 db:config expDate:2016-06-16T18:27:38.857-0400 cmd:{ find: \"databases\", filter: { _id: \"test\" }, readConcern: { level: \"majority\", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 }",
c20012| 2016-06-16T18:27:38.869-0400 I COMMAND  [conn11] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 }
c20012| 2016-06-16T18:27:38.869-0400 I COMMAND  [conn11] command config.$cmd command: find { find: "databases", filter: { _id: "test" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1466115968000|1, t: 1 } }, limit: 1, maxTimeMS: 30000 } numYields:0 reslen:92 locks:{} protocol:op_command 30011ms

This was found while implementing SERVER-24574, which returns NotMaster rather than closing connections.



 Comments   
Comment by Githook User [ 29/Jul/16 ]

Author: Siyuan Zhou (visualzhou) <visualzhou@gmail.com>

Message: SERVER-24630 Mongos erroneously advances config optime for writes that fail write concern.

Use getLastOpCommitted rather than getLastOpVisible in sharding.
Branch: v3.2
https://github.com/mongodb/mongo/commit/350efef02455cae900d9d781c8dadb7911fee47d
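Relative to the hook shown in the reproduction diff above, the change the commit message describes amounts to a one-line swap (a sketch of the relevant site; the actual commit may touch additional call sites):

--- a/src/mongo/s/sharding_egress_metadata_hook.cpp
+++ b/src/mongo/s/sharding_egress_metadata_hook.cpp
                 const auto& replMetadata = parseStatus.getValue();
-                auto opTime = replMetadata.getLastOpVisible();
+                auto opTime = replMetadata.getLastOpCommitted();
                 grid.advanceConfigOpTime(opTime);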

Comment by Eric Milkie [ 22/Jun/16 ]

Yes, backport of this ticket should commence.

Comment by Siyuan Zhou [ 21/Jun/16 ]

milkie, this is a dependency of primary catch-up; do we want to backport this ticket and the internal connection tagging (SERVER-24574)?

Comment by Githook User [ 21/Jun/16 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-24630 Mongos erroneously advances config optime for writes that fail write concern.

Use getLastOpCommitted rather than getLastOpVisible in sharding.
Branch: master
https://github.com/mongodb/mongo/commit/640d07ddffbd4c67fddcac67dac2b9e7cb281ff2

Comment by Eric Milkie [ 17/Jun/16 ]

After discussion with kaloian.manassiev, we think we should return a lastVisibleOpTime of 0 if the write concern fails. This should hold even when returning NotMaster as an error at the beginning of command processing (in addition to post-write write concern processing). This is option 1 as described above.

Using lastOpCommitted for this would be incorrect.

Comment by Eric Milkie [ 17/Jun/16 ]

I think we should do (3), but only for writes, and only for replication-related errors (NotMaster, write concern timeout).

Comment by Siyuan Zhou [ 16/Jun/16 ]

In the discussion with spencer and scotthernandez, we came up with 4 possible solutions.
1. Revert lastVisibleOpTime on mongod if the command fails.
2. Instead of reverting lastVisibleOpTime, add an extra one that remains the same on failure.
3. On mongos, don't advance the config optime when we get errors.
4. On mongos, use lastOpCommitted instead.

I'd vote for the last option, as its design is the simplest and follows the current behavior. For successful writes, lastVisibleOpTime equals lastOpCommitted and may be ahead of snapshots anyway.

CC schwerin and milkie.
