[SERVER-26134] Building sharding config database indexes holds up leaving drain mode Created: 16/Sep/16  Updated: 06/Dec/22  Resolved: 02/Jan/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.3.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: PM-108
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-27854 initiate replicasets faster in replic... Closed
Assigned Teams:
Sharding
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

The config database's indexes are built while the CSRS config server's primary is still in drain mode, which means the builds happen under the global X lock. This holds up exiting drain mode and blocks any incoming requests (such as initial sync).
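As an illustration only (not from the ticket), operations blocked behind the global X lock held by the index builds show up in currentOp with waitingForLock set. A minimal pymongo sketch, assuming a config server node at a hypothetical host/port and a server new enough to support the $currentOp aggregation stage:

from pymongo import MongoClient

# Hypothetical host/port of the CSRS config server node.
client = MongoClient("localhost", 22011, directConnection=True)

# List in-progress operations that are blocked waiting for a lock, e.g. the
# initial-sync oplog getMore stalled behind the global X lock of the index build.
blocked = client.admin.aggregate([
    {"$currentOp": {"allUsers": True, "idleConnections": False}},
    {"$match": {"waitingForLock": True}},
])
for op in blocked:
    print(op.get("opid"), op.get("op"), op.get("ns"), op.get("secs_running"))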

This is causing initial sync failures of this kind:

[js_test:min_optime_recovery_on_failed_move_chunk_commit] 2016-09-16T02:42:21.061+0000 c22013| 2016-09-16T02:42:21.061+0000 I -        [initial sync] Fatal assertion 40088 ExceededTimeLimit: Operation timed out, request was RemoteCommand 42 -- target:WIN-K6DURTQEFVN:22011 db:local expDate:2016-09-16T02:42:11.640+0000 cmd:{ getMore: 17405408007, collection: "oplog.rs", maxTimeMS: 5000, term: 0, lastKnownCommittedOpTime: { ts: Timestamp 0|0, t: -1 } } at src\mongo\db\repl\replication_coordinator_impl.cpp 614

Note that the getMore command in the log below took 18 seconds to acquire the global lock:

c22011| 2016-09-16T02:42:01.843+0000 I REPL     [ReplicationExecutor] My optime is most up-to-date, skipping catch-up and completing transition to primary.
c22011| 2016-09-16T02:42:05.900+0000 I INDEX    [rsSync] build index on: config.chunks properties: { v: 2, unique: true, key: { ns: 1, min: 1 }, name: "ns_1_min_1", ns: "config.chunks" }
...
c22011| 2016-09-16T02:42:18.855+0000 I INDEX    [rsSync] build index done.  scanned 0 total records. 2 secs
c22011| 2016-09-16T02:42:21.043+0000 I SHARDING [Balancer] CSRS balancer is starting
c22011| 2016-09-16T02:42:21.044+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
c22011| 2016-09-16T02:42:21.049+0000 I COMMAND  [conn7] command local.oplog.rs command: getMore { getMore: 17405408007, collection: "oplog.rs", maxTimeMS: 5000, term: 0, lastKnownCommittedOpTime: { ts: Timestamp 0|0, t: -1 } } originatingCommand: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1473993702000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 0 } planSummary: COLLSCAN cursorid:17405408007 keysExamined:0 docsExamined:0 numYields:0 nreturned:0 reslen:352 locks:{ Global: { acquireCount: { r: 4 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 18627803 } }, Database: { acquireCount: { r: 2 } }, oplog: { acquireCount: { r: 2 } } } protocol:op_command 19408ms
c22011| 2016-09-16T02:42:31.102+0000 I SHARDING [Balancer] CSRS balancer is now stopped
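For reference, the 18-second figure corresponds to the timeAcquiringMicros counter reported on the getMore line above; a quick conversion (not from the ticket):

# locks.Global.timeAcquiringMicros.r from the getMore log line above
time_acquiring_micros = 18627803
print("waited %.1f s for the global lock" % (time_acquiring_micros / 1e6))  # ~18.6 s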



 Comments   
Comment by Sheeri Cabral (Inactive) [ 02/Jan/20 ]

Closing as Won't Fix; there's a workaround for the testing issue.

Comment by Judah Schvimer [ 30/Jan/17 ]

I filed SERVER-27854 for the workaround.

Comment by Kaloian Manassiev [ 30/Jan/17 ]

Unfortunately, we have no plans to fix this any time soon. At the time these indexes are built there should not be any data in the config database, so we aren't considering it a bug in practice. I understand that it is making testing difficult, but fixing it is not trivial.

Comment by Judah Schvimer [ 30/Jan/17 ]

kaloian.manassiev, Is there any plan to address this ticket in the near future if we do not do the above workaround? I would prefer to fix this rather than work around it, but if the fix is not going to happen for a while, it is worth doing the workaround.

Comment by Max Hirschhorn [ 27/Jan/17 ]

judah.schvimer, spencer, milkie, is there consensus on whether we should work around this issue in our testing infrastructure for now by changing replica sets spawned by resmoke.py to do a reconfig after the primary of a 1-node replica set is elected? (akin to SERVER-20844)
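For illustration only, a minimal sketch of the reconfig-after-election approach being discussed (not the actual resmoke.py/replicaset.py change; the helper name and host:port strings are hypothetical): initiate a 1-node replica set, wait for it to become primary (by which point the config index builds are done), then reconfig to add the remaining members so they only begin initial sync afterwards.

import time
from pymongo import MongoClient

def initiate_then_add_members(primary_host, secondary_hosts, set_name="rs0"):
    # Connect directly to the would-be primary (hypothetical "host:port" string).
    client = MongoClient(primary_host, directConnection=True)

    # 1. Initiate a 1-node replica set; the lone node can finish its transition
    #    to PRIMARY (including building the config database indexes) without any
    #    secondary trying to initial-sync against it.
    client.admin.command("replSetInitiate", {
        "_id": set_name,
        "members": [{"_id": 0, "host": primary_host}],
    })

    # 2. Wait until the node reports itself as primary.
    while not client.admin.command("isMaster").get("ismaster"):
        time.sleep(0.5)

    # 3. Reconfig to add the remaining members; they start initial sync only now,
    #    after the primary is established and its index builds are complete.
    config = client.admin.command("replSetGetConfig")["config"]
    config["version"] += 1
    config["members"] += [
        {"_id": i + 1, "host": host} for i, host in enumerate(secondary_hosts)
    ]
    client.admin.command("replSetReconfig", config)

# e.g. initiate_then_add_members("localhost:22011", ["localhost:22012", "localhost:22013"])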

Comment by Judah Schvimer [ 24/Jan/17 ]

Yes, I think changing this section of replicaset.py would fix the remaining cases. I am not aware of any other ways that we start up replica sets in our tests.

That said, if there's a way to fix this problem by not holding a global lock for as long, that would be a much better fix.

Comment by Spencer Brody (Inactive) [ 24/Jan/17 ]

judah.schvimer If we changed resmoke.py to do the same thing that the shell does in SERVER-20844, would that fix it?

Comment by Judah Schvimer [ 23/Jan/17 ]

There are now examples of this occurring in ways that SERVER-20844 doesn't fix (such as in the Python test fixtures). We can no longer close this with SERVER-20844.

Comment by Eric Milkie [ 15/Dec/16 ]

I agree with Judah. Implementing SERVER-20844 would have many benefits, including fixing this ticket.

Comment by Judah Schvimer [ 15/Dec/16 ]

I think SERVER-20844 could prevent this from manifesting in our tests. This occurs because secondaries do initial sync before the primary has finished transitioning to PRIMARY. SERVER-20844 would not start initial sync on the secondaries until the primary was established and had created these indexes.

Comment by Eric Milkie [ 27/Sep/16 ]

I'm not sure this is the root of the problem. There are certainly other operations that can occur during drain mode, such as foreground index builds for user collections with data, that would take longer than the config server's index builds on empty collections. Perhaps the real solution here is to enhance index builds to run concurrently with other replicated ops, and produce a "commit" operation that actually commits the index and makes it visible. Then index builds wouldn't necessarily block replication.
