SERVER-21463: SyncClusterConnection should gracefully handle fsyncLock


Details

    • Project: Core Server
    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Component/s: Sharding

    Description

      Metadata writes are still attempted when a config server is fsyncLocked, which can cause consistency problems: most likely distlocks getting stuck, but if the fsyncLocked config server cannot be unlocked, actual metadata inconsistencies when it is forcibly shut down.

      This is different from the behaviour when a config server is down or unreachable, in which case the metadata write is not attempted on any config server. This is problematic because users often think "I need to stop writes to the config servers, but taking one down is scary, so I'll just fsyncLock it instead", i.e. fsyncLock is often perceived as a good way of stopping writes without stopping reads. This is not true for the SyncClusterConnection config protocol, and can do more harm than good.

      It would be good if, for the purposes of writes, SCCC considered an fsyncLocked config server to be unavailable, the same as if it were actually down.

      The simplest way to do this would probably be for getLastError to return an fsyncLocked bool, which the SCCC protocol could inspect alongside its existing ok: 1 check inside ConfigCoordinator::executeBatch().
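      A minimal sketch of the idea from the shell, assuming the proposed fsyncLocked field (it does not exist in getLastError today); ConfigCoordinator::executeBatch() would make the equivalent server-side check where it currently inspects ok: 1:

        // Hypothetical client-side version of the proposed check; 'conn' is a
        // connection (new Mongo("host:port")) to a single config server.
        function configServerWritable(conn) {
            var res = conn.getDB("admin").runCommand({ getlasterror: 1, fsync: 1 });
            // Proposed: getLastError would also return fsyncLocked: true while
            // the server is fsyncLocked, letting SCCC treat it as down for writes.
            return res.ok === 1 && !res.fsyncLocked;
        }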

      Repro:

      1. Start sharded cluster with 3 config servers
      2. Shard a collection
      3. Take a config server down
      4. An attempted split fails, as expected:

        mongos> db.test.splitAt({_id:1})
        {
                "ok" : 0,
                "errmsg" : "not all config servers are up: Not all config servers config:genique:11114,genique:11115,genique:11116 are reachable"
        }
        
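      For reference, steps 2 and 6 map to standard shell commands (the database/collection names match this repro; the choice of config server is arbitrary):

        // step 2, on mongos:
        sh.enableSharding("test")
        sh.shardCollection("test.test", { _id: 1 })

        // step 6, in a shell connected to one config server (e.g. genique:11115):
        db.fsyncLock()    // shell helper for { fsync: 1, lock: true }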

      5. Bring the config server back up.
      6. fsyncLock a config server.
      7. Attempt a split, which will block.
      8. Wait ~30 seconds (the socket receive timeout; note the RECV_TIMEOUT socket exception in the logs below).
      9. The split will fail.

        mongos> db.test.splitAt({_id:2})
        {
                "cause" : "SplitFailed splitChunk failed - cmd: { splitChunk: \"test.test\", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: \"shard01\", splitKeys: [ { _id: 2.0 } ], shardId: \"test.test-_id_1.0\", configdb: \"genique:11114,genique:11115,genique:11116\", epoch: ObjectId('5647c30eab64f42ca072e6ca') } result: { ok: 0.0, errmsg: \"could not acquire collection lock for test.test to split chunk [{ : MinKey },{ : MaxKey }) :: caused by :: exception creating distributed lock test.te...\" }",
                "ok" : 0,
                "errmsg" : "split failed"
        }
        

        2015-11-15T10:28:25.874+1100 I COMMAND  [conn5] splitting chunk [{ _id: 1.0 },{ _id: MaxKey }) in collection test.test on shard shard01
        2015-11-15T10:28:55.956+1100 W SHARDING [conn5] splitChunk failed - cmd: { splitChunk: "test.test", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: "shard01", splitKeys: [ { _id: 2.0 } ], shardId: "test.test-_id_1.0", configdb: "genique:11114,genique:11115,genique:11116", epoch: ObjectId('5647c30eab64f42ca072e6ca') } result: { ok: 0.0, errmsg: "could not acquire collection lock for test.test to split chunk [{ : MinKey },{ : MaxKey }) :: caused by :: exception creating distributed lock test.te..." }
        

        2015-11-15T10:28:25.874+1100 I SHARDING [conn2] received splitChunk request: { splitChunk: "test.test", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: "shard01", splitKeys: [ { _id: 2.0 } ], shardId: "test.test-_id_1.0", configdb: "genique:11114,genique:11115,genique:11116", epoch: ObjectId('5647c30eab64f42ca072e6ca') }
        2015-11-15T10:28:50.767+1100 I NETWORK  [LockPinger] SyncClusterConnection connecting to [genique:11114]
        2015-11-15T10:28:50.768+1100 I NETWORK  [LockPinger] SyncClusterConnection connecting to [genique:11115]
        2015-11-15T10:28:50.768+1100 I NETWORK  [LockPinger] SyncClusterConnection connecting to [genique:11116]
        2015-11-15T10:28:55.950+1100 I NETWORK  [conn2] Socket recv() timeout  127.0.1.1:11115
        2015-11-15T10:28:55.950+1100 I NETWORK  [conn2] SocketException: remote: 127.0.1.1:11115 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.1.1:11115]
        2015-11-15T10:28:55.950+1100 I NETWORK  [conn2] DBClientCursor::init call() failed
        2015-11-15T10:28:55.956+1100 W SHARDING [conn2] could not acquire collection lock for test.test to split chunk [{ : MinKey },{ : MaxKey }) :: caused by :: exception creating distributed lock test.test/genique:11112:1447543610:1011412462 :: caused by :: SyncClusterConnection write op failed: genique:11115 (127.0.1.1) failed: {} DBClientBase::findN: transport error: genique:11115 ns: admin.$cmd query: { getlasterror: 1, fsync: 1 }
        2015-11-15T10:28:55.956+1100 I COMMAND  [conn2] command admin.$cmd command: splitChunk { splitChunk: "test.test", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: "shard01", splitKeys: [ { _id: 2.0 } ], shardId: "test.test-_id_1.0", configdb: "genique:11114,genique:11115,genique:11116", epoch: ObjectId('5647c30eab64f42ca072e6ca') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:434 locks:{} 30081ms
        
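      Step 10 below uses the standard shell helper, run against the fsyncLocked config server:

        db.fsyncUnlock()    // releases the fsync lock taken in step 6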

      10. fsyncUnlock the config server.
      11. The collection lock is still stuck, and subsequent metadata operations continue to fail accordingly:

        mongos> db.test.splitAt({_id:4})
        {
                "cause" : "SplitFailed splitChunk failed - cmd: { splitChunk: \"test.test\", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: \"shard01\", splitKeys: [ { _id: 4.0 } ], shardId: \"test.test-_id_1.0\", configdb: \"genique:11114,genique:11115,genique:11116\", epoch: ObjectId('5647c30eab64f42ca072e6ca') } result: { ok: 0.0, errmsg: \"could not acquire collection lock for test.test to split chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for splitting chunk [{ : MinKey }, { :...\" }",
                "ok" : 0,
                "errmsg" : "split failed"
        }
        

        2015-11-15T10:29:21.018+1100 I COMMAND  [conn5] splitting chunk [{ _id: 1.0 },{ _id: MaxKey }) in collection test.test on shard shard01
        2015-11-15T10:29:21.019+1100 W SHARDING [conn5] splitChunk failed - cmd: { splitChunk: "test.test", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: "shard01", splitKeys: [ { _id: 3.0 } ], shardId: "test.test-_id_1.0", configdb: "genique:11114,genique:11115,genique:11116", epoch: ObjectId('5647c30eab64f42ca072e6ca') } result: { ok: 0.0, errmsg: "could not acquire collection lock for test.test to split chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for splitting chunk [{ : MinKey }, { :..." }
        

        2015-11-15T10:29:21.018+1100 I SHARDING [conn2] received splitChunk request: { splitChunk: "test.test", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: "shard01", splitKeys: [ { _id: 3.0 } ], shardId: "test.test-_id_1.0", configdb: "genique:11114,genique:11115,genique:11116", epoch: ObjectId('5647c30eab64f42ca072e6ca') }
        2015-11-15T10:29:21.018+1100 W SHARDING [conn2] could not acquire collection lock for test.test to split chunk [{ : MinKey },{ : MaxKey }) :: caused by :: Lock for splitting chunk [{ : MinKey }, { : MaxKey }) in test.test is taken.
        
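      While the lock is stuck, the lock document can be inspected through mongos (standard config.locks schema for this era; state 0 means unlocked, nonzero means held or being acquired):

        mongos> db.getSiblingDB("config").locks.findOne({ _id: "test.test" })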

      12. After 15 minutes (the distributed lock takeover time of 900000 ms, visible in the log below), the lock will be forced as usual.

        mongos> db.test.splitAt({_id:4})
        { "ok" : 1 }
        

        2015-11-15T10:51:26.698+1100 I COMMAND  [conn5] splitting chunk [{ _id: 1.0 },{ _id: MaxKey }) in collection test.test on shard shard01
        2015-11-15T10:51:27.312+1100 I SHARDING [conn5] ChunkManager: time to load chunks for test.test: 0ms sequenceNumber: 6 version: 2|3||5647c30eab64f42ca072e6ca based on: 2|1||5647c30eab64f42ca072e6ca
        

        2015-11-15T10:51:26.698+1100 I SHARDING [conn6] received splitChunk request: { splitChunk: "test.test", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: "shard01", splitKeys: [ { _id: 4.0 } ], shardId: "test.test-_id_1.0", configdb: "genique:11114,genique:11115,genique:11116", epoch: ObjectId('5647c30eab64f42ca072e6ca') }
        2015-11-15T10:51:26.699+1100 I SHARDING [conn6] forcing lock 'test.test/genique:11112:1447543610:1011412462' because elapsed time 1325681 > takeover time 900000
        2015-11-15T10:51:26.824+1100 I SHARDING [conn6] lock 'test.test/genique:11112:1447543610:1011412462' successfully forced
        2015-11-15T10:51:27.028+1100 I SHARDING [conn6] distributed lock 'test.test/genique:11112:1447543610:1011412462' acquired, ts : 5647c8fe87cb2b46d7bc360a
        2015-11-15T10:51:27.028+1100 I SHARDING [conn6] remotely refreshing metadata for test.test based on current shard version 2|0||5647c30eab64f42ca072e6ca, current metadata version is 2|0||5647c30eab64f42ca072e6ca
        2015-11-15T10:51:27.028+1100 I SHARDING [conn6] updating metadata for test.test from shard version 2|0||5647c30eab64f42ca072e6ca to shard version 2|1||5647c30eab64f42ca072e6ca
        2015-11-15T10:51:27.028+1100 I SHARDING [conn6] collection version was loaded at version 2|1||5647c30eab64f42ca072e6ca, took 0ms
        2015-11-15T10:51:27.028+1100 I SHARDING [conn6] splitChunk accepted at version 2|1||5647c30eab64f42ca072e6ca
        2015-11-15T10:51:27.164+1100 I SHARDING [conn6] about to log metadata event: { _id: "genique-2015-11-14T23:51:27-5647c8ff87cb2b46d7bc360b", server: "genique", clientAddr: "127.0.0.1:44034", time: new Date(1447545087164), what: "split", ns: "test.test", details: { before: { min: { _id: 1.0 }, max: { _id: MaxKey } }, left: { min: { _id: 1.0 }, max: { _id: 4.0 }, lastmod: Timestamp 2000|2, lastmodEpoch: ObjectId('5647c30eab64f42ca072e6ca') }, right: { min: { _id: 4.0 }, max: { _id: MaxKey }, lastmod: Timestamp 2000|3, lastmodEpoch: ObjectId('5647c30eab64f42ca072e6ca') } } }
        2015-11-15T10:51:27.311+1100 I SHARDING [conn6] distributed lock 'test.test/genique:11112:1447543610:1011412462' unlocked.
        2015-11-15T10:51:27.311+1100 I COMMAND  [conn6] command admin.$cmd command: splitChunk { splitChunk: "test.test", keyPattern: { _id: 1.0 }, min: { _id: 1.0 }, max: { _id: MaxKey }, from: "shard01", splitKeys: [ { _id: 4.0 } ], shardId: "test.test-_id_1.0", configdb: "genique:11114,genique:11115,genique:11116", epoch: ObjectId('5647c30eab64f42ca072e6ca') } ntoreturn:1 keyUpdates:0 writeConflicts:0 numYields:0 reslen:95 locks:{ Global: { acquireCount: { r: 4, w: 2 } }, MMAPV1Journal: { acquireCount: { r: 1, w: 2 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { R: 1, W: 2 } } } 613ms
        
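      The 15-minute figure is the distributed lock takeover timeout (the 900000 ms in the "forcing lock" line above). Assuming the standard config.locks/config.lockpings schema, the elapsed time since the lock holder last pinged can be checked by hand:

        mongos> var l = db.getSiblingDB("config").locks.findOne({ _id: "test.test" })
        mongos> var p = db.getSiblingDB("config").lockpings.findOne({ _id: l.process })
        mongos> new Date() - p.ping    // ms since last ping; takeover is allowed past 900000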

People

    • Assignee: backlog-server-sharding [DO NOT USE] (Backlog - Sharding Team)
    • Reporter: Kevin Pulo (kevin.pulo@mongodb.com)
    • Votes: 0
    • Watchers: 7