Details
- Type: Bug
- Status: Closed
- Priority: Major - P3
- Resolution: Gone away
- Affects Version/s: 2.0.4, 2.4.6
- Fix Version/s: None
- Labels: None
- Environment: Debug build; OS X 10.7.3, 4x2.8GHz i5 iMac, 12GB RAM, 1TB HD; Ubuntu Linux 3.0.0-16, 4 virtualized cores, 2GB RAM, 256GB HD
- Component/s: Sharding
- Operating System: ALL
Description
Simple shard test setup:

mongos> db.runCommand( { enablesharding : "shardtest" } );
mongos> db.runCommand( { shardcollection : "shardtest.txtcol", key : {txtkey:1} });
...
mongos> use shardtest
switched to db shardtest
mongos> show collections
shardcol
system.indexes
txtcol
mongos> for (i=0;i<100000;i++) { db.txtcol.insert({txtkey:'asdfasdffdsafdsa', asdf:'asdkghkuwehgkawuhegawegwkrughekurghkauerhgkuhaekruhgkuehrgkuhreg'}); }
This inserted only 2 documents every two seconds, verified via count() and mongostat:
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 5 0 672m 3.76g 19m 0 0 0|0 5|0 558b 2k 23 sh1 M 01:06:46
2 0 0 0 4 3 0 672m 3.76g 19m 0 0 0|0 5|0 849b 3k 23 sh1 M 01:06:47
0 0 0 0 0 5 0 672m 3.76g 19m 0 0 0|0 5|0 558b 2k 23 sh1 M 01:06:48
2 0 0 0 4 3 1 672m 3.76g 19m 0 0 0|0 5|0 849b 3k 23 sh1 M 01:06:49
0 0 0 0 0 5 0 672m 3.76g 20m 0 0 0|0 5|0 558b 2k 23 sh1 M 01:06:50
2 0 0 0 5 3 0 672m 3.76g 20m 0 0 0|0 5|0 896b 3k 23 sh1 M 01:06:51
When the insert function was changed to use unique keys:
mongos> for (i=0;i<100000;i++) { db.txtcol.insert({txtkey:'asdfasdffdsafdsa'+i, asdf:'asdkghkuwehgkawuhegawegwkrughekurghkauerhgkuhaekruhgkuehrgkuhreg'}); }
mongos slowly ramped up to roughly 2k inserts/second:
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:21
2 0 0 0 4 21 0 672m 3.76g 23m 0 0 0|0 5|0 2k 11k 23 sh1 M 01:10:22
0 0 0 0 0 17 0 672m 3.76g 23m 0 0 0|0 5|0 1k 12k 23 sh1 M 01:10:23
2 0 0 0 6 11 0 672m 3.76g 23m 0 0 0|0 5|0 1k 4k 23 sh1 M 01:10:24
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:25
2 0 0 0 4 7 0 672m 3.76g 23m 0 0 0|0 5|0 1k 3k 23 sh1 M 01:10:26
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:27
2 0 0 0 7 7 0 672m 3.76g 23m 0 0 0|0 5|0 1k 4k 23 sh1 M 01:10:28
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:29
3 0 0 0 8 11 0 672m 3.76g 23m 0 0 0|0 6|0 2k 5k 24 sh1 M 01:10:30
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 6|0 62b 1k 24 sh1 M 01:10:31
4 0 0 0 10 39 0 672m 3.76g 14m 1.6 0 0|0 6|0 4k 23k 24 sh1 M 01:10:32
0 0 0 0 0 1 0 672m 3.76g 19m 0 0 0|0 6|0 128b 2k 24 sh1 M 01:10:33
4 0 0 0 10 13 0 672m 3.76g 12m 1.6 0 0|0 6|0 2k 6k 25 sh1 M 01:10:34
0 0 0 0 0 1 0 672m 3.76g 19m 0 0 0|0 6|0 62b 1k 25 sh1 M 01:10:35
2 0 0 0 5 7 0 672m 3.76g 23m 0 0 0|0 6|0 1k 4k 25 sh1 M 01:10:36
1 3 0 0 4 11 0 672m 3.77g 23m 0 0 0|0 6|0 1k 6k 26 sh1 M 01:10:37
0 0 0 0 0 5 0 672m 3.77g 23m 0 0 0|0 6|0 558b 2k 26 sh1 M 01:10:38
750 1 0 0 1000 7 0 672m 3.77g 24m 9.6 0 0|0 6|0 170k 614k 26 sh1 M 01:10:39
0 0 0 0 0 5 0 672m 3.77g 24m 0 0 0|0 6|0 558b 2k 26 sh1 M 01:10:40
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
733 0 0 0 1158 3 0 672m 3.77g 25m 12.6 0 0|0 6|0 176k 614k 26 sh1 M 01:10:41
142 0 0 0 67 35 0 672m 3.77g 18m 3.3 0 0|0 5|0 28k 132k 26 sh1 M 01:10:42
0 0 0 0 0 1 0 672m 3.77g 23m 0 0 0|0 5|0 62b 1k 26 sh1 M 01:10:43
728 0 0 0 917 11 0 672m 3.77g 25m 12.9 0 0|0 5|0 165k 605k 26 sh1 M 01:10:44
0 0 0 0 0 1 0 672m 3.77g 25m 0 0 0|0 5|0 62b 1k 26 sh1 M 01:10:45
726 0 0 0 932 7 0 672m 3.77g 15m 12.4 0 0|0 5|0 165k 605k 26 sh1 M 01:10:46
0 0 0 0 0 7 0 672m 3.77g 21m 0 0 0|0 5|0 434b 5k 26 sh1 M 01:10:47
1 2 0 0 4 12 0 672m 3.77g 26m 0 0 0|0 5|0 2k 8k 26 sh1 M 01:10:48
0 0 0 0 0 1 1 672m 3.77g 26m 0 0 0|0 5|0 62b 1k 26 sh1 M 01:10:49
189 0 0 0 151 5 0 672m 3.77g 26m 2.4 0 0|0 0|1 39k 142k 26 sh1 M 01:10:50
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
2087 0 0 0 2088 1 0 672m 3.77g 16m 37.4 0 2|0 2|1 446k 1m 26 sh1 M 01:10:51
1915 0 0 0 2254 35 0 672m 3.77g 16m 34.2 0 0|0 1|0 427k 1m 26 sh1 M 01:10:52
1959 0 0 0 2409 1 0 672m 3.77g 18m 34.5 0 0|0 0|0 440k 1m 26 sh1 M 01:10:53
2002 0 0 0 2288 11 0 672m 3.77g 19m 33.4 0 0|1 0|1 443k 1m 26 sh1 M 01:10:54
2053 0 0 0 2029 1 0 672m 3.77g 19m 33.2 0 1|0 2|0 440k 1m 26 sh1 M 01:10:55
1992 0 0 0 2057 5 0 672m 3.77g 20m 36.3 0 0|0 1|0 431k 1m 26 sh1 M 01:10:56
1987 0 0 0 2536 7 0 672m 3.77g 22m 30.9 0 0|0 0|0 453k 1m 26 sh1 M 01:10:57
1873 0 0 0 2478 5 0 672m 3.77g 23m 32.8 0 0|0 0|0 431k 1m 26 sh1 M 01:10:58
2003 0 0 0 2299 1 0 672m 3.77g 23m 32.7 0 0|1 1|1 444k 1m 26 sh1 M 01:10:59
2124 0 0 0 1852 5 0 672m 3.77g 24m 34.6 0 3|0 3|0 444k 1m 26 sh1 M 01:11:00
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
1931 0 0 0 2303 1 0 672m 3.77g 25m 30.8 0 4|0 4|1 432k 1m 26 sh1 M 01:11:01
The following log entries appeared every 500ms-2s in the mongos logs:
Fri Mar 23 01:08:11 [conn1] Request::process ns: shardtest.txtcol msg id:187 attempt: 0
Fri Mar 23 01:08:11 [conn1] write: shardtest.txtcol
Fri Mar 23 01:08:11 [conn1] server:sh1:sh1/localhost:27017,localhost:27018,localhost:27019,localhost:27020,localhost:27021 { _id: ObjectId('4f6c2eb0e4466c463950f631'), txtkey: "asdfasdffdsafdsa", asdf: "asdkghkuwehgkawuhegawegwkrughekurghkauerhgkuhaekruhgkuehrgkuhreg" }
Fri Mar 23 01:08:11 [conn1] about to initiate autosplit: ns:shardtest.txtcol at: sh1:sh1/localhost:27017,localhost:27018,localhost:27019,localhost:27020,localhost:27021 lastmod: 1|0 min: { txtkey: MinKey } max: { txtkey: MaxKey } dataWritten: 252 splitThreshold: 921
The dataWritten value is the same in every one of these autosplit entries.
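The constant dataWritten and tiny splitThreshold suggest how the log spam arises: mongos keeps a per-chunk write counter and attempts an autosplit once it crosses the threshold, but the counter is apparently not reset when the split attempt finds nothing. A minimal sketch of that bookkeeping, with the numbers taken from the log above (all names are illustrative, not the actual mongos identifiers):

```javascript
// Hedged sketch of mongos's per-chunk autosplit bookkeeping (illustrative
// names, not the real C++ code from strategy_shard.cpp).
function makeChunk(splitThreshold) {
  return { dataWritten: 0, splitThreshold };
}

// Called for every write routed to the chunk; returns true when the
// (expensive) splitVector command would be issued.
function noteWrite(chunk, bytes, splitVectorFindsPoint) {
  chunk.dataWritten += bytes;
  if (chunk.dataWritten < chunk.splitThreshold) return false;
  if (splitVectorFindsPoint) {
    chunk.dataWritten = 0; // split succeeded; tracking starts over
    return true;
  }
  // With identical shard keys splitVector finds no split point, and the
  // counter is NOT reset, so every subsequent write re-triggers the scan.
  return true;
}

// Values from the log: dataWritten grows by 252 per insert, splitThreshold
// is 921, and no split point is ever found.
const chunk = makeChunk(921);
let scans = 0;
for (let i = 0; i < 10; i++) {
  if (noteWrite(chunk, 252, false)) scans++;
}
console.log(scans); // 7: from the 4th insert onward, every insert scans
```

Under this model the threshold is crossed after four inserts and never un-crossed, which matches the "about to initiate autosplit" line appearing for essentially every write.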
The following appears in the mongod logs:
Fri Mar 23 01:05:41 [conn25] warning: chunk is larger than 1024 bytes because of key { txtkey: "asdfasdffdsafdsa" }
Fri Mar 23 01:05:41 [conn25] warning: Finding the split vector for shardtest.txtcol over { txtkey: 1.0 } keyCount: 4 numSplits: 0 lookedAt: 299107 took 2035ms
Fri Mar 23 01:05:41 [conn25] command admin.$cmd command: { splitVector: "shardtest.txtcol", keyPattern: { txtkey: 1.0 }, min: { txtkey: MinKey }, max: { txtkey: MaxKey }, maxChunkSizeBytes: 1024, maxSplitPoints: 2, maxChunkObjects: 250000 } ntoreturn:1 reslen:53 2035ms
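The numSplits: 0 / lookedAt: 299107 pair explains the 2035ms: a chunk boundary cannot fall between two equal key values, so when every document shares one shard-key value there is no valid split point, yet the index scan still walks every entry before giving up. A simplified model of that behavior (not the server's actual splitVector implementation):

```javascript
// Hedged sketch of why splitVector returns no split points over a run of
// identical shard keys (simplified model, not the mongod source).
function splitVectorModel(sortedKeys, maxChunkObjects) {
  const splitPoints = [];
  let lookedAt = 0;
  let sinceLast = 0;
  for (const key of sortedKeys) {
    lookedAt++;
    sinceLast++;
    if (sinceLast >= maxChunkObjects) {
      // A split point must differ from the chunk's min key; a run of
      // identical keys can never produce one.
      if (key !== sortedKeys[0]) {
        splitPoints.push(key);
        sinceLast = 0;
      }
    }
  }
  return { splitPoints, lookedAt };
}

const identical = new Array(1000).fill('asdfasdffdsafdsa');
const unique = Array.from({ length: 1000 }, (_, i) => 'k' + String(i).padStart(4, '0'));

// Identical keys: the whole chunk is scanned but nothing is returned.
console.log(splitVectorModel(identical, 250).splitPoints.length); // 0
// Unique keys: a split point roughly every 250 keys.
console.log(splitVectorModel(unique, 250).splitPoints.length);    // 4
```

This matches the mongod warning: 299107 keys looked at, zero splits found, and the scan cost paid in full every time the command is retried.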
Going back to the original test script with identical keys (and an empty collection): if the calls to splitIfShould() in strategy_shard.cpp (lines 164 and 330) are commented out, mongos inserts documents with identical keys at over 2k/second:
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
2081 0 0 0 2495 5 0 672m 3.76g 10m 36.5 0 0|1 1|1 456k 1m 17 sh1 M 00:56:50
1939 0 0 0 2844 31 0 672m 3.76g 11m 29.6 0 0|0 1|0 451k 1m 17 sh1 M 00:56:51
2189 0 0 0 2328 5 0 672m 3.76g 12m 34.3 0 0|0 0|0 466k 1m 17 sh1 M 00:56:52
2155 0 0 0 2234 1 0 672m 3.76g 12m 35 0 1|0 1|0 456k 1m 17 sh1 M 00:56:53
2016 0 0 0 2363 5 0 672m 3.76g 13m 31.7 0 4|0 4|1 440k 1m 17 sh1 M 00:56:54
2105 0 0 0 2203 1 0 672m 3.76g 14m 36.2 0 0|0 0|0 446k 1m 17 sh1 M 00:56:55
2139 0 0 0 2416 5 0 672m 3.76g 16m 33.8 0 1|0 2|1 462k 1m 17 sh1 M 00:56:56
1782 0 0 0 1737 1 0 672m 3.76g 16m 27.9 0 0|0 4|0 372k 1m 18 sh1 M 00:56:57
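Short of removing splitIfShould() entirely, a less drastic mitigation, along the lines suggested by the related ticket SERVER-36394, would be to reset the chunk's write tracker even when splitVector returns no usable split point, so the expensive scan runs at most once per splitThreshold bytes of writes rather than on every insert. A hedged sketch of that idea (illustrative names, not a real patch):

```javascript
// Hedged sketch of the mitigation direction in SERVER-36394: reset the
// dataWritten tracker when no split point is found, instead of leaving it
// above the threshold forever. Names are illustrative.
function noteWriteWithReset(chunk, bytes, splitVectorFindsPoint) {
  chunk.dataWritten += bytes;
  if (chunk.dataWritten < chunk.splitThreshold) return false;
  // Whether or not a split point is found, restart tracking, so the next
  // splitVector scan happens only after another splitThreshold bytes.
  chunk.dataWritten = 0;
  return true; // a splitVector scan happened on this write
}

// Same numbers as the logs above: 252 bytes/insert, 921-byte threshold,
// and splitVector never finds a point.
const chunk = { dataWritten: 0, splitThreshold: 921 };
let scans = 0;
for (let i = 0; i < 10; i++) {
  if (noteWriteWithReset(chunk, 252, false)) scans++;
}
console.log(scans); // 2: scans on the 4th and 8th inserts only
```

Under this model ten inserts trigger two scans instead of seven, which would amortize the 2-second splitVector cost instead of paying it per write.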
The same test was performed on a virtualized Linux machine on the same host, which consistently achieved 4-5k inserts/sec with the identical-shard-key test. More profiling could be done to track down the root cause of the slowness; I was unable to find a reason for the different behavior on Linux.
Attachments
Issue Links
- related to SERVER-36394: mongos should reset chunk size tracking information when autosplit = false or splitvector returns too few split points (Closed)