Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.0.4, 2.4.6
Component/s: Sharding
Operating System: ALL
Environment: Debug build
OS X 10.7.3, 4x 2.8GHz i5 iMac, 12GB RAM, 1TB HD
Ubuntu Linux 3.0.0-16, 4x virtualized cores, 2GB RAM, 256GB HD
Simple shard test setup:
mongos> db.runCommand( { enablesharding : "shardtest" } );
mongos> db.runCommand( { shardcollection : "shardtest.txtcol", key : {txtkey:1} });
...
mongos> use shardtest
switched to db shardtest
mongos> show collections
shardcol
system.indexes
txtcol
mongos> for (i=0;i<100000;i++) { db.txtcol.insert({txtkey:'asdfasdffdsafdsa', asdf:'asdkghkuwehgkawuhegawegwkrughekurghkauerhgkuhaekruhgkuehrgkuhreg'}); }
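Not part of the original setup, but as a sanity check the chunk layout can be inspected from the mongos shell before and after the insert loop (the commands below are standard shell helpers, not taken from the report):
mongos> db.printShardingStatus()
mongos> use config
mongos> db.chunks.find({ ns: "shardtest.txtcol" })
With every document carrying the same shard key value, the collection never gets past a single chunk spanning { txtkey: MinKey } to { txtkey: MaxKey }, which is consistent with the autosplit log entries shown further below.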
With identical shard keys, this loop only manages 2 inserts every two seconds, verified via count() and mongostat:
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 5 0 672m 3.76g 19m 0 0 0|0 5|0 558b 2k 23 sh1 M 01:06:46
2 0 0 0 4 3 0 672m 3.76g 19m 0 0 0|0 5|0 849b 3k 23 sh1 M 01:06:47
0 0 0 0 0 5 0 672m 3.76g 19m 0 0 0|0 5|0 558b 2k 23 sh1 M 01:06:48
2 0 0 0 4 3 1 672m 3.76g 19m 0 0 0|0 5|0 849b 3k 23 sh1 M 01:06:49
0 0 0 0 0 5 0 672m 3.76g 20m 0 0 0|0 5|0 558b 2k 23 sh1 M 01:06:50
2 0 0 0 5 3 0 672m 3.76g 20m 0 0 0|0 5|0 896b 3k 23 sh1 M 01:06:51
When the insert function was changed to use unique keys:
mongos> for (i=0;i<100000;i++) { db.txtcol.insert({txtkey:'asdfasdffdsafdsa'+i, asdf:'asdkghkuwehgkawuhegawegwkrughekurghkauerhgkuhaekruhgkuehrgkuhreg'}); }
mongos slowly ramped up to roughly 2k inserts/second:
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:21
2 0 0 0 4 21 0 672m 3.76g 23m 0 0 0|0 5|0 2k 11k 23 sh1 M 01:10:22
0 0 0 0 0 17 0 672m 3.76g 23m 0 0 0|0 5|0 1k 12k 23 sh1 M 01:10:23
2 0 0 0 6 11 0 672m 3.76g 23m 0 0 0|0 5|0 1k 4k 23 sh1 M 01:10:24
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:25
2 0 0 0 4 7 0 672m 3.76g 23m 0 0 0|0 5|0 1k 3k 23 sh1 M 01:10:26
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:27
2 0 0 0 7 7 0 672m 3.76g 23m 0 0 0|0 5|0 1k 4k 23 sh1 M 01:10:28
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 5|0 62b 1k 23 sh1 M 01:10:29
3 0 0 0 8 11 0 672m 3.76g 23m 0 0 0|0 6|0 2k 5k 24 sh1 M 01:10:30
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 1 0 672m 3.76g 23m 0 0 0|0 6|0 62b 1k 24 sh1 M 01:10:31
4 0 0 0 10 39 0 672m 3.76g 14m 1.6 0 0|0 6|0 4k 23k 24 sh1 M 01:10:32
0 0 0 0 0 1 0 672m 3.76g 19m 0 0 0|0 6|0 128b 2k 24 sh1 M 01:10:33
4 0 0 0 10 13 0 672m 3.76g 12m 1.6 0 0|0 6|0 2k 6k 25 sh1 M 01:10:34
0 0 0 0 0 1 0 672m 3.76g 19m 0 0 0|0 6|0 62b 1k 25 sh1 M 01:10:35
2 0 0 0 5 7 0 672m 3.76g 23m 0 0 0|0 6|0 1k 4k 25 sh1 M 01:10:36
1 3 0 0 4 11 0 672m 3.77g 23m 0 0 0|0 6|0 1k 6k 26 sh1 M 01:10:37
0 0 0 0 0 5 0 672m 3.77g 23m 0 0 0|0 6|0 558b 2k 26 sh1 M 01:10:38
750 1 0 0 1000 7 0 672m 3.77g 24m 9.6 0 0|0 6|0 170k 614k 26 sh1 M 01:10:39
0 0 0 0 0 5 0 672m 3.77g 24m 0 0 0|0 6|0 558b 2k 26 sh1 M 01:10:40
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
733 0 0 0 1158 3 0 672m 3.77g 25m 12.6 0 0|0 6|0 176k 614k 26 sh1 M 01:10:41
142 0 0 0 67 35 0 672m 3.77g 18m 3.3 0 0|0 5|0 28k 132k 26 sh1 M 01:10:42
0 0 0 0 0 1 0 672m 3.77g 23m 0 0 0|0 5|0 62b 1k 26 sh1 M 01:10:43
728 0 0 0 917 11 0 672m 3.77g 25m 12.9 0 0|0 5|0 165k 605k 26 sh1 M 01:10:44
0 0 0 0 0 1 0 672m 3.77g 25m 0 0 0|0 5|0 62b 1k 26 sh1 M 01:10:45
726 0 0 0 932 7 0 672m 3.77g 15m 12.4 0 0|0 5|0 165k 605k 26 sh1 M 01:10:46
0 0 0 0 0 7 0 672m 3.77g 21m 0 0 0|0 5|0 434b 5k 26 sh1 M 01:10:47
1 2 0 0 4 12 0 672m 3.77g 26m 0 0 0|0 5|0 2k 8k 26 sh1 M 01:10:48
0 0 0 0 0 1 1 672m 3.77g 26m 0 0 0|0 5|0 62b 1k 26 sh1 M 01:10:49
189 0 0 0 151 5 0 672m 3.77g 26m 2.4 0 0|0 0|1 39k 142k 26 sh1 M 01:10:50
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
2087 0 0 0 2088 1 0 672m 3.77g 16m 37.4 0 2|0 2|1 446k 1m 26 sh1 M 01:10:51
1915 0 0 0 2254 35 0 672m 3.77g 16m 34.2 0 0|0 1|0 427k 1m 26 sh1 M 01:10:52
1959 0 0 0 2409 1 0 672m 3.77g 18m 34.5 0 0|0 0|0 440k 1m 26 sh1 M 01:10:53
2002 0 0 0 2288 11 0 672m 3.77g 19m 33.4 0 0|1 0|1 443k 1m 26 sh1 M 01:10:54
2053 0 0 0 2029 1 0 672m 3.77g 19m 33.2 0 1|0 2|0 440k 1m 26 sh1 M 01:10:55
1992 0 0 0 2057 5 0 672m 3.77g 20m 36.3 0 0|0 1|0 431k 1m 26 sh1 M 01:10:56
1987 0 0 0 2536 7 0 672m 3.77g 22m 30.9 0 0|0 0|0 453k 1m 26 sh1 M 01:10:57
1873 0 0 0 2478 5 0 672m 3.77g 23m 32.8 0 0|0 0|0 431k 1m 26 sh1 M 01:10:58
2003 0 0 0 2299 1 0 672m 3.77g 23m 32.7 0 0|1 1|1 444k 1m 26 sh1 M 01:10:59
2124 0 0 0 1852 5 0 672m 3.77g 24m 34.6 0 3|0 3|0 444k 1m 26 sh1 M 01:11:00
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
1931 0 0 0 2303 1 0 672m 3.77g 25m 30.8 0 4|0 4|1 432k 1m 26 sh1 M 01:11:01
The following log entry appeared every 500ms-2s in the mongos logs:
Fri Mar 23 01:08:11 [conn1] Request::process ns: shardtest.txtcol msg id:187 attempt: 0
Fri Mar 23 01:08:11 [conn1] write: shardtest.txtcol
Fri Mar 23 01:08:11 [conn1] server:sh1:sh1/localhost:27017,localhost:27018,localhost:27019,localhost:27020,localhost:27021 { _id: ObjectId('4f6c2eb0e4466c463950f631'), txtkey: "asdfasdffdsafdsa", asdf: "asdkghkuwehgkawuhegawegwkrughekurghkauerhgkuhaekruhgkuehrgkuhreg" }
Fri Mar 23 01:08:11 [conn1] about to initiate autosplit: ns:shardtest.txtcol at: sh1:sh1/localhost:27017,localhost:27018,localhost:27019,localhost:27020,localhost:27021 lastmod: 1|0 min: { txtkey: MinKey } max: { txtkey: MaxKey } dataWritten: 252 splitThreshold: 921
dataWritten is the same (252) in every one of these log entries.
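As a side note (an inference, not something verified in the report): each test document is roughly 126 bytes of BSON, so a constant dataWritten: 252 is consistent with exactly two documents being written between consecutive autosplit attempts. The per-document size can be checked with the shell's Object.bsonsize() helper:
mongos> Object.bsonsize({ _id: ObjectId(), txtkey: 'asdfasdffdsafdsa', asdf: 'asdkghkuwehgkawuhegawegwkrughekurghkauerhgkuhaekruhgkuehrgkuhreg' })  // ~126 bytes; two such documents account for dataWritten: 252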
The following appears in the mongod logs:
Fri Mar 23 01:05:41 [conn25] warning: chunk is larger than 1024 bytes because of key { txtkey: "asdfasdffdsafdsa" }
Fri Mar 23 01:05:41 [conn25] warning: Finding the split vector for shardtest.txtcol over { txtkey: 1.0 } keyCount: 4 numSplits: 0 lookedAt: 299107 took 2035ms
Fri Mar 23 01:05:41 [conn25] command admin.$cmd command: { splitVector: "shardtest.txtcol", keyPattern: { txtkey: 1.0 }, min: { txtkey: MinKey }, max: { txtkey: MaxKey }, maxChunkSizeBytes: 1024, maxSplitPoints: 2, maxChunkObjects: 250000 } ntoreturn:1 reslen:53 2035ms
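The splitVector cost can also be measured independently of mongos by issuing the same command (copied verbatim from the log entry above) directly against the shard primary; this is an additional check, not part of the original report:
sh1:PRIMARY> db.adminCommand({ splitVector: "shardtest.txtcol", keyPattern: { txtkey: 1.0 }, min: { txtkey: MinKey }, max: { txtkey: MaxKey }, maxChunkSizeBytes: 1024, maxSplitPoints: 2, maxChunkObjects: 250000 })
Because every document shares the single key value, the command scans the entire collection and returns no split points, matching the lookedAt: 299107 / 2035ms warning above.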
Going back to the original test script with identical keys (and an empty collection), if the calls to splitIfShould() in strategy_shard.cpp L164 and L330 are commented out, mongos inserts documents with identical keys at over 2k/second:
insert query update delete getmore command flushes mapped vsize res locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
2081 0 0 0 2495 5 0 672m 3.76g 10m 36.5 0 0|1 1|1 456k 1m 17 sh1 M 00:56:50
1939 0 0 0 2844 31 0 672m 3.76g 11m 29.6 0 0|0 1|0 451k 1m 17 sh1 M 00:56:51
2189 0 0 0 2328 5 0 672m 3.76g 12m 34.3 0 0|0 0|0 466k 1m 17 sh1 M 00:56:52
2155 0 0 0 2234 1 0 672m 3.76g 12m 35 0 1|0 1|0 456k 1m 17 sh1 M 00:56:53
2016 0 0 0 2363 5 0 672m 3.76g 13m 31.7 0 4|0 4|1 440k 1m 17 sh1 M 00:56:54
2105 0 0 0 2203 1 0 672m 3.76g 14m 36.2 0 0|0 0|0 446k 1m 17 sh1 M 00:56:55
2139 0 0 0 2416 5 0 672m 3.76g 16m 33.8 0 1|0 2|1 462k 1m 17 sh1 M 00:56:56
1782 0 0 0 1737 1 0 672m 3.76g 16m 27.9 0 0|0 4|0 372k 1m 18 sh1 M 00:56:57
The same test was performed on a virtualized Linux machine on the same host, which consistently achieved 4-5k inserts/sec with the identical-shard-key test. More profiling could be done to track down the root cause of the slowness; I was unable to find a reason for the different behavior on Linux.
Related to: SERVER-36394 mongos should reset chunk size tracking information when autosplit = false or splitvector returns too few split points (Closed)