[SERVER-18663] MongoDB Thinks Chunk Has 1284 Petabytes in it Created: 26/May/15  Updated: 06/Dec/22  Resolved: 14/Dec/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Patrick White Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File fs.chunks.output.json    
Assigned Teams:
Sharding
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

No repro, but the problem exists in a currently running system. See the Stack Exchange question here:

http://dba.stackexchange.com/questions/98836/mongo-shard-with-negative-average-size

Here's the output from an attempted chunk move:

{
	"cause" : {
		"chunkTooBig" : true,
		"estimatedChunkSize" : NumberLong("1284608926708044072"),
		"ok" : 0,
		"errmsg" : "chunk too big to move",
		"$gleStats" : {
			"lastOpTime" : Timestamp(0, 0),
			"electionId" : ObjectId("553034c33e622e6efa8c0a6c")
		}
	},
	"ok" : 0,
	"errmsg" : "move failed"
}
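
For scale, the reported estimate can be converted to petabytes and compared with the on-disk size of the collection file that appears in the stats further down this ticket (436697014272 bytes). A minimal sketch in Python; the variable names are chosen here for illustration, and the decimal petabyte is an assumption about the unit used in the ticket title:

```python
# Values copied from this ticket.
estimated_chunk_size = 1284608926708044072  # "estimatedChunkSize" from the failed move
collection_file_size = 436697014272         # "file size in bytes" from the collection stats

PETABYTE = 10**15  # decimal petabyte (assumed unit of the ticket title)

# Whole petabytes in the estimate, and how many times larger the
# estimate is than the entire collection file on disk.
whole_petabytes = estimated_chunk_size // PETABYTE
ratio_to_file = estimated_chunk_size // collection_file_size

print(f"estimate: {whole_petabytes} PB")
print(f"estimate is about {ratio_to_file:,} times the collection file size")
```

The estimate exceeds the size of the whole collection file by several orders of magnitude, so it cannot describe real data in a single chunk.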



 Comments   
Comment by Crystal Horn [ 14/Dec/15 ]

There is currently no way to reproduce this. We will re-open the ticket if we see that it is still happening.

Comment by Ramon Fernandez Marina [ 04/Jun/15 ]

Thanks for the update pat.white@synata.com, and glad to hear your cluster is back in good shape. It may be very hard to track down the root cause of this issue without being able to reproduce it, but we're going to keep the ticket open to investigate further.

Regards,
Ramón.

Comment by Patrick White [ 02/Jun/15 ]

p.s. You can close the ticket (but, still a good bug to fix)

Patrick White

Synata | Chief Executive Officer

pat.white@synata.com

www.synata.com

831.601.9288

Comment by Patrick White [ 02/Jun/15 ]

Draining completed successfully! Thanks for the help, guys!
-PW


Comment by David Hows [ 02/Jun/15 ]

Hi Patrick,

Good to hear that the reboot had a positive impact.

Assuming that sh_4 is the shard being drained, then I would expect this should help. One of the checks on the "from" shard uses the same internal metrics that are output with the stats command.

Keep us posted as to how the draining and migrations go.

David

Comment by Alexander Gorrod [ 02/Jun/15 ]

Thanks pat.white@synata.com. I'm glad we've got your cluster up and running again - hopefully the node will drain as expected now.

We will do a review of the code that tracks file sizes in MongoDB and look for places where it could become negative.

Comment by Patrick White [ 01/Jun/15 ]

Here are stats after service restart...I think it looks better, but I also
don't think people on my team will be very happy with "reboot fixed it".
Also, it will be interesting to see if this fixes the shard-drain issue.

FROM PRIMARY:
{
"ns" : "Files.fs.chunks",
"count" : 9734,
"size" : 992525019,
"avgObjSize" : 101964,
"storageSize" : 436697014272,
"capped" : false,
"wiredTiger" : {
"metadata" : { "formatVersion" : 1 },
"creationString" :
"allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.26491=(addr=\"01e4065612e381e46f16b4fbe4061cfe4184e4fa90ebdfe4061e3e0f84e4073b9312808080e565ad2b7fc0e440f52fc0\",order=26491,time=1432931402,size=1089818624,write_gen=4055316)),checkpoint_lsn=(11997,75238528),checksum=uncompressed,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=14,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=1MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
"type" : "file",
"uri" : "statistics:table:collection-11--29455071926817772",
"LSM" :

{ "bloom filters in the LSM tree" : 0, "bloom filter false positives" : 0, "bloom filter hits" : 0, "bloom filter misses" : 0, "bloom filter pages evicted from cache" : 0, "bloom filter pages read into cache" : 0, "total size of bloom filters" : 0, "sleep for LSM checkpoint throttle" : 0, "chunks in the LSM tree" : 0, "highest merge generation in the LSM tree" : 0, "queries that could have benefited from a Bloom filter that did not exist" : 0, "sleep for LSM merge throttle" : 0 }

,
"block-manager" :

{ "file allocation unit size" : 4096, "blocks allocated" : 0, "checkpoint size" : 1089818624, "allocations requiring file extension" : 0, "blocks freed" : 0, "file magic number" : 120897, "file major version number" : 1, "minor version number" : 0, "file bytes available for reuse" : 435715661824, "file size in bytes" : 436697014272 }

,
"btree" :

{ "column-store variable-size deleted values" : 0, "column-store fixed-size leaf pages" : 0, "column-store internal pages" : 0, "column-store variable-size leaf pages" : 0, "pages rewritten by compaction" : 0, "number of key/value pairs" : 0, "fixed-record size" : 0, "maximum tree depth" : 5, "maximum internal page key size" : 368, "maximum internal page size" : 4096, "maximum leaf page key size" : 3276, "maximum leaf page size" : 32768, "maximum leaf page value size" : 1048576, "overflow pages" : 0, "row-store internal pages" : 0, "row-store leaf pages" : 0 }

,
"cache" :

{ "bytes read into cache" : 993095599, "bytes written from cache" : 0, "checkpoint blocked page eviction" : 0, "unmodified pages evicted" : 0, "page split during eviction deepened the tree" : 0, "modified pages evicted" : 0, "data source pages selected for eviction unable to be evicted" : 0, "hazard pointer blocked page eviction" : 0, "internal pages evicted" : 0, "pages split during eviction" : 0, "in-memory page splits" : 0, "overflow values cached in memory" : 0, "pages read into cache" : 8256, "overflow pages read into cache" : 0, "pages written from cache" : 0 }

,
"compression" :

{ "raw compression call failed, no additional data available" : 0, "raw compression call failed, additional data available" : 0, "raw compression call succeeded" : 0, "compressed pages read" : 3505, "compressed pages written" : 0, "page written failed to compress" : 0, "page written was too small to compress" : 0 }

,
"cursor" :

{ "create calls" : 2, "insert calls" : 0, "bulk-loaded cursor-insert calls" : 0, "cursor-insert key and value bytes inserted" : 0, "next calls" : 0, "prev calls" : 9735, "remove calls" : 0, "cursor-remove key bytes removed" : 0, "reset calls" : 9735, "search calls" : 9734, "search near calls" : 0, "update calls" : 0, "cursor-update value bytes updated" : 0 }

,
"reconciliation" :

{ "dictionary matches" : 0, "internal page multi-block writes" : 0, "leaf page multi-block writes" : 0, "maximum blocks required for a page" : 0, "internal-page overflow keys" : 0, "leaf-page overflow keys" : 0, "overflow values written" : 0, "pages deleted" : 0, "page checksum matches" : 0, "page reconciliation calls" : 0, "page reconciliation calls for eviction" : 0, "leaf page key bytes discarded using prefix compression" : 0, "internal page key bytes discarded using suffix compression" : 0 }

,
"session" :

{ "object compaction" : 0, "open cursor count" : 2 }

,
"transaction" :

{ "update conflicts" : 0 }

},
"nindexes" : 2,
"totalIndexSize" : 72163328,
"indexSizes" :

{ "_id_" : 35876864, "files_id_1_n_1" : 36286464 }

,
"ok" : 1
}

FROM SECONDARY:
{
"ns" : "Files.fs.chunks",
"count" : 9746,
"size" : 994359839,
"avgObjSize" : 102027,
"storageSize" : 436701138944,
"capped" : false,
"wiredTiger" : {
"metadata" : { "formatVersion" : 1 },
"creationString" :
"allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.26778=(addr=\"01e406567aeb81e470d477d0e4061ca5cf84e4021c2645e4062a90d184e4e126138a808080e565ad6a6fc0e44124efc0\",order=26778,time=1432931385,size=1092947968,write_gen=3943704)),checkpoint_lsn=(12075,70585344),checksum=uncompressed,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=14,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=1MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
"type" : "file",
"uri" : "statistics:table:collection-11-8816732648213650014",
"LSM" :

{ "bloom filters in the LSM tree" : 0, "bloom filter false positives" : 0, "bloom filter hits" : 0, "bloom filter misses" : 0, "bloom filter pages evicted from cache" : 0, "bloom filter pages read into cache" : 0, "total size of bloom filters" : 0, "sleep for LSM checkpoint throttle" : 0, "chunks in the LSM tree" : 0, "highest merge generation in the LSM tree" : 0, "queries that could have benefited from a Bloom filter that did not exist" : 0, "sleep for LSM merge throttle" : 0 }

,
"block-manager" :

{ "file allocation unit size" : 4096, "blocks allocated" : 0, "checkpoint size" : 1092947968, "allocations requiring file extension" : 0, "blocks freed" : 0, "file magic number" : 120897, "file major version number" : 1, "minor version number" : 0, "file bytes available for reuse" : 435717832704, "file size in bytes" : 436701138944 }

,
"btree" :

{ "column-store variable-size deleted values" : 0, "column-store fixed-size leaf pages" : 0, "column-store internal pages" : 0, "column-store variable-size leaf pages" : 0, "pages rewritten by compaction" : 0, "number of key/value pairs" : 0, "fixed-record size" : 0, "maximum tree depth" : 6, "maximum internal page key size" : 368, "maximum internal page size" : 4096, "maximum leaf page key size" : 3276, "maximum leaf page size" : 32768, "maximum leaf page value size" : 1048576, "overflow pages" : 0, "row-store internal pages" : 0, "row-store leaf pages" : 0 }

,
"cache" :

{ "bytes read into cache" : 994931799, "bytes written from cache" : 0, "checkpoint blocked page eviction" : 0, "unmodified pages evicted" : 0, "page split during eviction deepened the tree" : 0, "modified pages evicted" : 0, "data source pages selected for eviction unable to be evicted" : 0, "hazard pointer blocked page eviction" : 0, "internal pages evicted" : 0, "pages split during eviction" : 0, "in-memory page splits" : 0, "overflow values cached in memory" : 0, "pages read into cache" : 8278, "overflow pages read into cache" : 0, "pages written from cache" : 0 }

,
"compression" :

{ "raw compression call failed, no additional data available" : 0, "raw compression call failed, additional data available" : 0, "raw compression call succeeded" : 0, "compressed pages read" : 3504, "compressed pages written" : 0, "page written failed to compress" : 0, "page written was too small to compress" : 0 }

,
"cursor" :

{ "create calls" : 2, "insert calls" : 0, "bulk-loaded cursor-insert calls" : 0, "cursor-insert key and value bytes inserted" : 0, "next calls" : 0, "prev calls" : 9747, "remove calls" : 0, "cursor-remove key bytes removed" : 0, "reset calls" : 9747, "search calls" : 9746, "search near calls" : 0, "update calls" : 0, "cursor-update value bytes updated" : 0 }

,
"reconciliation" :

{ "dictionary matches" : 0, "internal page multi-block writes" : 0, "leaf page multi-block writes" : 0, "maximum blocks required for a page" : 0, "internal-page overflow keys" : 0, "leaf-page overflow keys" : 0, "overflow values written" : 0, "pages deleted" : 0, "page checksum matches" : 0, "page reconciliation calls" : 0, "page reconciliation calls for eviction" : 0, "leaf page key bytes discarded using prefix compression" : 0, "internal page key bytes discarded using suffix compression" : 0 }

,
"session" :

{ "object compaction" : 0, "open cursor count" : 2 }

,
"transaction" :

{ "update conflicts" : 0 }

},
"nindexes" : 2,
"totalIndexSize" : 72478720,
"indexSizes" :

{ "_id_" : 36069376, "files_id_1_n_1" : 36409344 }

,
"ok" : 1
}
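
The post-restart figures are internally consistent, which can be checked by hand. In this output avgObjSize equals size divided by count (rounded down), and nearly all of the collection file is reported as available for reuse. A minimal sketch, assuming integer division is how the average is derived; the dictionary keys `file_size` and `reusable` are shorthand chosen here for the two block-manager fields:

```python
# Post-restart values copied from the stats above.
after_restart = {
    "primary": {"count": 9734, "size": 992525019, "avgObjSize": 101964,
                "file_size": 436697014272, "reusable": 435715661824},
    "secondary": {"count": 9746, "size": 994359839, "avgObjSize": 102027,
                  "file_size": 436701138944, "reusable": 435717832704},
}


def summarize(stats):
    """Return (recomputed average, bytes still in use, reusable fraction)."""
    recomputed_avg = stats["size"] // stats["count"]
    in_use = stats["file_size"] - stats["reusable"]
    reusable_fraction = stats["reusable"] / stats["file_size"]
    return recomputed_avg, in_use, reusable_fraction


for member, stats in after_restart.items():
    avg, in_use, fraction = summarize(stats)
    # Report whether the recomputed average matches the reported one.
    print(member, avg == stats["avgObjSize"], in_use, f"{fraction:.2%}")
```

On both members more than 99% of the file is free space, which fits the large number of remove calls recorded in the pre-restart cursor statistics.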


Comment by Patrick White [ 01/Jun/15 ]

Hi David,
I can't log in to Jira or MMS right now (maybe something is wrong with your
SSO stuff?), but see below:

FROM PRIMARY:

{
"ns" : "Files.fs.chunks",
"count" : 3067,
"size" : -81949,
"avgObjSize" : -1723868544,
"storageSize" : 436697014272,
"capped" : false,
"wiredTiger" : {
"metadata" : {
"formatVersion" : 1
},
"creationString" :
"allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.26491=(addr=\"01e4065612e381e46f16b4fbe4061cfe4184e4fa90ebdfe4061e3e0f84e4073b9312808080e565ad2b7fc0e440f52fc0\",order=26491,time=1432931402,size=1089818624,write_gen=4055316)),checkpoint_lsn=(11997,75238528),checksum=uncompressed,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=14,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=1MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
"type" : "file",
"uri" : "statistics:table:collection-11--29455071926817772",
"LSM" : {
"bloom filters in the LSM tree" : 0,
"bloom filter false positives" : 0,
"bloom filter hits" : 0,
"bloom filter misses" : 0,
"bloom filter pages evicted from cache" : 0,
"bloom filter pages read into cache" : 0,
"total size of bloom filters" : 0,
"sleep for LSM checkpoint throttle" : 0,
"chunks in the LSM tree" : 0,
"highest merge generation in the LSM tree" : 0,
"queries that could have benefited from a Bloom filter that did not exist"
: 0,
"sleep for LSM merge throttle" : 0
},
"block-manager" : {
"file allocation unit size" : 4096,
"blocks allocated" : 244176,
"checkpoint size" : 1089818624,
"allocations requiring file extension" : 149112,
"blocks freed" : 2838562,
"file magic number" : 120897,
"file major version number" : 1,
"minor version number" : 0,
"file bytes available for reuse" : 435715661824,
"file size in bytes" : 436697014272
},
"btree" : {
"column-store variable-size deleted values" : 0,
"column-store fixed-size leaf pages" : 0,
"column-store internal pages" : 0,
"column-store variable-size leaf pages" : 0,
"pages rewritten by compaction" : 0,
"number of key/value pairs" : 0,
"fixed-record size" : 0,
"maximum tree depth" : 8,
"maximum internal page key size" : 368,
"maximum internal page size" : 4096,
"maximum leaf page key size" : 3276,
"maximum leaf page size" : 32768,
"maximum leaf page value size" : 1048576,
"overflow pages" : 0,
"row-store internal pages" : 0,
"row-store leaf pages" : 0
},
"cache" : {
"bytes read into cache" : 510121213092,
"bytes written from cache" : 29124342867,
"checkpoint blocked page eviction" : 7,
"unmodified pages evicted" : 220984,
"page split during eviction deepened the tree" : 0,
"modified pages evicted" : 2458021,
"data source pages selected for eviction unable to be evicted" : 14725,
"hazard pointer blocked page eviction" : 4805,
"internal pages evicted" : 24310,
"pages split during eviction" : 3125,
"in-memory page splits" : 4,
"overflow values cached in memory" : 0,
"pages read into cache" : 3032199,
"overflow pages read into cache" : 0,
"pages written from cache" : 223216
},
"compression" : {
"raw compression call failed, no additional data available" : 0,
"raw compression call failed, additional data available" : 0,
"raw compression call succeeded" : 0,
"compressed pages read" : 1035750,
"compressed pages written" : 58309,
"page written failed to compress" : 115951,
"page written was too small to compress" : 48956
},
"cursor" : {
"create calls" : 143,
"insert calls" : 179776,
"bulk-loaded cursor-insert calls" : 0,
"cursor-insert key and value bytes inserted" : 27455639950,
"next calls" : 1442,
"prev calls" : 1,
"remove calls" : 3010805,
"cursor-remove key bytes removed" : 11964877,
"reset calls" : 12054309,
"search calls" : 11874048,
"search near calls" : 398,
"update calls" : 0,
"cursor-update value bytes updated" : 0
},
"reconciliation" : {
"dictionary matches" : 0,
"internal page multi-block writes" : 1466,
"leaf page multi-block writes" : 4040,
"maximum blocks required for a page" : 81,
"internal-page overflow keys" : 0,
"leaf-page overflow keys" : 0,
"overflow values written" : 0,
"pages deleted" : 2789449,
"page checksum matches" : 108390,
"page reconciliation calls" : 2853815,
"page reconciliation calls for eviction" : 535554,
"leaf page key bytes discarded using prefix compression" : 0,
"internal page key bytes discarded using suffix compression" : 192768
},
"session" : {
"object compaction" : 0,
"open cursor count" : 143
},
"transaction" : {
"update conflicts" : 0
}
},
"nindexes" : 2,
"totalIndexSize" : 72163328,
"indexSizes" : {
"_id_" : 35876864,
"files_id_1_n_1" : 36286464
},
"ok" : 1,
"$gleStats" : {
"lastOpTime" : Timestamp(0, 0),
"electionId" : ObjectId("553034c33e622e6efa8c0a6c")
}
}

FROM SECONDARY:

{
"ns" : "Files.fs.chunks",
"count" : 3664,
"size" : -81949,
"avgObjSize" : 994031709,
"storageSize" : 436701138944,
"capped" : false,
"wiredTiger" : {
"metadata" : {
"formatVersion" : 1
},
"creationString" :
"allocation_size=4KB,app_metadata=(formatVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.26778=(addr=\"01e406567aeb81e470d477d0e4061ca5cf84e4021c2645e4062a90d184e4e126138a808080e565ad6a6fc0e44124efc0\",order=26778,time=1432931385,size=1092947968,write_gen=3943704)),checkpoint_lsn=(12075,70585344),checksum=uncompressed,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=14,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=1MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
"type" : "file",
"uri" : "statistics:table:collection-11-8816732648213650014",
"LSM" : {
"bloom filters in the LSM tree" : 0,
"bloom filter false positives" : 0,
"bloom filter hits" : 0,
"bloom filter misses" : 0,
"bloom filter pages evicted from cache" : 0,
"bloom filter pages read into cache" : 0,
"total size of bloom filters" : 0,
"sleep for LSM checkpoint throttle" : 0,
"chunks in the LSM tree" : 0,
"highest merge generation in the LSM tree" : 0,
"queries that could have benefited from a Bloom filter that did not exist"
: 0,
"sleep for LSM merge throttle" : 0
},
"block-manager" : {
"file allocation unit size" : 4096,
"blocks allocated" : 473280,
"checkpoint size" : 1092947968,
"allocations requiring file extension" : 328185,
"blocks freed" : 2866542,
"file magic number" : 120897,
"file major version number" : 1,
"minor version number" : 0,
"file bytes available for reuse" : 435717832704,
"file size in bytes" : 436701138944
},
"btree" : {
"column-store variable-size deleted values" : 0,
"column-store fixed-size leaf pages" : 0,
"column-store internal pages" : 0,
"column-store variable-size leaf pages" : 0,
"pages rewritten by compaction" : 0,
"number of key/value pairs" : 0,
"fixed-record size" : 0,
"maximum tree depth" : 8,
"maximum internal page key size" : 368,
"maximum internal page size" : 4096,
"maximum leaf page key size" : 3276,
"maximum leaf page size" : 32768,
"maximum leaf page value size" : 1048576,
"overflow pages" : 0,
"row-store internal pages" : 0,
"row-store leaf pages" : 0
},
"cache" : {
"bytes read into cache" : 492866375480,
"bytes written from cache" : 63389108438,
"checkpoint blocked page eviction" : 104,
"unmodified pages evicted" : 91647,
"page split during eviction deepened the tree" : 0,
"modified pages evicted" : 2431492,
"data source pages selected for eviction unable to be evicted" : 20260,
"hazard pointer blocked page eviction" : 10633,
"internal pages evicted" : 26835,
"pages split during eviction" : 15602,
"in-memory page splits" : 1,
"overflow values cached in memory" : 0,
"pages read into cache" : 2916991,
"overflow pages read into cache" : 0,
"pages written from cache" : 450444
},
"compression" : {
"raw compression call failed, no additional data available" : 0,
"raw compression call failed, additional data available" : 0,
"raw compression call succeeded" : 0,
"compressed pages read" : 1022012,
"compressed pages written" : 125271,
"page written failed to compress" : 270198,
"page written was too small to compress" : 54975
},
"cursor" : {
"create calls" : 240,
"insert calls" : 411131,
"bulk-loaded cursor-insert calls" : 0,
"cursor-insert key and value bytes inserted" : 59263517645,
"next calls" : 0,
"prev calls" : 1,
"remove calls" : 3019682,
"cursor-remove key bytes removed" : 12000369,
"reset calls" : 9722249,
"search calls" : 9311055,
"search near calls" : 0,
"update calls" : 0,
"cursor-update value bytes updated" : 0
},
"reconciliation" : {
"dictionary matches" : 0,
"internal page multi-block writes" : 2494,
"leaf page multi-block writes" : 16353,
"maximum blocks required for a page" : 121,
"internal-page overflow keys" : 0,
"leaf-page overflow keys" : 0,
"overflow values written" : 0,
"pages deleted" : 2797826,
"page checksum matches" : 195534,
"page reconciliation calls" : 2881652,
"page reconciliation calls for eviction" : 485082,
"leaf page key bytes discarded using prefix compression" : 0,
"internal page key bytes discarded using suffix compression" : 408037
},
"session" : {
"object compaction" : 0,
"open cursor count" : 240
},
"transaction" : {
"update conflicts" : 0
}
},
"nindexes" : 2,
"totalIndexSize" : 72478720,
"indexSizes" : {
"_id_" : 36069376,
"files_id_1_n_1" : 36409344
},
"ok" : 1
}
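
The two avgObjSize values above look unrelated, but both can be reproduced from size and count. They are consistent with the negative size being reinterpreted as an unsigned 64-bit integer, divided by count, and the quotient truncated to a signed 32-bit integer. This is an observation about the numbers only, not a confirmed root cause, and the function name below is hypothetical:

```python
def wrapped_average(size, count):
    """Average of a negative size after unsigned 64-bit wraparound.

    Assumption: the size is treated as an unsigned 64-bit value before the
    division, and only the low 32 bits of the quotient are kept as a signed
    integer. This mirrors the arithmetic, not any specific server code.
    """
    as_unsigned = size % 2**64          # e.g. -81949 becomes 2**64 - 81949
    quotient = as_unsigned // count     # unsigned integer division
    low32 = quotient % 2**32            # keep the low 32 bits
    # Reinterpret the low 32 bits as a signed (two's complement) integer.
    return low32 - 2**32 if low32 >= 2**31 else low32


# size and count as reported by the primary and the secondary.
print(wrapped_average(-81949, 3067))  # compare with the primary's avgObjSize
print(wrapped_average(-81949, 3664))  # compare with the secondary's avgObjSize
```

Both calls return exactly the avgObjSize values shown above (-1723868544 and 994031709), which suggests the negative size is the underlying anomaly and the odd averages are derived from it.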

I'll try restarting the services and let you know if that fixes things.


Comment by David Hows [ 01/Jun/15 ]

Hi Patrick,

Sorry to jump in like this. Would you be able to run the same stats command you ran before on the MongoS, this time directly on the shard (Files.fs.chunks.stats())?

The expectation here is that it should produce the same output as that for sh_4 in your already attached output. This is just to rule out a communication problem between the MongoS and the shard primary, and to confirm that it is indeed an issue with the dataSize.

Additionally, if the shards are replica sets (rather than stand-alone members), would you be able to run the stats command again on one of the secondaries? Again, just checking to see how isolated the issue is.

Once we know where the issue is, we can take a few quick steps to get things back in order (likely this will just be a reboot of the sh_4 member showing the problem), but we will need to check what is going on first to be sure we don't make a bigger mess.

Thanks!
Dave
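
The comparison requested here amounts to diffing a few fields between two stats documents. A minimal sketch, assuming the two outputs have been loaded as Python dictionaries; the helper name and the choice of fields are illustrative and not part of any MongoDB tool. The sample values are the primary and secondary figures reported in this ticket before the restart:

```python
KEY_FIELDS = ("count", "size", "avgObjSize", "storageSize")


def diff_stats(a, b, fields=KEY_FIELDS):
    """Return {field: (value_in_a, value_in_b)} for the fields that differ."""
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


# Pre-restart values reported in this ticket.
primary = {"count": 3067, "size": -81949,
           "avgObjSize": -1723868544, "storageSize": 436697014272}
secondary = {"count": 3664, "size": -81949,
             "avgObjSize": 994031709, "storageSize": 436701138944}

differences = diff_stats(primary, secondary)
for field, (left, right) in differences.items():
    print(f"{field}: {left} vs {right}")
```

The same helper could be applied to the output taken through the MongoS and the output taken directly on the shard; an empty result would indicate that the two agree on these fields.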

Comment by Patrick White [ 01/Jun/15 ]

What command would you like me to run, exactly?


Comment by Alexander Gorrod [ 01/Jun/15 ]

pat.white@synata.com Thanks for the additional information. Is it possible for you to connect directly to the shard reporting a negative size and send us the collection stats for all collections on that shard?

It would also be interesting to know whether the behavior persists across restarts of the shard (i.e., shutting down and restarting the mongod on the problem shard).

Comment by Patrick White [ 29/May/15 ]

1. Version 3.0.1 with Wired Tiger
2. Ubuntu 14.04
3. See below:

-rw-r--r-- 1 mongodb mongodb 293 Apr 8 20:37 automation-mongod.conf
-rw-r--r-- 1 mongodb mongodb 16384 Apr 8 20:38 collection-0--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 436697014272 May 13 20:09 collection-11--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 Apr 8 20:38 collection-13--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 Apr 8 20:38 collection-15--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 Apr 8 20:38 collection-17--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 29506945024 Apr 19 06:12 collection-19--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 1861304320 Apr 18 09:03 collection-21--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 36864 Apr 8 20:39 collection-2--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 36864 Apr 8 20:38 collection-4--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 32768 Apr 8 20:38 collection-6--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 52923289600 May 29 18:12 collection-8--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 36864 Apr 16 22:16 collection-9--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 May 29 19:05 index-10--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 35876864 May 13 20:09 index-12--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 May 29 19:05 index-1--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 May 29 19:05 index-14--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 May 29 19:05 index-16--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 Apr 8 20:38 index-18--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 64753664 Apr 19 06:12 index-20--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 53354496 Apr 18 09:03 index-22--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 36286464 May 13 20:09 index-23--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 May 29 19:05 index-24--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 Apr 9 01:04 index-25--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 28950528 Apr 19 06:12 index-26--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 246194176 Apr 19 06:12 index-27--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 20799488 Apr 19 06:12 index-28--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 147456 Apr 18 09:03 index-29--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 98304 Apr 18 09:03 index-30--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 135168 Apr 18 09:03 index-31--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 135168 Apr 18 09:03 index-32--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 36864 Apr 8 20:39 index-3--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 53248 Apr 18 09:03 index-33--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 151552 Apr 18 09:03 index-34--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 122880 Apr 18 09:03 index-35--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 94208 Apr 18 09:03 index-36--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 May 29 19:05 index-5--29455071926817772.wt
-rw-r--r-- 1 mongodb mongodb 16384 May 29 19:05 index-7--29455071926817772.wt
drwxr-xr-x 2 mongodb mongodb 4096 Apr 23 01:11 journal
-rw-r--r-- 1 mongodb mongodb 36864 Apr 8 20:38 _mdb_catalog.wt
-rw-r--r-- 1 mongodb mongodb 3036034 May 29 19:05 mongodb.log
-rw-r--r-- 1 mongodb mongodb 1817 Mar 8 20:32 mongodb.log.2015-03-08T20-32-52.gz
-rw-r--r-- 1 mongodb mongodb 120064 Mar 9 01:05 mongodb.log.2015-03-09T01-06-07.gz
-rw-r--r-- 1 mongodb mongodb 1393 Mar 9 01:06 mongodb.log.2015-03-09T01-06-54.gz
-rw-r--r-- 1 mongodb mongodb 6460 Mar 9 01:30 mongodb.log.2015-03-09T01-30-48.gz
-rw-r--r-- 1 mongodb mongodb 13048356 Mar 9 18:44 mongodb.log.2015-03-09T18-44-39.gz
-rw-r--r-- 1 mongodb mongodb 2821818 Mar 10 18:44 mongodb.log.2015-03-10T18-44-40.gz
-rw-r--r-- 1 mongodb mongodb 898455 Mar 11 18:44 mongodb.log.2015-03-11T18-44-42.gz
-rw-r--r-- 1 mongodb mongodb 7756794 Mar 12 18:44 mongodb.log.2015-03-12T18-44-43.gz
-rw-r--r-- 1 mongodb mongodb 9252647 Mar 13 18:44 mongodb.log.2015-03-13T18-44-43.gz
-rw-r--r-- 1 mongodb mongodb 5890777 Mar 14 18:44 mongodb.log.2015-03-14T18-44-43.gz
-rw-r--r-- 1 mongodb mongodb 2762103 Mar 15 18:44 mongodb.log.2015-03-15T18-44-44.gz
-rw-r--r-- 1 mongodb mongodb 4692206 Mar 16 18:44 mongodb.log.2015-03-16T18-44-46.gz
-rw-r--r-- 1 mongodb mongodb 5244809 Mar 17 18:44 mongodb.log.2015-03-17T18-44-47.gz
-rw-r--r-- 1 mongodb mongodb 3774006 Mar 18 15:10 mongodb.log.2015-03-18T15-10-28.gz
-rw-r--r-- 1 mongodb mongodb 1702043 Mar 19 15:10 mongodb.log.2015-03-19T15-10-30.gz
-rw-r--r-- 1 mongodb mongodb 474858 Mar 19 20:52 mongodb.log.2015-03-19T20-52-38.gz
-rw-r--r-- 1 mongodb mongodb 275 Mar 19 20:52 mongodb.log.2015-03-19T20-52-40.gz
-rw-r--r-- 1 mongodb mongodb 867861 Mar 20 20:53 mongodb.log.2015-03-20T20-53-17.gz
-rw-r--r-- 1 mongodb mongodb 241020 Mar 21 20:53 mongodb.log.2015-03-21T20-53-19.gz
-rw-r--r-- 1 mongodb mongodb 501061 Mar 22 20:53 mongodb.log.2015-03-22T20-53-21.gz
-rw-r--r-- 1 mongodb mongodb 424337 Mar 23 20:53 mongodb.log.2015-03-23T20-53-23.gz
-rw-r--r-- 1 mongodb mongodb 243501 Mar 24 01:16 mongodb.log.2015-03-24T01-16-41.gz
-rw-r--r-- 1 mongodb mongodb 273 Mar 24 01:16 mongodb.log.2015-03-24T01-16-43.gz
-rw-r--r-- 1 mongodb mongodb 1760 Mar 24 01:16 mongodb.log.2015-03-24T01-20-01.gz
-rw-r--r-- 1 mongodb mongodb 892061 Mar 25 01:19 mongodb.log.2015-03-25T01-20-01.gz
-rw-r--r-- 1 mongodb mongodb 761142 Mar 26 01:20 mongodb.log.2015-03-26T01-20-02.gz
-rw-r--r-- 1 mongodb mongodb 2711949 Mar 27 01:20 mongodb.log.2015-03-27T01-20-03.gz
-rw-r--r-- 1 mongodb mongodb 2118356 Mar 27 17:09 mongodb.log.2015-03-27T17-10-07.gz
-rw-r--r-- 1 mongodb mongodb 230841 Mar 27 18:49 mongodb.log.2015-03-27T18-49-54.gz
-rw-r--r-- 1 mongodb mongodb 656088 Mar 28 18:49 mongodb.log.2015-03-28T18-49-54.gz
-rw-r--r-- 1 mongodb mongodb 367469 Mar 29 18:49 mongodb.log.2015-03-29T18-49-55.gz
-rw-r--r-- 1 mongodb mongodb 238411 Mar 30 18:49 mongodb.log.2015-03-30T18-49-56.gz
-rw-r--r-- 1 mongodb mongodb 1843707 Mar 31 18:49 mongodb.log.2015-03-31T18-49-57.gz
-rw-r--r-- 1 mongodb mongodb 3730188 Apr 1 12:26 mongodb.log.2015-04-01T12-26-23.gz
-rw-r--r-- 1 mongodb mongodb 3890 Apr 1 12:27 mongodb.log.2015-04-01T12-28-06.gz
-rw-r--r-- 1 mongodb mongodb 101180 Apr 1 18:01 mongodb.log.2015-04-01T18-01-48.gz
-rw-r--r-- 1 mongodb mongodb 1154532 Apr 2 18:01 mongodb.log.2015-04-02T18-01-50.gz
-rw-r--r-- 1 mongodb mongodb 830064 Apr 3 18:01 mongodb.log.2015-04-03T18-01-52.gz
-rw-r--r-- 1 mongodb mongodb 34302 Apr 3 19:25 mongodb.log.2015-04-03T19-26-00.gz
-rw-r--r-- 1 mongodb mongodb 1224730 Apr 4 19:25 mongodb.log.2015-04-04T19-26-00.gz
-rw-r--r-- 1 mongodb mongodb 1194894 Apr 5 19:25 mongodb.log.2015-04-05T19-26-02.gz
-rw-r--r-- 1 mongodb mongodb 2662124 Apr 6 19:26 mongodb.log.2015-04-06T19-26-03.gz
-rw-r--r-- 1 mongodb mongodb 1060796 Apr 7 15:42 mongodb.log.2015-04-07T15-43-06.gz
-rw-r--r-- 1 mongodb mongodb 1531917 Apr 7 21:18 mongodb.log.2015-04-07T21-18-47.gz
-rw-r--r-- 1 mongodb mongodb 17271 Apr 7 21:42 mongodb.log.2015-04-07T21-42-51.gz
-rw-r--r-- 1 mongodb mongodb 9635 Apr 7 21:50 mongodb.log.2015-04-07T21-50-54.gz
-rw-r--r-- 1 mongodb mongodb 8389 Apr 7 21:58 mongodb.log.2015-04-07T21-59-04.gz
-rw-r--r-- 1 mongodb mongodb 4582 Apr 7 22:04 mongodb.log.2015-04-07T22-04-32.gz
-rw-r--r-- 1 mongodb mongodb 6915534 Apr 8 20:37 mongodb.log.2015-04-08T20-37-55.gz
-rw-r--r-- 1 mongodb mongodb 430918 Apr 9 20:37 mongodb.log.2015-04-09T20-37-56.gz
-rw-r--r-- 1 mongodb mongodb 202355 Apr 10 20:37 mongodb.log.2015-04-10T20-37-56.gz
-rw-r--r-- 1 mongodb mongodb 192175 Apr 11 20:37 mongodb.log.2015-04-11T20-37-57.gz
-rw-r--r-- 1 mongodb mongodb 183213 Apr 12 20:37 mongodb.log.2015-04-12T20-37-57.gz
-rw-r--r-- 1 mongodb mongodb 186008 Apr 13 20:37 mongodb.log.2015-04-13T20-37-58.gz
-rw-r--r-- 1 mongodb mongodb 185945 Apr 14 20:37 mongodb.log.2015-04-14T20-37-58.gz
-rw-r--r-- 1 mongodb mongodb 184829 Apr 15 20:37 mongodb.log.2015-04-15T20-37-58.gz
-rw-r--r-- 1 mongodb mongodb 940357 Apr 16 20:37 mongodb.log.2015-04-16T20-37-58.gz
-rw-r--r-- 1 mongodb mongodb 3196058 Apr 17 20:37 mongodb.log.2015-04-17T20-37-58.gz
-rw-r--r-- 1 mongodb mongodb 4692931 Apr 18 20:37 mongodb.log.2015-04-18T20-37-59.gz
-rw-r--r-- 1 mongodb mongodb 3725618 Apr 19 20:38 mongodb.log.2015-04-19T20-38-00.gz
-rw-r--r-- 1 mongodb mongodb 3112432 Apr 20 20:37 mongodb.log.2015-04-20T20-38-01.gz
-rw-r--r-- 1 mongodb mongodb 3216337 Apr 21 20:38 mongodb.log.2015-04-21T20-38-01.gz
-rw-r--r-- 1 mongodb mongodb 3000773 Apr 22 20:38 mongodb.log.2015-04-22T20-38-01.gz
-rw-r--r-- 1 mongodb mongodb 1985850 Apr 23 20:37 mongodb.log.2015-04-23T20-38-03.gz
-rw-r--r-- 1 mongodb mongodb 221513 Apr 24 20:38 mongodb.log.2015-04-24T20-38-04.gz
-rw-r--r-- 1 mongodb mongodb 234142 Apr 25 20:38 mongodb.log.2015-04-25T20-38-04.gz
-rw-r--r-- 1 mongodb mongodb 221358 Apr 26 20:38 mongodb.log.2015-04-26T20-38-05.gz
-rw-r--r-- 1 mongodb mongodb 221082 Apr 27 20:38 mongodb.log.2015-04-27T20-38-07.gz
-rw-r--r-- 1 mongodb mongodb 221773 Apr 28 20:38 mongodb.log.2015-04-28T20-38-09.gz
-rw-r--r-- 1 mongodb mongodb 221946 Apr 29 20:38 mongodb.log.2015-04-29T20-38-09.gz
-rw-r--r-- 1 mongodb mongodb 222848 Apr 30 20:38 mongodb.log.2015-04-30T20-38-10.gz
-rw-r--r-- 1 mongodb mongodb 221340 May 1 20:38 mongodb.log.2015-05-01T20-38-10.gz
-rw-r--r-- 1 mongodb mongodb 221555 May 2 20:38 mongodb.log.2015-05-02T20-38-12.gz
-rw-r--r-- 1 mongodb mongodb 221844 May 3 20:38 mongodb.log.2015-05-03T20-38-12.gz
-rw-r--r-- 1 mongodb mongodb 221421 May 4 20:38 mongodb.log.2015-05-04T20-38-12.gz
-rw-r--r-- 1 mongodb mongodb 222288 May 5 20:38 mongodb.log.2015-05-05T20-38-13.gz
-rw-r--r-- 1 mongodb mongodb 222236 May 6 20:38 mongodb.log.2015-05-06T20-38-13.gz
-rw-r--r-- 1 mongodb mongodb 224173 May 7 20:38 mongodb.log.2015-05-07T20-38-13.gz
-rw-r--r-- 1 mongodb mongodb 221779 May 8 20:38 mongodb.log.2015-05-08T20-38-15.gz
-rw-r--r-- 1 mongodb mongodb 220048 May 9 20:38 mongodb.log.2015-05-09T20-38-16.gz
-rw-r--r-- 1 mongodb mongodb 220323 May 10 20:38 mongodb.log.2015-05-10T20-38-17.gz
-rw-r--r-- 1 mongodb mongodb 221094 May 11 20:38 mongodb.log.2015-05-11T20-38-17.gz
-rw-r--r-- 1 mongodb mongodb 221620 May 12 20:38 mongodb.log.2015-05-12T20-38-18.gz
-rw-r--r-- 1 mongodb mongodb 222163 May 13 20:38 mongodb.log.2015-05-13T20-38-20.gz
-rw-r--r-- 1 mongodb mongodb 224035 May 14 20:38 mongodb.log.2015-05-14T20-38-21.gz
-rw-r--r-- 1 mongodb mongodb 222096 May 15 20:38 mongodb.log.2015-05-15T20-38-22.gz
-rw-r--r-- 1 mongodb mongodb 221329 May 16 20:38 mongodb.log.2015-05-16T20-38-24.gz
-rw-r--r-- 1 mongodb mongodb 221040 May 17 20:38 mongodb.log.2015-05-17T20-38-26.gz
rw-rr- 1 mongodb mongodb 221641 May 18 20:38 mongodb.log.2015-05-18T20-38-28.gz
rw-rr- 1 mongodb mongodb 221339 May 19 20:38 mongodb.log.2015-05-19T20-38-28.gz
rw-rr- 1 mongodb mongodb 223016 May 20 20:38 mongodb.log.2015-05-20T20-38-30.gz
rw-rr- 1 mongodb mongodb 223449 May 21 20:38 mongodb.log.2015-05-21T20-38-31.gz
rw-rr- 1 mongodb mongodb 222395 May 22 20:38 mongodb.log.2015-05-22T20-38-31.gz
rw-rr- 1 mongodb mongodb 222504 May 23 20:38 mongodb.log.2015-05-23T20-38-32.gz
rw-rr- 1 mongodb mongodb 221375 May 24 20:38 mongodb.log.2015-05-24T20-38-33.gz
rw-rr- 1 mongodb mongodb 3226728 May 25 20:38 mongodb.log.2015-05-25T20-38-33
rw-rr- 1 mongodb mongodb 3234830 May 26 20:38 mongodb.log.2015-05-26T20-38-33
rw-rr- 1 mongodb mongodb 3236799 May 27 20:38 mongodb.log.2015-05-27T20-38-34
rw-rr- 1 mongodb mongodb 3252322 May 28 20:38 mongodb.log.2015-05-28T20-38-36
-rwxr-xr-x 1 mongodb mongodb 6 Apr 8 20:38 mongod.lock
drwxr-xr-x 5 mongodb mongodb 4096 Mar 12 09:11 moveChunk
drwxr-xr-x 2 mongodb mongodb 12288 Apr 9 01:07 rollback
-rw-r--r-- 1 mongodb mongodb 36864 May 29 18:13 sizeStorer.wt
-rw-r--r-- 1 mongodb mongodb 95 Mar 8 20:32 storage.bson
-rw-r--r-- 1 mongodb mongodb 49 Mar 8 20:32 WiredTiger
-rw-r--r-- 1 mongodb mongodb 495 Mar 8 20:32 WiredTiger.basecfg
-rw-r--r-- 1 mongodb mongodb 21 Mar 8 20:32 WiredTiger.lock
-rw-r--r-- 1 mongodb mongodb 905 May 29 18:14 WiredTiger.turtle
-rw-r--r-- 1 mongodb mongodb 139264 May 29 18:14 WiredTiger.wt

Comment by Alexander Gorrod [ 28/May/15 ]

Could you provide us with some more information please:

  • Which version of MongoDB are you using?
  • What operating system is the application running on?
  • Can you show us the content of the data directory (ls -l equivalent) for the shard that is exhibiting the negative chunk size?

Comment by Patrick White [ 27/May/15 ]

Thanks Ramon! Actually, that is the same shard; I'd just redacted the names. Also, this is all coming up because we are trying to drain that shard and are having a horrible time with it. Any ideas on how to force it to just drain? Also, we use MMS, so we aren't able to update the version until the drain is done.

Comment by Ramon Fernandez Marina [ 27/May/15 ]

Thanks for the information, pat.white@synata.com. It looks like the problematic shard is not there anymore, but now sh_4 is experiencing the issue:

"sh_4" : {
	"ns" : "Files.fs.chunks",
	"count" : 3073,
	"size" : -114294,

This is the size used when computing the output of db.fs.chunks.getShardDistribution(), so it looks like there's a bug in the storage layer that causes a negative number to be returned. We'll keep you posted.
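
To illustrate why a negative "size" surfaces as a negative average, here is a minimal sketch in plain JavaScript. It assumes the average is derived as size divided by count, which is an assumption about how the shell helper works and not something confirmed in this ticket. The function name findSuspectShards is hypothetical, and the sample object only mirrors the "shards" fragment quoted above.

```javascript
// Hypothetical helper (not part of MongoDB): scan per-shard stats and
// report shards whose reported data size is negative.
// `shards` mirrors the shape of the "shards" field in the stats() output
// quoted above: { shardName: { ns, count, size, ... } }.
function findSuspectShards(shards) {
  const suspects = [];
  for (const [name, s] of Object.entries(shards)) {
    if (s.size < 0) {
      suspects.push({
        shard: name,
        count: s.count,
        size: s.size,
        // Assumed derivation: average object size = size / count.
        avgObjSize: s.count > 0 ? s.size / s.count : null,
      });
    }
  }
  return suspects;
}

// Values copied from the sh_4 fragment in this comment.
const sample = {
  sh_4: { ns: "Files.fs.chunks", count: 3073, size: -114294 },
};

const suspects = findSuspectShards(sample);
console.log(suspects);
```

With the sh_4 values, the derived average comes out at roughly -37 bytes per document, which matches the kind of negative average described in the linked question.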

Comment by Patrick White [ 27/May/15 ]

Just attached the output of stats(), and we're using WiredTiger.

Comment by Patrick White [ 27/May/15 ]

Output of Files.fs.chunks.stats()

Comment by Ramon Fernandez Marina [ 27/May/15 ]

pat.white@synata.com, we need some more information to diagnose this:

  1. Which storage engine are you using on your shards?
  2. In the Files database, what is the output of fs.chunks.stats()?

Thanks,
Ramón.

Comment by Patrick White [ 26/May/15 ]

BTW, happy to work with a mongo engineer if they'd like to see this behavior live.

Generated at Thu Feb 08 03:48:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.