[SERVER-2132] data loss when using a small chunksize Created: 22/Nov/10  Updated: 12/Jul/16  Resolved: 11/Apr/11

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.6.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tao Liu Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS: Red Hat Enterprise Linux AS release 4. Kernel: 2.6.9_5 x86_64
mongodb version: mongodb-linux-x86_64-static-legacy-1.6.3


Attachments: Zip Archive mongologs.zip    
Operating System: Linux
Participants:

 Description   

When using the mongoimport tool to insert sample data (100,000,000 rows in total), about 20,000 rows are lost if chunkSize is set to 50MB. With the default chunkSize (200MB), there is no data loss.

The sharded environments tested were: 2 shards; 2 shards + 2 replicas; and 4 shards + 4 replicas. Whenever chunkSize is set to 50MB, data is lost.
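For context, a minimal sketch of how a non-default chunk size could have been applied in this kind of setup (the hostnames, ports, database, collection, and file names below are assumptions, not taken from the report); in the 1.6 series, mongos accepts a --chunkSize option in MB:

```shell
# Illustrative only -- topology and file names are assumed.
# Start mongos with a 50 MB chunk size (the 1.6.x default was 200 MB):
mongos --configdb cfg1.example.com:27019 --chunkSize 50 --port 27017

# Import the sample data through mongos:
mongoimport --host localhost:27017 --db test --collection sample \
    --type csv --headerline --file sample_100m_rows.csv
```

After the import, comparing db.sample.count() against the expected 100,000,000 rows in the mongo shell is one way to detect the loss described here.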

When chunkSize is set to 50MB, the mongos log contains many autosplit failures, such as:
ERROR: splitIfShould failed: locking namespace failed
or
ERROR: saving chunks failed.

I think autosplit failures are normal when using a small chunkSize,
but the data loss is strange.

The attachment contains all logs from the test using 2 shards.



 Comments   
Comment by Tao Liu [ 23/Nov/10 ]

Thanks a lot. Much appreciated.

Comment by Eliot Horowitz (Inactive) [ 23/Nov/10 ]

Resolving - please comment if you have any other issues with this.

For reference, 1.6.5 should be out next Monday after a week of burn-in.

Comment by Eliot Horowitz (Inactive) [ 23/Nov/10 ]

Yes - exactly.

Comment by Tao Liu [ 23/Nov/10 ]

I have tried the test with 1.6.5-RC1, and the problem cannot be reproduced.
Could the bug in SERVER-2068 have led to this problem?

Comment by Eliot Horowitz (Inactive) [ 23/Nov/10 ]

Please try with 1.6.5 first.
There is a bug we fixed there that is likely the culprit.

Comment by Tao Liu [ 23/Nov/10 ]

The attached files contain all the logs from the test process.
I have tried the same test with 1.6.4-RC0 and the problem can be reproduced.

I will try the test with 1.7.3 and 1.6.5, but I prefer to use a stable release in our production environment.
I want to confirm that the data loss can be avoided when chunkSize is set to 200MB.

Comment by Eliot Horowitz (Inactive) [ 22/Nov/10 ]

Can you send the full logs?
Also - would you mind trying the same test with 1.7.3 or 1.6.5?
