[SERVER-2237] Loss Data with Sharding Created: 16/Dec/10  Updated: 17/Mar/11  Resolved: 17/Dec/10

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.6.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Krishna Maddireddy Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux CentOS 5


Operating System: Linux
Participants:

 Description   

While testing sharding, I see data loss.

Trace from the shell:
> db.contacts.count()
1700000
> for (var i = 1; i <= 500000; i++) db.contacts.save({aid:4, contact_id:i+200000,test_string: "This is test string to test mongodb sharding "+i,email:"test@yahoo.com"});
> db.contacts.count()
2199999
> db.contacts.count()
2199999
> db.contacts.count()
2203454
> db.contacts.count()
2199999
> db.contacts.count()
2199999
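
As a quick check on the numbers in this trace, the expected total is the starting count plus the number of loop iterations. The sketch below is plain JavaScript; the variable names are illustrative and not part of the report, and the values are copied from the trace.

```javascript
// Counts copied from the shell trace above.
const startCount = 1700000;
const inserted = 500000; // the loop runs for i = 1..500000
const observed = [2199999, 2199999, 2203454, 2199999, 2199999];

const expected = startCount + inserted;

// Positive delta: the count is above the expected total; negative: below it.
const deltas = observed.map((n) => n - expected);

console.log(expected);
console.log(deltas.join(" "));
```

Four of the five readings are one below the expected total and one reading is above it, which are the two separate symptoms discussed in the comments.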

Errors from the mongos log:

Thu Dec 16 14:41:39 [conn1] autosplitting demo_contacts.contacts size: 1048648 shard: ns:demo_contacts.contacts at: shard0000:localhost:30001 lastmod: 101|31 min: { aid: 4.0, contact_id: 676362.0 } max: { aid: 5.0, contact_id: 1283.0 } on: { aid: 4.0, contact_id: 680372.0 } (splitThreshold 1048576)
Thu Dec 16 14:41:39 [conn1] ERROR: splitIfShould failed: locking namespace failed
Thu Dec 16 14:41:39 [conn1] autosplitting demo_contacts.contacts size: 1048648 shard: ns:demo_contacts.contacts at: shard0000:localhost:30001 lastmod: 101|31 min: { aid: 4.0, contact_id: 676362.0 } max: { aid: 5.0, contact_id: 1283.0 } on: { aid: 4.0, contact_id: 681072.0 } (splitThreshold 1048576)
Thu Dec 16 14:41:39 [conn1] ERROR: splitIfShould failed: locking namespace failed
Thu Dec 16 14:41:39 [conn1] autosplitting demo_contacts.contacts size: 1048648 shard: ns:demo_contacts.contacts at: shard0000:localhost:30001 lastmod: 101|31 min: { aid: 4.0, contact_id: 676362.0 } max: { aid: 5.0, contact_id: 1283.0 } on: { aid: 4.0, contact_id: 681771.0 } (splitThreshold 1048576)
Thu Dec 16 14:41:39 [conn1] ERROR: splitIfShould failed: locking namespace failed
Thu Dec 16 14:41:39 [conn1] autosplitting demo_contacts.contacts size: 1048648 shard: ns:demo_contacts.contacts at: shard0000:localhost:30001 lastmod: 101|31 min: { aid: 4.0, contact_id: 676362.0 } max: { aid: 5.0, contact_id: 1283.0 } on: { aid: 4.0, contact_id: 682471.0 } (splitThreshold 1048576)
Thu Dec 16 14:41:39 [conn1] ERROR: splitIfShould failed: locking namespace failed
Thu Dec 16 14:41:39 [conn1] autosplitting demo_contacts.contacts size: 1048648 shard: ns:demo_contacts.contacts at: shard0000:localhost:30001 lastmod: 101|31 min: { aid: 4.0, contact_id: 676362.0 } max: { aid: 5.0, contact_id: 1283.0 } on: { aid: 4.0, contact_id: 683170.0 } (splitThreshold 1048576)
Thu Dec 16 14:41:39 [conn1] ERROR: splitIfShould failed: locking namespace failed
Thu Dec 16 14:41:

mongod log from one of the shards:

Thu Dec 16 14:43:05 [conn12] query admin.$cmd ntoreturn:1 command: { moveChunk: "demo_contacts.contacts", from: "localhost:30001", to: "localhost:30002", min: { aid: 4.0, contact_id: 249176.0 }, max: { aid: 4.0, contact_id: 253273.0 }, shardId: "demo_contacts.contacts-aid_4.0contact_id_249176.0", configdb: "localhost:20001" } reslen:53 1084ms
Thu Dec 16 14:43:07 [conn9] Assertion: 13388:[demo_contacts.contacts] shard version not ok in Client::Context: your version is too old ns: demo_contacts.contacts global: 116|1 client: 106|1
0x540c7e 0x713d07 0x5fbe40 0x79aa58 0x797596 0x798538 0x5fb7e5 0x60029f 0x7074ba 0x70aaf6 0x82691b 0x83a4b0 0x3d28e0673d 0x3d286d3f6d
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x1de) [0x540c7e]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo6Client7Context11_finishInitEb+0x1b7) [0x713d07]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo8runCountEPKcRKNS_7BSONObjERSs+0xc0) [0x5fbe40]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo8CmdCount3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0xa8) [0x79aa58]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xa16) [0x797596]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x798) [0x798538]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x5fb7e5]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x1bbf) [0x60029f]
./mongodb-linux-x86_64-1.6.5/bin/mongod [0x7074ba]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x14d6) [0x70aaf6]
./mongodb-linux-x86_64-1.6.5/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x30b) [0x82691b]
./mongodb-linux-x86_64-1.6.5/bin/mongod(thread_proxy+0x80) [0x83a4b0]
/lib64/libpthread.so.0 [0x3d28e0673d]
/lib64/libc.so.6(clone+0x6d) [0x3d286d3f6d]
Thu Dec 16 14:43:10 [conn12] got movechunk: { moveChunk: "demo_contacts.contacts", from: "localhost:30001", to: "localhost:30002", min: { aid: 4.0, contact_id: 253273.0 }, max: { aid: 4.0, contact_id: 256729.0 }, shardId: "demo_contacts.contacts-aid_4.0contact_id_253273.0", configdb: "localhost:20001" }
Thu Dec 16 14:43:11 [conn12] _recvChunkStatus : { active: true, ns: "demo_contacts.contacts", from: "localhost:30001", min: { aid: 4.0, contact_id: 253273.0 }, max: { aid: 4.0, contact_id: 256729.0 }, s



 Comments   
Comment by Krishna Maddireddy [ 17/Dec/10 ]

After using the CentOS MongoDB packages, I don't see the issue any more.

Comment by Eliot Horowitz (Inactive) [ 16/Dec/10 ]

Are you sure it's not an off-by-one error?
Can you try 1.7.3?

Comment by Krishna Maddireddy [ 16/Dec/10 ]

Yes, it should be 2200000.

In this case only one record is lost, but I have seen major data loss in other tests.

Yes, here are the indexes:

> db.contacts.getIndexes()
[
	{
		"name" : "_id_",
		"ns" : "demo_contacts.contacts",
		"key" : { "_id" : 1 }
	},
	{
		"_id" : ObjectId("4d0a6618cea3b6594c8967eb"),
		"ns" : "demo_contacts.contacts",
		"key" : { "aid" : 1, "contact_id" : 1 },
		"name" : "aid_1_contact_id_1",
		"unique" : true
	}
]
>
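
The unique index on { aid, contact_id } is relevant to the off-by-one question: if a document with aid 4 and a contact_id in the range written by the loop (200001 to 700000) already existed, that one save would be rejected as a duplicate key and the final count would be one short. Whether that happened here is not established by the report. The sketch below is a toy in-memory model in plain JavaScript, not MongoDB code; the `UniqueIndexedCollection` class and the pre-existing document are hypothetical, and the loop is scaled down to 1000 iterations.

```javascript
// Minimal in-memory stand-in for a collection with a unique compound index
// on (aid, contact_id). It only models the counting, nothing else.
class UniqueIndexedCollection {
  constructor() {
    this.keys = new Set();
    this.rejected = 0;
  }

  // Mimics inserting a new document: rejected if the key pair already exists.
  insert(doc) {
    const key = `${doc.aid}:${doc.contact_id}`;
    if (this.keys.has(key)) {
      this.rejected += 1; // would surface as a duplicate key error
      return false;
    }
    this.keys.add(key);
    return true;
  }

  count() {
    return this.keys.size;
  }
}

const contacts = new UniqueIndexedCollection();

// Hypothetical: one pre-existing document whose key falls inside the range
// that the loop is about to write.
contacts.insert({ aid: 4, contact_id: 200001 });
const before = contacts.count();

// Same key pattern as the loop in the report, scaled down to 1000 iterations.
for (let i = 1; i <= 1000; i++) {
  contacts.insert({ aid: 4, contact_id: i + 200000 });
}

const added = contacts.count() - before;
console.log(added);             // documents actually added by the loop
console.log(contacts.rejected); // inserts rejected by the unique index
```

In this model the loop adds one document fewer than its iteration count, which matches the shape of the 2199999 reading without any data being lost.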

Comment by Eliot Horowitz (Inactive) [ 16/Dec/10 ]

Are you sure the expected number is 2200000?
Are there any unique indexes?

An occasionally higher count is expected right now.
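
One plausible reading, consistent with the moveChunk activity in the mongod log, is that a count taken while a chunk is being migrated can include documents that are present on both the source and the destination shard. The sketch below is a toy model in plain JavaScript, not mongos code: the shard names reuse those in the log, the document ids and the `clusterCount` helper are hypothetical, and it assumes the cluster-wide count is simply the sum of the per-shard counts.

```javascript
// Toy model: each shard is a Set of document ids.
const shards = {
  "localhost:30001": new Set([1, 2, 3, 4, 5, 6]),
  "localhost:30002": new Set([7, 8]),
};

// Assumption: the cluster-wide count is the sum of per-shard counts.
function clusterCount() {
  return Object.values(shards).reduce((sum, s) => sum + s.size, 0);
}

const chunk = [4, 5, 6];         // documents in the chunk being moved
const counts = [clusterCount()]; // count before the migration starts

// Step 1: copy the chunk to the destination shard.
chunk.forEach((id) => shards["localhost:30002"].add(id));
counts.push(clusterCount());     // mid-migration: copies exist on both shards

// Step 2: delete the chunk from the source shard.
chunk.forEach((id) => shards["localhost:30001"].delete(id));
counts.push(clusterCount());     // after the source copy is removed

console.log(counts.join(" "));
```

Under these assumptions the count rises by the size of the chunk while both copies exist and returns to the true value afterwards, which is the same pattern as the single 2203454 reading in the trace.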
