[SERVER-30277] Multiple shards contain documents with the same shard key Created: 24/Jul/17  Updated: 27/Oct/23  Resolved: 14/Sep/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: JavenZhang Assignee: Mark Agarunov
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive person-failed.log.zip     JPEG File same id in diffient shard.jpeg    
Operating System: ALL
Participants:

 Description   

I have a MongoDB cluster with four shards.
First, I created a collection 'PERSON' and then sharded it on a hashed '_id' key with the following code:

 // 'client' is the MongoClient used to connect to the cluster (via mongos)
 MongoDatabase adminDB = client.getDatabase("admin");
 try {
     // enable sharding for the database
     adminDB.runCommand(new Document("enablesharding", "myDB"));
 } catch (Exception e) {
     System.out.println(e.getMessage());
 }
 MongoDatabase db = client.getDatabase("myDB");
 db.createCollection("PERSON");
 // shard the collection on a hashed _id key
 adminDB.runCommand(new Document("shardCollection", "myDB.PERSON")
         .append("key", new Document("_id", "hashed")));

Finally, I saved some data to the collection, but when I queried person records with certain conditions, I got two results with completely identical values, including the _id. I logged into each shard and checked the record, and found that the document with the same _id appears on both shard0 and shard3.
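For reference, a minimal sketch of how such a duplicate can be confirmed by counting the _id through mongos and on each shard directly with the Java driver; the host names and the _id value below are placeholders, not taken from this ticket:

 import com.mongodb.MongoClient;
 import com.mongodb.MongoClientURI;
 import com.mongodb.client.MongoCollection;
 import org.bson.Document;
 import static com.mongodb.client.model.Filters.eq;

 public class DuplicateIdCheck {
     public static void main(String[] args) {
         // Placeholder hosts: the mongos plus the two shards that showed the document.
         String[] hosts = { "mongodb://mongos-host:27017",
                            "mongodb://shard0-host:27021",
                            "mongodb://shard3-host:27024" };
         Object dupId = args[0];  // the _id value that came back twice through mongos
         for (String uri : hosts) {
             MongoClient client = new MongoClient(new MongoClientURI(uri));
             MongoCollection<Document> person =
                     client.getDatabase("myDB").getCollection("PERSON");
             // count(filter) in 3.4-era drivers; newer drivers use countDocuments()
             long n = person.count(eq("_id", dupId));
             System.out.println(uri + " -> " + n + " document(s) with _id " + dupId);
             client.close();
         }
     }
 }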

By the way:
My data is saved to MongoDB through the Spark connector (artifactId: mongo-spark-connector_2.11, version: 2.1.0) using the Dataset API.
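For context, a rough sketch of how a Dataset is typically written through mongo-spark-connector 2.x; the SparkSession settings, URI, source path, and host name are assumptions for illustration, not the reporter's actual job:

 import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SparkSession;

 public class SavePersonDataset {
     public static void main(String[] args) {
         // Placeholder mongos address; writes go through mongos, not a shard directly.
         SparkSession spark = SparkSession.builder()
                 .appName("person-load")
                 .config("spark.mongodb.output.uri",
                         "mongodb://mongos-host:27017/myDB.PERSON")
                 .getOrCreate();

         // Placeholder source; any Dataset<Row> would do.
         Dataset<Row> people = spark.read().json("hdfs:///data/person.json");

         // Dataset API write via the connector's Spark SQL data source.
         people.write()
               .format("com.mongodb.spark.sql.DefaultSource")
               .mode("append")
               .save();

         spark.stop();
     }
 }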



 Comments   
Comment by Mark Agarunov [ 14/Sep/17 ]

Hello javenzhang,

Thank you for the response. I am happy to hear you are no longer seeing this issue. As we cannot continue diagnosing this without the logs, and you're no longer seeing this behavior, I've closed this ticket as "Gone Away". If you see this problem in the future or more information comes to light, we can reopen this ticket and continue investigating.

Thanks,
Mark

Comment by JavenZhang [ 14/Sep/17 ]

I'm very sorry for replying to your message so late.

Our MongoDB cluster has been reinstalled and all of the logs have been removed, so I'm afraid I can't provide any useful logs.

After the reinstall, I slowed down my Spark job's write rate, and I have not seen any incorrect data since.

Comment by Ramon Fernandez Marina [ 13/Sep/17 ]

javenzhang, we haven't heard back from you for some time. If this is still an issue for you, can you please provide the information requested by Mark above so we can investigate?

Thanks,
Ramón.

Comment by Mark Agarunov [ 30/Aug/17 ]

Hello javenzhang,

Thanks for the additional information. There may be an orphan document as the result of a failed migration or unclean shutdown.

We're working to reproduce this issue. To help us continue to investigate, would you please answer the following questions?

I've created a secure portal for you to send us this data privately.

Thanks,
Mark
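As background on the orphaned-document theory: orphaned documents left by failed migrations can be removed with the cleanupOrphaned admin command, run directly against each shard's primary rather than through mongos. A minimal sketch with the Java driver, assuming a placeholder shard host:

 import com.mongodb.MongoClient;
 import com.mongodb.MongoClientURI;
 import org.bson.Document;

 public class CleanupOrphanedPerson {
     public static void main(String[] args) {
         // Connect directly to one shard's primary (placeholder host/port); repeat per shard.
         MongoClient shard = new MongoClient(new MongoClientURI("mongodb://shard0-host:27021"));
         Document nextKey = new Document();  // empty document starts the scan from MinKey
         while (nextKey != null) {
             Document result = shard.getDatabase("admin").runCommand(
                     new Document("cleanupOrphaned", "myDB.PERSON")
                             .append("startingFromKey", nextKey));
             // stoppedAtKey is absent once every range on this shard has been processed
             nextKey = (Document) result.get("stoppedAtKey");
         }
         shard.close();
     }
 }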

Comment by JavenZhang [ 28/Jul/17 ]

Hello Mark Agarunov:

The error log for the 'PERSON' collection is attached here:
person-failed.log.zip

I checked the errors and found that most of them occur on the mongos, with only a few error entries on the shard nodes.

Most of the errors look like this:

2017-07-24T13:12:16.508+0800 I SHARDING [conn3222] Unable to auto-split chunk [{ _id: -6917529027641081850 }, { _id: -4611686018427387900 }) :: caused by :: 46 split failed due to LockBusy: timed out waiting for myDB.PERSON

The error log on shard0 shows:

2017-07-24T18:45:06.123+0800 W SHARDING [conn1469] Chunk move failed :: caused by :: ChunkTooBig: Cannot move chunk: the maximum number of documents for a chunk is 105747, the maximum chunk size is 67108864, average document size is 825. Found 609600 documents in chunk  ns: myDB.PERSON { _id: -7019385057768877093 } -> { _id: -6917529027641081850 }
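For what it's worth, an oversized chunk like the one in the shard0 log can usually be split manually with the split admin command through mongos, passing the chunk's bounds (the form required for hashed shard keys). A rough sketch using the bounds from the log line above; the mongos host is a placeholder, and whether this particular chunk can actually be split depends on the data:

 import com.mongodb.MongoClient;
 import com.mongodb.MongoClientURI;
 import org.bson.Document;
 import java.util.Arrays;

 public class SplitOversizedChunk {
     public static void main(String[] args) {
         // Placeholder mongos address
         MongoClient mongos = new MongoClient(new MongoClientURI("mongodb://mongos-host:27017"));
         // Bounds copied from the ChunkTooBig message in the shard0 log
         Document result = mongos.getDatabase("admin").runCommand(
                 new Document("split", "myDB.PERSON")
                         .append("bounds", Arrays.asList(
                                 new Document("_id", -7019385057768877093L),
                                 new Document("_id", -6917529027641081850L))));
         System.out.println(result.toJson());
         mongos.close();
     }
 }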

Comment by Mark Agarunov [ 27/Jul/17 ]

Hello javenzhang,

Thank you for providing this information. I see in the output that there are quite a few errors: Failed with error 'aborted', from shard1 to shard2. To see the cause of these errors and further investigate, could you please provide the complete logs from mongos and all mongod nodes?

Thanks,
Mark

Comment by JavenZhang [ 27/Jul/17 ]

Hi Mark Agarunov.

I dropped the old collection and am trying to save my data again. If this happens again, I will let you know.

The following is what I got when I executed 'sh.status()':

--- Sharding Status --- 
  sharding version: {
	"_id" : 1,
	"minCompatibleVersion" : 5,
	"currentVersion" : 6,
	"clusterId" : ObjectId("59704ec5baf9e453d06ab0ba")
}
  shards:
	{  "_id" : "shard0",  "host" : "shard0/mongo_backup:27021,mongo_backup:27025",  "state" : 1 }
	{  "_id" : "shard1",  "host" : "shard1/mongo_backup:27022,mongo_backup:27026",  "state" : 1 }
	{  "_id" : "shard2",  "host" : "shard2/mongo_backup:27023,mongo_backup:27027",  "state" : 1 }
	{  "_id" : "shard3",  "host" : "shard3/mongo_backup:27024,mongo_backup:27028",  "state" : 1 }
  active mongoses:
	"3.4.3" : 1
 autosplit:
	Currently enabled: yes
  balancer:
	Currently enabled:  yes
	Currently running:  no
		Balancer lock taken at Thu Jul 20 2017 14:33:43 GMT+0800 by ConfigServer:Balancer
	Failed balancer rounds in last 5 attempts:  0
	Migration Results for the last 24 hours: 
		144 : Success
		1 : Failed with error 'aborted', from shard1 to shard2
		1 : Failed with error 'aborted', from shard3 to shard1
		3 : Failed with error 'aborted', from shard2 to shard3
		3 : Failed with error 'aborted', from shard1 to shard3
		1 : Failed with error 'aborted', from shard0 to shard1
		1 : Failed with error 'aborted', from shard1 to shard0
		2 : Failed with error 'aborted', from shard0 to shard3
		2 : Failed with error 'aborted', from shard0 to shard2
		2 : Failed with error 'aborted', from shard2 to shard1
		1 : Failed with error 'aborted', from shard3 to shard2
  databases:
	{  "_id" : "myDB",  "primary" : "shard2",  "partitioned" : true }
		myDB.PERSON
			shard key: { "_id" : "hashed" }
			unique: false
			balancing: true
			chunks:
				shard0	656
				shard1	656
				shard2	656
				shard3	653
			too many chunks to print, use verbose if you want to force print

Comment by Mark Agarunov [ 26/Jul/17 ]

Hello javenzhang,

Thank you for the report. Looking over this, this may be caused by orphaned documents. To get a better understanding of what may be happening, could you please try to replicate this behavior with mongos directly? Additionally, please provide the output of sh.status() on the mongos.

Thanks,
Mark
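As background for the orphaned-document theory: the chunk metadata in config.chunks, readable through mongos, shows which shard owns each hashed _id range, so a copy of the document sitting on any other shard would be the orphan. A small sketch that lists the chunk ranges for myDB.PERSON; the mongos host is a placeholder:

 import com.mongodb.MongoClient;
 import com.mongodb.MongoClientURI;
 import org.bson.Document;

 public class ListPersonChunks {
     public static void main(String[] args) {
         // Placeholder mongos address; chunk metadata is readable through mongos
         MongoClient mongos = new MongoClient(new MongoClientURI("mongodb://mongos-host:27017"));
         for (Document chunk : mongos.getDatabase("config").getCollection("chunks")
                                     .find(new Document("ns", "myDB.PERSON"))) {
             System.out.println(chunk.get("shard") + "  "
                     + chunk.get("min") + " -> " + chunk.get("max"));
         }
         mongos.close();
     }
 }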
