[SERVER-35009] Sharded cluster with small chunk size set makes bulk insert jobs fail to return Created: 16/May/18  Updated: 10/Jun/18  Resolved: 18/May/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Royce Brown Assignee: Ramon Fernandez Marina
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-31837 Recipient shard should not wait for `... Backlog
Operating System: ALL
Steps To Reproduce:

Create & init config server as a single replica set

mongod --configsvr --replSet CSRS --bind_ip 127.0.0.1 --port 59130 --dbpath ./data/cfg0 --logpath ./logs/cfg0.log --smallfiles --oplogSize 128 --fork --pidfilepath ./run/cfg0.pid

Init replica set

mongo --host 127.0.0.1 --port 59130 --eval 'rs.initiate({_id:"CSRS",configsvr: true, members: [{_id: 0,host: "127.0.0.1:59130"}]});'

Start mongos

mongos --bind_ip 127.0.0.1 --port 27017 --configdb CSRS/127.0.0.1:59130 --fork --logpath ./logs/mongos.log --pidfilepath ./run/mongos.pid

Start two shard servers

mongod --shardsvr --bind_ip 127.0.0.1 --port 59131 --dbpath ./data/d0  --logpath ./logs/do0.log --smallfiles --oplogSize 128 --fork --pidfilepath ./run/d0.pid
mongod --shardsvr --bind_ip 127.0.0.1 --port 59132 --dbpath ./data/d1  --logpath ./logs/do1.log --smallfiles --oplogSize 128 --fork --pidfilepath ./run/d1.pid

Set chunk size

mongo --host 127.0.0.1 --port 27017 --eval 'cfg = db.getSiblingDB("config"); cfg.settings.save( { _id:"chunksize", value: 1 } );'
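A quick sanity check (not part of the original report) that the setting was stored, assuming the same mongos host/port as above:

```javascript
// From a mongo shell connected to mongos: the settings collection
// should now contain the 1 MB chunk-size document saved above.
var cfg = db.getSiblingDB("config");
printjson(cfg.settings.findOne({ _id: "chunksize" }));
// expected shape: { "_id" : "chunksize", "value" : 1 }
```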

Init sharding

mongo --host 127.0.0.1 --port 27017 --eval 'sh.addShard("127.0.0.1:59131"); sh.addShard("127.0.0.1:59132");'

We now have a running MongoDB sharded cluster:

--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5afb4fe9d00caef6cbc972d1")
  }
  shards:
        {  "_id" : "shard0000",  "host" : "127.0.0.1:59131",  "state" : 1 }
        {  "_id" : "shard0001",  "host" : "127.0.0.1:59132",  "state" : 1 }
  active mongoses:
        "3.6.0" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard0000       1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0)

Now if we run the following script, it never returns (or takes a very long time; I waited 1 hour before killing it).

In file load_shard.js

db = db.getSiblingDB('mydb');
sh.enableSharding("mydb");
db.user.ensureIndex({"user_id":1});
sh.shardCollection("mydb.user",{"user_id":1});
 
var bulk = db.user.initializeUnorderedBulkOp();
people = ["Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy", "Greg", "Steve", "Kristina", "Katie", "Jeff"];
for(var i=0; i<200000; i++){
   user_id = i;
   name = people[Math.floor(Math.random()*people.length)];
   number = Math.floor(Math.random()*10001);
   bulk.insert( { "user_id":user_id, "name":name, "number":number });
}
bulk.execute();

Run script

mongo --host 127.0.0.1 --port 27017 < load_shard.js

The script never returns. If you make another connection to mongos and run sh.status(), it looks like the data has been written.
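The observation above, that data appears even though the client hangs, can be checked from a second mongo shell connected to mongos. This is a hypothetical verification snippet, not part of the original report:

```javascript
// From a second mongo shell connected to mongos while the bulk
// insert appears hung:
sh.status();                                                  // overall chunk distribution
db.getSiblingDB("mydb").user.count();                         // documents visible so far
db.getSiblingDB("config").chunks.count({ ns: "mydb.user" });  // chunks created by autosplit
```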

Participants:

 Description   

There is a problem running Mongo 3.6.* in a test sharded cluster where the chunk size is set small, e.g. 1 or 2 MB.
Doing a bulk insert on a shard-enabled database, using a script piped into the mongo shell, makes the job get stuck and never return.
Increasing the chunk size or reducing the amount of data fixes it.
This used to work under Mongo 2.6.7.

The problem can be reproduced with the simplest of setups: one config server, one mongos, and two shard servers. The problem still exists, though, even with a much larger cluster with multiple replica sets and multiple shard servers.

I have tried this on versions 3.6.0 and 3.6.4 with the same results.



 Comments   
Comment by Ramon Fernandez Marina [ 18/May/18 ]

Thanks for the feedback royce55, and glad to hear the workaround is working for you. I'm going to mark this ticket as a duplicate of SERVER-31837 – please feel free to watch that ticket for updates.

Regards,
Ramón.

Comment by Mary Gorman [ 18/May/18 ]

LOST message in JIRA

  • To avoid future delays, please ask commenter to respond from: royce@ecs.vuw.ac.nz or log into JIRA with Username: royce55 and Group PWD

To: "Ramon Fernandez (JIRA)" <jira@mongodb.org>
Cc: 
Bcc: 
Date: Fri, 18 May 2018 12:41:32 +1200
Subject: Re: [MongoDB-JIRA] (SERVER-35009) Sharded cluster with small chunk size set makes bulk insert jobs fail to return

I added the --setParameter orphanCleanupDelaySecs=0 to the shard server startup command line.
Set the chunk size to 1 and ran the test. It all worked returning as you said in about 10 seconds.

Thank you for that answer, I searched the internet but there didn't seem to be any record of anyone having this problem.
If it's not documented anywhere maybe it should be.

Anyway thanks very much, it fixes the problem.

Regards
Royce

Comment by Ramon Fernandez Marina [ 17/May/18 ]

royce55, a colleague points out that this looks like SERVER-31837, and adding --setParameter orphanCleanupDelaySecs=0 to all your shards should help. I tried it and your load script completes in about 10s – can you please try it on your end and report back?
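Applied to the shard startup commands from the "Steps To Reproduce" section, the suggested workaround would look like this (same paths and ports as used there; a sketch, not verbatim from the ticket):

```shell
# Shard servers restarted with the orphanCleanupDelaySecs workaround
# suggested above (see SERVER-31837):
mongod --shardsvr --bind_ip 127.0.0.1 --port 59131 --dbpath ./data/d0 \
  --logpath ./logs/do0.log --smallfiles --oplogSize 128 --fork \
  --pidfilepath ./run/d0.pid --setParameter orphanCleanupDelaySecs=0
mongod --shardsvr --bind_ip 127.0.0.1 --port 59132 --dbpath ./data/d1 \
  --logpath ./logs/do1.log --smallfiles --oplogSize 128 --fork \
  --pidfilepath ./run/d1.pid --setParameter orphanCleanupDelaySecs=0
```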

Regards,
Ramón.

Generated at Thu Feb 08 04:38:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.