[SERVER-16514] Consistently getting "exception: collection's metadata is undergoing changes" during a drop Created: 11/Dec/14  Updated: 07/Dec/15  Resolved: 20/Jan/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.6, 2.8.0-rc2
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: Alvin Richards (Inactive) Assignee: Siyuan Zhou
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File logs.tgz    
Issue Links:
Depends
depends on SERVER-16913 Return connections to pool on mongos ... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Problem

During a mongo-perf sharded run, can consistently get the message

Mixed.v3.FindOneUpdateIntId-50-50
1       6075.749179410887
2       13312.055801527022
4       24662.78574471106
8       38385.964154489324
12      40500.04579100665
16      38213.81966631593
20      36571.848693041466
Error running test [object Object]: drop failed: {
"code" : 13331,
"ok" : 0,
"errmsg" : "exception: collection's metadata is undergoing changes. Please try again."
}

Reproduce

config1

/home/ec2-user/mongodb-linux-x86_64-2.6.6/bin/mongod --configsvr --port 30001 --dbpath /data2/db/conf30001 --logpath /data3/logs/conf30001/server.log --fork --smallfiles

config2

/home/ec2-user/mongodb-linux-x86_64-2.6.6/bin/mongod --configsvr --port 30002 --dbpath /data2/db/conf30002 --logpath /data3/logs/conf30002/server.log --fork --smallfiles

config3

/home/ec2-user/mongodb-linux-x86_64-2.6.6/bin/mongod --configsvr --port 30003 --dbpath /data2/db/conf30003 --logpath /data3/logs/conf30003/server.log --fork --smallfiles

mongos

/home/ec2-user/mongodb-linux-x86_64-2.6.6/bin/mongos --port 27017 --configdb localhost:30001,localhost:30002,localhost:30003 --logpath /data3/logs/mongos/server.log --fork

shard1

/home/ec2-user/mongodb-linux-x86_64-2.6.6/bin/mongod --shardsvr --port 28001 --dbpath /data2/db/db100 --logpath /data3/logs/db100/server.log --fork

shard2

/home/ec2-user/mongodb-linux-x86_64-2.6.6/bin/mongod --shardsvr --port 28002 --dbpath /data2/db/db200 --logpath /data3/logs/db200/server.log --fork

benchRun

python benchrun.py -f testcases/mixed_small.js  -t 1 2 4 8 12 16 20 -l mytest -s ../mongo/mongo --writeCmd true --trialCount 1 '--testFilter='\''sanity'\''' --shard 2



 Comments   
Comment by Daniel Pasette (Inactive) [ 25/Dec/14 ]

any update on this issue?

Comment by Alvin Richards (Inactive) [ 11/Dec/14 ]

The mongos logs show

2014-12-11T16:14:47.200-0500 [conn5] CMD: shardcollection: { shardCollection: "test0.Mixed_v3_FindOneUpdateIntId-50-500", key: { _id: 1.0 } }
2014-12-11T16:14:47.200-0500 [conn5] enable sharding on: test0.Mixed_v3_FindOneUpdateIntId-50-500 with shard key: { _id: 1.0 }
2014-12-11T16:14:47.200-0500 [conn5] about to log metadata event: { _id: "ip-10-93-7-29.ec2.internal-2014-12-11T21:14:47-548a0947b4af01bae0102ab8", server: "ip-10-93-7-29.ec2.internal", clientAddr: "N/A", time: new Date(1418332487200), what: "shardCollection.start", ns: "test0.Mixed_v3_FindOneUpdateIntId-50-500", details: { shardKey: { _id: 1.0 }, collection: "test0.Mixed_v3_FindOneUpdateIntId-50-500", primary: "shard0000:localhost:28001", initShards: [], numChunks: 1 } }
2014-12-11T16:14:47.202-0500 [conn5] going to create 1 chunk(s) for: test0.Mixed_v3_FindOneUpdateIntId-50-500 using new epoch 548a0947b4af01bae0102ab9
2014-12-11T16:14:47.206-0500 [conn5] ChunkManager: time to load chunks for test0.Mixed_v3_FindOneUpdateIntId-50-500: 0ms sequenceNumber: 5 version: 1|0||548a0947b4af01bae0102ab9 based on: (empty)
2014-12-11T16:14:47.210-0500 [conn5] about to log metadata event: { _id: "ip-10-93-7-29.ec2.internal-2014-12-11T21:14:47-548a0947b4af01bae0102aba", server: "ip-10-93-7-29.ec2.internal", clientAddr: "N/A", time: new Date(1418332487210), what: "shardCollection", ns: "test0.Mixed_v3_FindOneUpdateIntId-50-500", details: { version: "1|0||548a0947b4af01bae0102ab9" } }
2014-12-11T16:14:47.214-0500 [mongosMain] connection accepted from 127.0.0.1:42052 #9 (2 connections now open)
2014-12-11T16:14:47.214-0500 [mongosMain] connection accepted from 127.0.0.1:42053 #10 (3 connections now open)
2014-12-11T16:14:47.214-0500 [mongosMain] connection accepted from 127.0.0.1:42054 #11 (4 connections now open)
2014-12-11T16:14:47.214-0500 [conn9] end connection 127.0.0.1:42052 (2 connections now open)
2014-12-11T16:14:47.214-0500 [conn10] resetting shard version of test0.Mixed_v3_FindOneUpdateIntId-50-500 on localhost:28002, version is zero
2014-12-11T16:14:47.386-0500 [conn11] warning: splitChunk failed - cmd: { splitChunk: "test0.Mixed_v3_FindOneUpdateIntId-50-500", keyPattern: { _id: 1.0 }, min: { _id: MinKey }, max: { _id: MaxKey }, from: "shard0000", splitKeys: [ { _id: 0.0 } ], shardId: "test0.Mixed_v3_FindOneUpdateIntId-50-500-_id_MinKey", configdb: "localhost:30001,localhost:30002,localhost:30003" } result: { who: { _id: "test0.Mixed_v3_FindOneUpdateIntId-50-500", state: 1, who: "ip-10-93-7-29.ec2.internal:28001:1418332479:172689237:conn3:981637605", ts: ObjectId('548a0947dbe096c75c2baec1'), process: "ip-10-93-7-29.ec2.internal:28001:1418332479:172689237", when: new Date(1418332487215), why: "split-{ _id: MinKey }" }, ok: 0.0, errmsg: "the collection's metadata lock is taken" }

The last message above is repeated with the occasional

2014-12-11T16:14:47.398-0500 [conn11] warning: chunk manager reload forced for collection 'test0.Mixed_v3_FindOneUpdateIntId-50-500', config version is 1|0||548a0947b4af01bae0102ab9

Until the following is seen

2014-12-11T16:14:47.901-0500 [conn10] autosplitted test0.Mixed_v3_FindOneUpdateIntId-50-500 shard: ns: test0.Mixed_v3_FindOneUpdateIntId-50-500, shard: shard0000:localhost:28001, lastmod: 1|0||000000000000000000000000, min: { _id: MinKey }, max: { _id: MaxKey } on: { _id: 0.0 } (splitThreshold 921)
2014-12-11T16:14:51.963-0500 [Balancer] distributed lock 'balancer/ip-10-93-7-29.ec2.internal:27017:1418332434:1804289383' acquired, ts : 548a094bb4af01bae0102abb
2014-12-11T16:14:51.966-0500 [Balancer]  ns: test0.Mixed_v3_FindOneUpdateIntId-50-500 going to move { _id: "test0.Mixed_v3_FindOneUpdateIntId-50-500-_id_MinKey", ns: "test0.Mixed_v3_FindOneUpdateIntId-50-500", min: { _id: MinKey }, max: { _id: 0.0 }, version: Timestamp 1000|1, versionEpoch: ObjectId('548a0947b4af01bae0102ab9'), lastmod: Timestamp 1000|1, lastmodEpoch: ObjectId('548a0947b4af01bae0102ab9'), shard: "shard0000" } from: shard0000 to: shard0001 tag []
2014-12-11T16:14:51.966-0500 [Balancer] moving chunk ns: test0.Mixed_v3_FindOneUpdateIntId-50-500 moving ( ns: test0.Mixed_v3_FindOneUpdateIntId-50-500, shard: shard0000:localhost:28001, lastmod: 1|1||000000000000000000000000, min: { _id: MinKey }, max: { _id: 0.0 }) shard0000:localhost:28001 -> shard0001:localhost:28002

Generated at Thu Feb 08 03:41:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.