Core Server / SERVER-41258

Router config remains stale for secondary readPreferences after moveChunk

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.6.8, 4.0.9
    • Component/s: None
    • Labels: None
    • Operating System: ALL

      Setup (cluster and data):

      mlaunch init --dir clusters/shardedcounttest --shards 2 --replicaset --nodes 3 --mongos 2
      mgeneratejs '{"country" : "US","payload":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur auctor, risus vitae molestie varius, tellus massa sagittis diam, non convallis augue magna quis tortor. Quisque dictum lacus quis faucibus fermentum. Fusce lobortis eros pretium, rhoncus ipsum at, mattis libero. Duis ut enim nunc. Fusce orci metus, egestas et arcu a, sollicitudin luctus enim. Maecenas enim sapien, condimentum vel ornare eget, feugiat vel velit. Vestibulum quis orci lectus. Nullam rhoncus massa ut ligula mattis volutpat. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Nam cursus vitae elit eget tempor. Etiam vulputate luctus arcu, vel lobortis nulla laoreet id. Maecenas ac mi metus. Nullam luctus consectetur pharetra. Vivamus orci enim, congue eget interdum non, semper sed lacus."}' -n 10000 | mongoimport -d mydb -c countries
      

      Setup (sharding and config):

      Note that setting a low orphanCleanupDelaySecs is very helpful in ensuring that the default of 15 minutes doesn't delay back-to-back migrations.

      mongo --port 27017
      sh.enableSharding("mydb")
      use mydb
      db.countries.createIndex({"country":1})
      sh.shardCollection("mydb.countries",{"country":1})
      while (true) { print(new ISODate() + ' - ' + db.countries.count({country:"US"}));sleep(1000) }
      
      mongo --port 27019
      db.adminCommand( { setParameter: 1, orphanCleanupDelaySecs: 2 } )
      
      mongo --port 27022
      db.adminCommand( { setParameter: 1, orphanCleanupDelaySecs: 2 } )
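
      If scripting the setup, the same parameter can also be set from Python; a minimal sketch, assuming (as in the shell steps above) that the shard primaries listen on ports 27019 and 27022:

      import pymongo

      # Lower the orphan cleanup delay on both shard primaries so the range
      # deleter runs shortly after each migration (default is 900 seconds).
      for port in (27019, 27022):
          client = pymongo.MongoClient('localhost', port)
          client.admin.command({'setParameter': 1, 'orphanCleanupDelaySecs': 2})
          client.close()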
      

      Testing:

      Note that the Python script (SERVER41258.py) connects to one router, while the moveChunk commands should be run against the other.
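
      The attached SERVER41258.py is not reproduced inline; judging by the output format under Results below, it presumably looks something like this minimal sketch (not the attached script itself):

      import datetime
      import time

      import pymongo

      # Connect through the first mongos (port 27017) with a secondary read
      # preference, mirroring the shell monitoring loop above.
      client = pymongo.MongoClient('mongodb://localhost:27017',
                                   readPreference='secondaryPreferred')
      coll = client['mydb']['countries']

      while True:
          # Print a timestamp and the count as seen via the secondaries.
          print(datetime.datetime.now(), coll.find({'country': 'US'}).count())
          time.sleep(1)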

      python SERVER41258.py
      

      Then, in a second shell, against the other router:

      mongo --port 27018
      sh.moveChunk("mydb.countries",{country:"US"},"shard02")
      sh.moveChunk("mydb.countries",{country:"US"},"shard01")
      

      (repeat as necessary)
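
      The back-and-forth migrations can also be scripted rather than typed into the shell; a sketch using pymongo against the second mongos (port 27018), assuming the mlaunch shard names shard01/shard02 used above:

      import time

      import pymongo
      from bson.son import SON

      # Drive the migrations through the *other* mongos (port 27018).
      client = pymongo.MongoClient('mongodb://localhost:27018')

      for destination in ('shard02', 'shard01', 'shard02', 'shard01'):
          # moveChunk is an admin command; this bounces the chunk owning
          # {country: "US"} between the two shards.
          client.admin.command(SON([('moveChunk', 'mydb.countries'),
                                    ('find', {'country': 'US'}),
                                    ('to', destination)]))
          # Leave time for the range deleter (orphanCleanupDelaySecs: 2) to run.
          time.sleep(10)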

      Results:

      The Python script shows the count dropping alongside range deleter activity, and it does not recover...

      2019-06-04 15:11:32.560431 10000
      2019-06-04 15:11:33.566062 10000
      2019-06-04 15:11:34.574102 8719
      2019-06-04 15:11:35.579170 3728
      2019-06-04 15:11:36.584497 0
      2019-06-04 15:11:37.586690 0
      
      2019-06-04T08:11:36.401-0700 I SHARDING [Collection Range Deleter] No documents remain to delete in mydb.countries range [{ country: MinKey }, { country: MaxKey })
      2019-06-04T08:11:36.402-0700 I SHARDING [Collection Range Deleter] Waiting for majority replication of local deletions in mydb.countries range [{ country: MinKey }, { country: MaxKey })
      2019-06-04T08:11:36.418-0700 I SHARDING [Collection Range Deleter] Finished deleting documents in mydb.countries range [{ country: MinKey }, { country: MaxKey })
      

      ...unless flushRouterConfig is run on the mongos that the Python script is connected to (port 27017 in this test)

      2019-06-04 15:14:17.252945 0
      2019-06-04 15:14:18.254547 0
      2019-06-04 15:14:19.264117 10000
      2019-06-04 15:14:20.273053 10000
      

      Amending the Python script to use readPreference primary prevents the issue. The issue also does not occur if the Python script and the moveChunk commands are run against the same router.
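
      Both the recovery step and the workaround can be applied from pymongo; a minimal sketch, assuming the same router (port 27017) and collection as above:

      import pymongo
      from pymongo import ReadPreference

      client = pymongo.MongoClient('mongodb://localhost:27017',
                                   readPreference='secondaryPreferred')
      coll = client['mydb']['countries']

      # Recovery: flush the cached routing table on the stale mongos.
      client.admin.command('flushRouterConfig')

      # Workaround: route the count through the primaries instead; per the
      # results above, this avoids the stale routing entirely.
      primary_coll = coll.with_options(read_preference=ReadPreference.PRIMARY)
      print(primary_coll.find({'country': 'US'}).count())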

      Original reproduction

      1. Create a sharded cluster with 2 replica-set shards.
      2. Create a collection foo with two fields [_id, country], sharded on _id. We inserted 200,000 records, which made 4 chunks.

      db.foo.find({"country" : "US"}).count() should return 200,000 documents.

      3. Run this Python script to print the number of matching documents using a secondary read preference:

      import time

      import pymongo

      constring = 'mongodb://localhost:27017/admin'
      print(constring)

      # Connect through the mongos with a secondary read preference.
      client = pymongo.MongoClient(constring, readPreference='secondaryPreferred')
      db1 = client['foo']

      while True:
          # Count the matching documents via the secondaries once a minute.
          count = db1.foo.find({"country": "UK"}).count()
          print(count)
          time.sleep(60)
      

      Here's the timeline:

      200000 -> script started
      200000
      200000
      200000 -> balancer enabled
      200000 -> results may look consistent, but that's only because the rangeDeleter hasn't started yet. A correctly routed count would return 300,000 documents, since the secondaries don't filter orphans. The count is still accessing only shard1
      200000
      200000
      199616 -> rangeDeleter started removing migrated chunks. 
      175725 -> The script is still accessing only shard1, so the count decreases
      152233
      128447
      105075
      100000 -> rangeDeleter finishes removing migrated chunks
      100000 -> script returns inconsistent results as it only points to shard1
      100000
      100000
      100000
      200000 -> A find was executed against the primaries, which reloaded the version, and results are consistent again
      200000 -> The count now accesses both shards (shard1 & shard2) so it returns consistent results again.
      200000
      200000
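
      To see which shard(s) the stale router targets for the count, the query can be explained through the same mongos; a sketch (the exact shape of the explain output varies by version, so it is simply printed):

      import pprint

      import pymongo
      from pymongo import ReadPreference

      client = pymongo.MongoClient('mongodb://localhost:27017')
      db1 = client['foo']

      # Explain the count through the same (possibly stale) mongos, keeping the
      # secondary read preference so the explain itself doesn't touch a primary.
      explain = db1.command({'explain': {'count': 'foo',
                                         'query': {'country': 'UK'}},
                             'verbosity': 'queryPlanner'},
                            read_preference=ReadPreference.SECONDARY_PREFERRED)

      # On a sharded cluster, the explain output indicates which shard(s)
      # the router targeted for the query.
      pprint.pprint(explain)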
      
    • Sprint: Sharding 2019-06-17, Sharding 2019-07-01

      Original description

      We had a production incident last week where we had inconsistent results reading from secondaries for a collection, and a secondary wasn't returning a document due to a chunk move and a stale chunk map. Running flushRouterConfig fixed the issue.

      We have a collection that has very little read and write activity, with the vast majority of reads occurring on secondaries. This collection was sharded on the field {k: 1}, with the vast majority of find queries being a findOne on {k: "someValue"}.

      What appears to be the case is that when a chunk migration happens, the mongos reloads the collection version only when an action is performed against the primary (read or write). If there is no action against the primary, the mongos will still use the old chunk map. This is problematic when adding shards and rebalancing, which is what we had done.

      On a read-only collection, or on a collection that rarely receives writes like ours, it is possible for secondary reads to miss documents after a chunk migration to a shard that didn't previously exist in the chunk map.

      It appears that if the collection receives a read or write to the primary, the mongos that issued the command will update its chunk map but other mongos will not.

      We're running 3.6.8 on MMAP.
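
      Since flushRouterConfig resolved the incident, one pragmatic mitigation after adding shards or rebalancing is to flush the routing table cache on every mongos; a sketch (the router addresses below are hypothetical):

      import pymongo

      # Hypothetical list of every mongos router in the cluster.
      ROUTERS = ['mongos1.example.net:27017', 'mongos2.example.net:27017']

      for address in ROUTERS:
          # Flush the cached chunk map so subsequent secondary reads on this
          # router target the post-migration chunk distribution.
          client = pymongo.MongoClient(address)
          client.admin.command('flushRouterConfig')
          client.close()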

      Attachments:

        1. SERVER41258.py (0.4 kB)
        2. test.js (1 kB)

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Jon Hyman (jonhyman)
            Votes: 0
            Watchers: 10

              Created:
              Updated:
              Resolved: