[SERVER-4167] Sharding info not communicated to all mongos servers - results in Assertion Errors
Created: 28/Oct/11  Updated: 11/Jul/16  Resolved: 02/Jan/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.0
Fix Version/s: 2.0.2

Type: Bug
Priority: Major - P3
Reporter: Zac Witte
Assignee: Greg Studer
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: ubuntu

Issue Links:
Depends
depends on SERVER-4171 WritebackListener should force detect... Closed
Operating System: Linux

 Description   

I have an existing sharded cluster with three nodes (mongo1-3) running successfully with one sharded collection (hourly_stats). The config server resides on mongo2. I then added a second sharded collection (hourly_customer_stats) and continued inserting/updating documents. I just happened to see the following errors in the log of one of my three mongos instances. The other two did not contain any errors.

Fri Oct 28 00:28:00 [WriteBackListener-mongo1.foobar.com:27018] Assertion: 10181:not sharded:foobar.hourly_customer_stats
0x5381b2 0x70e781 0x7e5b65 0x52820f 0x52a2c4 0x7fbe80 0x7f4294cd2d8c 0x7f429427d04d
mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5381b2]
mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsb+0x13d1) [0x70e781]
mongos(_ZN5mongo17WriteBackListener3runEv+0x16b5) [0x7e5b65]
mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x52820f]
mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x52a2c4]
mongos(thread_proxy+0x80) [0x7fbe80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f4294cd2d8c]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f429427d04d]
Fri Oct 28 00:28:00 [WriteBackListener-mongo1.foobar.com:27018] ~ScopedDbConnection: _conn != null
Fri Oct 28 00:28:00 [WriteBackListener-mongo1.foobar.com:27018] WriteBackListener exception : not sharded:foobar.hourly_customer_stats
Fri Oct 28 00:28:03 [WriteBackListener-mongo1.foobar.com:27018] Assertion: 10181:not sharded:foobar.hourly_customer_stats
0x5381b2 0x70e781 0x7e5b65 0x52820f 0x52a2c4 0x7fbe80 0x7f4294cd2d8c 0x7f429427d04d
mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5381b2]
mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsb+0x13d1) [0x70e781]
mongos(_ZN5mongo17WriteBackListener3runEv+0x16b5) [0x7e5b65]
mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x52820f]
mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x52a2c4]
mongos(thread_proxy+0x80) [0x7fbe80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f4294cd2d8c]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f429427d04d]
Fri Oct 28 00:28:03 [WriteBackListener-mongo1.foobar.com:27018] ~ScopedDbConnection: _conn != null
Fri Oct 28 00:28:03 [WriteBackListener-mongo1.foobar.com:27018] WriteBackListener exception : not sharded:foobar.hourly_customer_stats
Fri Oct 28 00:28:07 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' acquired, ts : 4ea9f717d45e3a377394a0ae
Fri Oct 28 00:28:07 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' unlocked.
Fri Oct 28 00:28:07 [WriteBackListener-mongo1.foobar.com:27018] Assertion: 10181:not sharded:foobar.hourly_customer_stats
0x5381b2 0x70e781 0x7e5b65 0x52820f 0x52a2c4 0x7fbe80 0x7f4294cd2d8c 0x7f429427d04d
mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5381b2]
mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsb+0x13d1) [0x70e781]
mongos(_ZN5mongo17WriteBackListener3runEv+0x16b5) [0x7e5b65]
mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x52820f]
mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x52a2c4]
mongos(thread_proxy+0x80) [0x7fbe80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f4294cd2d8c]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f429427d04d]
Fri Oct 28 00:28:07 [WriteBackListener-mongo1.foobar.com:27018] ~ScopedDbConnection: _conn != null
Fri Oct 28 00:28:07 [WriteBackListener-mongo1.foobar.com:27018] WriteBackListener exception : not sharded:foobar.hourly_customer_stats
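
Since only one of the three mongos logs showed these errors, scanning each log for the assertion is a quick way to confirm which routers are affected. A minimal sketch in JavaScript, assuming the log text has already been read into a string; the function name `countNotShardedAssertions` is hypothetical, and the pattern is copied from the assertion lines above:

```javascript
// Hypothetical helper: count "Assertion: 10181:not sharded:<namespace>" lines
// per namespace in a block of mongos log text.
function countNotShardedAssertions(logText) {
  // Matches only the assertion line, not the follow-up
  // "WriteBackListener exception" line for the same event.
  var pattern = /Assertion: 10181:not sharded:(\S+)/;
  var counts = {};
  var lines = logText.split("\n");
  for (var i = 0; i < lines.length; i++) {
    var match = pattern.exec(lines[i]);
    if (match) {
      var ns = match[1];
      counts[ns] = (counts[ns] || 0) + 1;
    }
  }
  return counts;
}
```

Running this against the excerpt above would report three assertions for foobar.hourly_customer_stats; a log from an unaffected mongos would yield an empty object.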

At this point I stopped all connections to all mongos servers and forced a config refresh on the mongos that was showing the Assertion error.

mongos> db.adminCommand("flushRouterConfig")

This resulted in the following output in that mongos log. The config server is on mongo2, and the IP address that I obfuscated belongs to mongo3:

Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] created new distributed lock for foobar.hourly_customer_stats on mongo2.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] ChunkManager: time to load chunks for foobar.hourly_customer_stats: 4ms sequenceNumber: 3 version: 15|11
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] created new distributed lock for foobar.hourly_stats on mongo2.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] ChunkManager: time to load chunks for foobar.hourly_stats: 18ms sequenceNumber: 4 version: 123|1
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] Socket say send() errno:110 Connection timed out xxx.xxx.xxx.xxx:27018
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] ERROR: error processing writeback: 9001 socket exception [2] server [xxx.xxx.xxx.xxx:27018]
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] trying reconnect to mongo3.foobar.com:27018
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] reconnect mongo3.foobar.com:27018 ok
Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] update will be retried b/c sharding config info is stale, left:4 ns: foobar.hourly_customer_stats query:

{ timestamp: 1317232800, sub_key: "sub-47e28f3a-ad4a-11e0-ae90-df553041b89f" }

Fri Oct 28 00:35:08 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' acquired, ts : 4ea9f8bcd45e3a377394a0d8
Fri Oct 28 00:35:08 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' unlocked.

I'm not sure what this means for the data that I attempted to update or insert into that collection while the errors were happening. Does it mean the updates failed? This seems like the kind of thing that should throw an error back to the driver so my application can handle it appropriately. I'm not sure what caused it in the first place, though.



 Comments   
Comment by Greg Studer [ 02/Jan/12 ]

Resolved with SERVER-4171

Comment by Greg Studer [ 02/Nov/11 ]

Have you tried / seen this again on 2.0.1? We suspect you shouldn't, but if so, can you post the logs where it occurs?
EDIT : actually, this may be the issue linked above.

Comment by Eliot Horowitz (Inactive) [ 28/Oct/11 ]

Without safe mode, there isn't any way for mongos (or mongod) to send anything to the driver.
It's all one way.
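
To illustrate the safe mode point: a plain update is sent without reading any reply, so an error raised afterwards has no path back to the caller. Safe mode amounts to following the write with a getLastError command on the same connection. A minimal sketch in mongo shell style JavaScript, assuming a `db` handle that behaves like the legacy shell's; the helper name `updateAndCheck` and the upsert flag are illustrative and not taken from the reporter's application:

```javascript
// Hypothetical helper: issue an update, then ask the server whether it failed.
// `db` is assumed to behave like the legacy mongo shell's database handle.
function updateAndCheck(db, collectionName, query, change) {
  // Fire-and-forget write: on its own this reports nothing to the caller.
  db.getCollection(collectionName).update(query, change, true /* upsert */);

  // "Safe mode": request the outcome of the previous operation
  // on this connection.
  var result = db.runCommand({ getLastError: 1 });
  if (result.err) {
    throw new Error("write failed: " + result.err);
  }
  return result;
}
```

Whether this check would have surfaced the writeback failure reported in this ticket is not established by the thread; the sketch only shows the round trip that safe mode adds.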

Comment by Zac Witte [ 28/Oct/11 ]

Was not using safe mode for updates. But it still seems like the kind of thing where mongos should be able to throw an error back to the driver immediately, without waiting for a successful write to the database.

Comment by Eliot Horowitz (Inactive) [ 28/Oct/11 ]

I think this was fixed in 2.0.1

Re: getting errors in driver, were you using safe mode for updates?
