Core Server / SERVER-4167

Sharding info not communicated to all mongos servers - results in Assertion Errors

Details

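    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.2
    • Component/s: Sharding
    • Labels: None
    • Environment: ubuntu
    • Operating System: Linux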

    Description

      I have an existing sharded cluster with three nodes (mongo1-3) running successfully with one sharded collection (hourly_stats). The config server resides on mongo2. I then added a second sharded collection (hourly_customer_stats) and continued inserting and updating documents. I later noticed the following errors in the log of one of my three mongos instances; the logs of the other two contained no errors.

      Fri Oct 28 00:28:00 [WriteBackListener-mongo1.foobar.com:27018] Assertion: 10181:not sharded:foobar.hourly_customer_stats
      0x5381b2 0x70e781 0x7e5b65 0x52820f 0x52a2c4 0x7fbe80 0x7f4294cd2d8c 0x7f429427d04d
      mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5381b2]
      mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsb+0x13d1) [0x70e781]
      mongos(_ZN5mongo17WriteBackListener3runEv+0x16b5) [0x7e5b65]
      mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x52820f]
      mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x52a2c4]
      mongos(thread_proxy+0x80) [0x7fbe80]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f4294cd2d8c]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f429427d04d]
      Fri Oct 28 00:28:00 [WriteBackListener-mongo1.foobar.com:27018] ~ScopedDbConnection: _conn != null
      Fri Oct 28 00:28:00 [WriteBackListener-mongo1.foobar.com:27018] WriteBackListener exception : not sharded:foobar.hourly_customer_stats
      Fri Oct 28 00:28:03 [WriteBackListener-mongo1.foobar.com:27018] Assertion: 10181:not sharded:foobar.hourly_customer_stats
      0x5381b2 0x70e781 0x7e5b65 0x52820f 0x52a2c4 0x7fbe80 0x7f4294cd2d8c 0x7f429427d04d
      mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5381b2]
      mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsb+0x13d1) [0x70e781]
      mongos(_ZN5mongo17WriteBackListener3runEv+0x16b5) [0x7e5b65]
      mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x52820f]
      mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x52a2c4]
      mongos(thread_proxy+0x80) [0x7fbe80]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f4294cd2d8c]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f429427d04d]
      Fri Oct 28 00:28:03 [WriteBackListener-mongo1.foobar.com:27018] ~ScopedDbConnection: _conn != null
      Fri Oct 28 00:28:03 [WriteBackListener-mongo1.foobar.com:27018] WriteBackListener exception : not sharded:foobar.hourly_customer_stats
      Fri Oct 28 00:28:07 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' acquired, ts : 4ea9f717d45e3a377394a0ae
      Fri Oct 28 00:28:07 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' unlocked.
      Fri Oct 28 00:28:07 [WriteBackListener-mongo1.foobar.com:27018] Assertion: 10181:not sharded:foobar.hourly_customer_stats
      0x5381b2 0x70e781 0x7e5b65 0x52820f 0x52a2c4 0x7fbe80 0x7f4294cd2d8c 0x7f429427d04d
      mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5381b2]
      mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsb+0x13d1) [0x70e781]
      mongos(_ZN5mongo17WriteBackListener3runEv+0x16b5) [0x7e5b65]
      mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf) [0x52820f]
      mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0x52a2c4]
      mongos(thread_proxy+0x80) [0x7fbe80]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x6d8c) [0x7f4294cd2d8c]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f429427d04d]
      Fri Oct 28 00:28:07 [WriteBackListener-mongo1.foobar.com:27018] ~ScopedDbConnection: _conn != null
      Fri Oct 28 00:28:07 [WriteBackListener-mongo1.foobar.com:27018] WriteBackListener exception : not sharded:foobar.hourly_customer_stats
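
      To gauge how often a mongos hit this assertion and for which collections, the log can be tallied by namespace. The following is a minimal sketch that assumes the line format shown in the excerpt above; the function name countNotShardedAssertions is hypothetical and not part of any MongoDB tooling.

```javascript
// Count "Assertion: 10181:not sharded:<namespace>" lines in mongos log text.
// The pattern is derived only from the log excerpt in this report.
function countNotShardedAssertions(logText) {
  var pattern = /Assertion: 10181:not sharded:(\S+)/;
  var counts = {};
  logText.split("\n").forEach(function (line) {
    var match = pattern.exec(line);
    if (match) {
      var ns = match[1]; // e.g. "foobar.hourly_customer_stats"
      counts[ns] = (counts[ns] || 0) + 1;
    }
  });
  return counts;
}
```

      The function takes the log contents as a single string and returns an object mapping each affected namespace to its number of assertion lines. Lines such as "WriteBackListener exception" are deliberately not counted, so each failure is counted once.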

      At this point I stopped all connections to all mongos servers and forced a config refresh on the mongos that was showing the Assertion error.

      mongos> db.adminCommand("flushRouterConfig")

      This resulted in the following output in that mongos log. The config server is on mongo2, and the obfuscated IP address belongs to mongo3:

      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] created new distributed lock for foobar.hourly_customer_stats on mongo2.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] ChunkManager: time to load chunks for foobar.hourly_customer_stats: 4ms sequenceNumber: 3 version: 15|11
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] created new distributed lock for foobar.hourly_stats on mongo2.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] ChunkManager: time to load chunks for foobar.hourly_stats: 18ms sequenceNumber: 4 version: 123|1
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] Socket say send() errno:110 Connection timed out xxx.xxx.xxx.xxx:27018
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] ERROR: error processing writeback: 9001 socket exception [2] server [xxx.xxx.xxx.xxx:27018]
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] trying reconnect to mongo3.foobar.com:27018
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] reconnect mongo3.foobar.com:27018 ok
      Fri Oct 28 00:35:01 [WriteBackListener-mongo1.foobar.com:27018] update will be retried b/c sharding config info is stale, left:4 ns: foobar.hourly_customer_stats query: { timestamp: 1317232800, sub_key: "sub-47e28f3a-ad4a-11e0-ae90-df553041b89f" }

      Fri Oct 28 00:35:08 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' acquired, ts : 4ea9f8bcd45e3a377394a0d8
      Fri Oct 28 00:35:08 [Balancer] distributed lock 'balancer/ip-10-170-41-123:27017:1319755095:1804289383' unlocked.
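
      The refresh above was run on a single mongos. If the same refresh were wanted on every mongos, it could be sketched in the mongo shell as below. This is an illustration under assumptions, not a recommended fix: the function name flushAllRouters and its injectable connect parameter are hypothetical, and the list of mongos addresses has to be supplied by the caller because this report does not state them.

```javascript
// Run flushRouterConfig on each mongos in `hosts` and report the outcome.
// `connect` is a hypothetical hook so the sketch can be exercised without a
// cluster; in the mongo shell it falls back to the built-in Mongo constructor.
function flushAllRouters(hosts, connect) {
  connect = connect || function (host) { return new Mongo(host); };
  return hosts.map(function (host) {
    var admin = connect(host).getDB("admin");
    // Same command as db.adminCommand("flushRouterConfig") above.
    var reply = admin.runCommand({ flushRouterConfig: 1 });
    return { host: host, ok: reply.ok === 1 };
  });
}
```

      In the shell this would be called with the real mongos addresses, for example flushAllRouters(["<mongos-host>:<port>"]), and each returned entry shows whether that mongos acknowledged the command.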

      I'm not sure what this means for the data that I attempted to update or insert into that collection while the errors were happening. Does it mean the updates failed? This seems like the kind of thing that should throw an error back to the driver so my application can handle it appropriately. I'm not sure what caused it in the first place, though.

            People

              Assignee: Greg Studer (greg_10gen)
              Reporter: Zac Witte (zacwitte)
              Votes: 1
              Watchers: 6
