[SERVER-4838] buildbot wrong number of inserts b/c of weird writeback thing Created: 01/Feb/12  Updated: 11/Jul/16  Resolved: 17/Feb/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.1.0
Fix Version/s: 2.1.1

Type: Bug
Priority: Major - P3
Reporter: Greg Studer
Assignee: Greg Studer
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-4762 bigMapReduce.js failing following bal... Closed
Operating System: ALL
Participants:

 Description   

http://buildbot.mongodb.org/builders/Linux%2032-bit%20debug/builds/1244/steps/test_3/logs/stdio/text

Possibly an issue with writeback?

 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] writebacklisten result: { data: { writeBack: true, ns: "test.foo", id: ObjectId('4f28d1b3c0b2cd27f6c3eb1e'), connectionId: 5, instanceIdent: "tp2.10gen.cc:30000", version: Timestamp 1000|0, yourVersion: Timestamp 0|0, msg: BinData }, ok: 1.0 }
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] Assertion: 10181:not sharded:test.foo
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] dev: lastError==0 won't report:not sharded:test.foo
 m30998| 0x83b7378 0x85185a7 0x85184b2 0x8544277 0x8543fa9 0x83d141f 0x8513e13 0x85176b4 0x85175cc 0x8517552 0x85174d4 0x1f3be6 0xd5d919 0x2dad4e 
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo15printStackTraceERSo+0x26) [0x83b7378]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo11msgassertedEiPKc+0xf5) [0x85185a7]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo11msgassertedEiPKc+0) [0x85184b2]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsbb+0x1e3) [0x8544277]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo8DBConfig23getChunkManagerIfExistsERKSsbb+0x4b) [0x8543fa9]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo17WriteBackListener3runEv+0xa5b) [0x83d141f]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0x34b) [0x8513e13]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZNK5boost4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS3_9JobStatusEEEEclEPS3_S6_+0x68) [0x85176b4]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5boost3_bi5list2INS0_5valueIPN5mongo13BackgroundJobEEENS2_INS_10shared_ptrINS4_9JobStatusEEEEEEclINS_4_mfi3mf1IvS4_S9_EENS0_5list0EEEvNS0_4typeIvEERT_RT0_i+0x72) [0x85175cc]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5boost3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS5_9JobStatusEEEEENS0_5list2INS0_5valueIPS5_EENSB_IS8_EEEEEclEv+0x48) [0x8517552]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x22) [0x85174d4]
 m30998|  /usr/lib/libboost_thread-mt.so.1.41.0(thread_proxy+0x66) [0x1f3be6]
 m30998|  /lib/libpthread.so.0() [0xd5d919]
 m30998|  /lib/libc.so.6(clone+0x5e) [0x2dad4e]



 Comments   
Comment by auto [ 16/Feb/12 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-4838 use single op for primary or chunk manager in queries
Branch: master
https://github.com/mongodb/mongo/commit/aa853edd99a59dd317751dbc4fc79365690a150e

Comment by auto [ 16/Feb/12 ]

Author: Greg Studer (gregstuder) <greg@10gen.com>

Message: SERVER-4838 determine all at once whether a collection is sharded or not
Branch: master
https://github.com/mongodb/mongo/commit/e5992464af21028b836413eb562cc4869da091ca

Comment by Greg Studer [ 06/Feb/12 ]

see patch @ http://codereview.10gen.com/7337120/, as discussed earlier

Comment by Eliot Horowitz (Inactive) [ 02/Feb/12 ]

This might be fixed by:

https://github.com/mongodb/mongo/commit/3de6d8e2431109cd562b425d312c7f727024b9d1

Comment by Greg Studer [ 01/Feb/12 ]

I think the dropped write (and potentially other cases too) is caused by a race condition:

 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] writebacklisten result: { data: { writeBack: true, ns: "test.foo", id: ObjectId('4f28d1b3c0b2cd27f6c3eb1e'), connectionId: 5, instanceIdent: "tp2.10gen.cc:30000", version: Timestamp 1000|0, yourVersion: Timestamp 0|0, msg: BinData }, ok: 1.0 }
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] Assertion: 10181:not sharded:test.foo
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] dev: lastError==0 won't report:not sharded:test.foo
 m30998| 0x83b7378 0x85185a7 0x85184b2 0x8544277 0x8543fa9 0x83d141f 0x8513e13 0x85176b4 0x85175cc 0x8517552 0x85174d4 0x1f3be6 0xd5d919 0x2dad4e 
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo15printStackTraceERSo+0x26) [0x83b7378]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo11msgassertedEiPKc+0xf5) [0x85185a7]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo11msgassertedEiPKc+0) [0x85184b2]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsbb+0x1e3) [0x8544277]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo8DBConfig23getChunkManagerIfExistsERKSsbb+0x4b) [0x8543fa9]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo17WriteBackListener3runEv+0xa5b) [0x83d141f]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0x34b) [0x8513e13]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZNK5boost4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS3_9JobStatusEEEEclEPS3_S6_+0x68) [0x85176b4]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5boost3_bi5list2INS0_5valueIPN5mongo13BackgroundJobEEENS2_INS_10shared_ptrINS4_9JobStatusEEEEEEclINS_4_mfi3mf1IvS4_S9_EENS0_5list0EEEvNS0_4typeIvEERT_RT0_i+0x72) [0x85175cc]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5boost3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS5_9JobStatusEEEEENS0_5list2INS0_5valueIPS5_EENSB_IS8_EEEEEclEv+0x48) [0x8517552]
 m30998|  /home/yellow/buildslave/Linux_32bit_debug/mongo/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x22) [0x85174d4]
 m30998|  /usr/lib/libboost_thread-mt.so.1.41.0(thread_proxy+0x66) [0x1f3be6]
 m30998|  /lib/libpthread.so.0() [0xd5d919]
 m30998|  /lib/libc.so.6(clone+0x5e) [0x2dad4e]
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] warning: chunk manager not found for test.foo :: caused by :: 10181 not sharded:test.foo
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] connectionId: tp2.10gen.cc:30000:5 writebackId: 4f28d1b3c0b2cd27f6c3eb1e needVersion : 1|0 mine : (unknown)
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] op: insert len: 62 ns: test.foo{ _id: ObjectId('4f28d1b305d622069ed4e42d'), i: 1.0 }
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] DBConfig unserialize: test { _id: "test", partitioned: true, primary: "shard0000" }
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] created new distributed lock for test.foo on localhost:29000 ( lock timeout : 900000, ping interval : 30000, process : 0 )
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] ChunkManager: time to load chunks for test.foo: 0ms sequenceNumber: 2 version: 1|0
 m30998| Tue Jan 31 23:46:27 [conn1] User Assertion: 10194:can't call primaryShard on a sharded collection!
 m30998| Tue Jan 31 23:46:27 [WriteBackListener-localhost:30000] creating new connection to:localhost:30000
 m30998| Tue Jan 31 23:46:27 [conn1] AssertionException while processing op type : 2002 to : test.foo :: caused by :: 10194 can't call primaryShard on a sharded collection!
 m30998| Tue Jan 31 23:46:27 BackgroundJob starting: ConnectBG
 m30998| Tue Jan 31 23:46:27 [conn1]  have to set shard version for conn: 0xb2401780 ns:test.foo my last seq: 0  current: 2 version: 1|0 manager: 0xb2501a50

The WriteBackListener (wbl) refreshes the collection's sharding metadata, but while it does so another thread, [conn1], is inserting another document. That request first detects the collection as unsharded and starts an unsharded insert to the primary shard of the database, but when it checks the metadata again the collection is now sharded and there is no primary, which matches the 10194 assertion on [conn1] in the log above. In theory this could also happen without the wbl.
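The check-then-act race described above can be sketched as follows. This is a minimal Python illustration, not the mongos code or the actual patch: the `DBConfig` class here, its methods, `RoutingError`, and the `interleave` hook are all hypothetical stand-ins. `route_racy` asks "is it sharded?" and then separately asks for the primary, so a metadata refresh in between makes the second call fail. `route_atomic` answers both questions in a single lookup under one lock, in the spirit of the "determine all at once whether a collection is sharded or not" commit message.

```python
import threading


class RoutingError(Exception):
    """Raised when routing state contradicts the caller's assumption."""


class DBConfig:
    """Hypothetical, simplified stand-in for a router's per-database metadata."""

    def __init__(self, primary):
        self._lock = threading.Lock()
        self._primary = primary
        self._managers = {}  # namespace -> chunk manager placeholder

    def mark_sharded(self, ns, manager):
        # Simulates a metadata refresh that discovers the collection is sharded.
        with self._lock:
            self._managers[ns] = manager

    def is_sharded(self, ns):
        with self._lock:
            return ns in self._managers

    def get_primary(self, ns):
        with self._lock:
            if ns in self._managers:
                raise RoutingError("can't call primaryShard on a sharded collection!")
            return self._primary

    def get_chunk_manager(self, ns):
        with self._lock:
            if ns not in self._managers:
                raise RoutingError("not sharded:" + ns)
            return self._managers[ns]

    def get_chunk_manager_or_primary(self, ns):
        # One lookup under one lock: exactly one of the two results is not None.
        with self._lock:
            manager = self._managers.get(ns)
            if manager is not None:
                return manager, None
            return None, self._primary


def route_racy(config, ns, interleave=lambda: None):
    # Two separate lookups; `interleave` stands in for another thread running
    # between them (for example, a refresh triggered by a writeback).
    if not config.is_sharded(ns):
        interleave()
        return ("primary", config.get_primary(ns))
    return ("sharded", config.get_chunk_manager(ns))


def route_atomic(config, ns, interleave=lambda: None):
    # A single lookup decides both questions, so a later refresh cannot
    # make the decision internally inconsistent.
    manager, primary = config.get_chunk_manager_or_primary(ns)
    interleave()
    if manager is not None:
        return ("sharded", manager)
    return ("primary", primary)
```

Note that the atomic variant can still act on metadata that becomes stale right after the lookup; it only guarantees that the "sharded or not" decision and the routing target come from the same snapshot. Handling a stale target is a separate concern.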
