[SERVER-8860] Sharded setup creates config database on a shard server
Created: 05/Mar/13  Updated: 08/Mar/13  Resolved: 05/Mar/13

Status: Closed
Project: Core Server
Component/s: Sharding, Testing Infrastructure
Affects Version/s: 2.4.0-rc1
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Ben Becker
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Run features3.js

Participants:

 Description   

While testing features3.js, I noticed write operations for the config database being sent to one of the shard servers. The following was collected from the server on port 30000, while the config server was running on port 30999:

		{
			"opid" : 5688,
			"active" : false,
			"op" : "update",
			"ns" : "",
			"query" : {
				"_id" : "leaf.local:30999"
			},
			"client" : "127.0.0.1:50395",
			"desc" : "conn15",
			"threadId" : "0x12c72f000",
			"connectionId" : 15,
			"locks" : {
				"^" : "w",
				"^config" : "W"
			},
			"waitingForLock" : true,
			"numYields" : 0,
			"lockStats" : {
				"timeLockedMicros" : {
					
				},
				"timeAcquiringMicros" : {
					
				}
			}
		}
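
To spot this kind of operation without reading the full output by hand, the in-progress list can be filtered by lock key. The following is only a minimal sketch: the helper name opsLockingDatabase is hypothetical, the "^<db>" key format is assumed from the output above, and the second sample entry is made up for contrast. In the mongo shell the input would be db.currentOp().inprog and print() would replace console.log().

```javascript
// Hypothetical helper (not part of the mongo shell): returns the operations
// whose "locks" document has an entry for the given database.
// Assumption: per-database lock entries are keyed as "^<db>", as in the output above.
function opsLockingDatabase(inprog, dbName) {
    var key = "^" + dbName;
    return inprog.filter(function (op) {
        return op.locks !== undefined && op.locks !== null &&
               Object.prototype.hasOwnProperty.call(op.locks, key);
    });
}

// Sample data: an abbreviated copy of the operation quoted above,
// plus one made-up operation on an unrelated database.
var sampleInprog = [
    { opid: 5688, op: "update", ns: "", locks: { "^": "w", "^config": "W" }, waitingForLock: true },
    { opid: 1, op: "query", ns: "test.foo", locks: { "^": "r", "^test": "R" }, waitingForLock: false }
];

// List every operation that touches the config database on this server.
var configOps = opsLockingDatabase(sampleInprog, "config");
configOps.forEach(function (op) {
    console.log("opid " + op.opid + " (" + op.op + ") locks config, waiting: " + op.waitingForLock);
});
```

Running the same filter against a shard server and getting a non-empty result is a quick indication that the config database lives on that shard.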

Since features3.js runs a 10-second sleep operation in a $where query, this operation blocks other readers until completion. It looks like this shard server has all of the config collections you would expect to see on a config server:

> use config
switched to db config
> show collections
changelog
chunks
collections
databases
lockpings
locks
mongos
settings
shards
system.indexes
version

This may be unrelated, but I've also seen the following error a few times while running this test:

 m30000| Tue Mar  5 11:59:18.411 [initandlisten] connection accepted from 127.0.0.1:50907 #14 (14 connections now open)
 m30999| Tue Mar  5 11:59:34.565 [LockPinger] creating new connection to:localhost:30000
 m30999| Tue Mar  5 11:59:34.565 BackgroundJob starting: ConnectBG
 m30000| Tue Mar  5 11:59:34.565 [initandlisten] connection accepted from 127.0.0.1:50909 #15 (15 connections now open)
 m30999| Tue Mar  5 11:59:34.565 [LockPinger] connected connection!
 m30999| Tue Mar  5 11:59:40.581 [Balancer] Socket recv() timeout  127.0.0.1:30000
 m30999| Tue Mar  5 11:59:40.581 [Balancer] SocketException: remote: 127.0.0.1:30000 error: 9001 socket exception [3] server [127.0.0.1:30000] 
 m30999| Tue Mar  5 11:59:40.581 [Balancer] DBClientCursor::init call() failed
 m30999| Tue Mar  5 11:59:40.581 [Balancer] Assertion: 13632:couldn't get updated shard list from config server
 m30999| 0x1088d0c9b 0x1088ad71e 0x108842859 0x10879523b 0x1088aedb4 0x1088af4ea 0x1088af5b6 0x1088af676 0x108903425 0x7fff8745f742 0x7fff8744c181 
 m30999|  0   mongos                              0x00000001088d0c9b _ZN5mongo15printStackTraceERSo + 43
 m30999|  1   mongos                              0x00000001088ad71e _ZN5mongo11msgassertedEiPKc + 174
 m30999|  2   mongos                              0x0000000108842859 _ZN5mongo15StaticShardInfo6reloadEv + 403
 m30999|  3   mongos                              0x000000010879523b _ZN5mongo8Balancer3runEv + 401
 m30999|  4   mongos                              0x00000001088aedb4 _ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE + 652
 m30999|  5   mongos                              0x00000001088af4ea _ZNK5boost4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS3_9JobStatusEEEEclEPS3_S6_ + 68
 m30999|  6   mongos                              0x00000001088af5b6 _ZN5boost3_bi5list2INS0_5valueIPN5mongo13BackgroundJobEEENS2_INS_10shared_ptrINS4_9JobStatusEEEEEEclINS_4_mfi3mf1IvS4_S9_EENS0_5list0EEEvNS0_4typeIvEERT_RT0_i + 54
 m30999|  7   mongos                              0x00000001088af676 _ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv + 42
 m30999|  8   mongos                              0x0000000108903425 thread_proxy + 229
 m30999|  9   libsystem_c.dylib                   0x00007fff8745f742 _pthread_start + 327
 m30999|  10  libsystem_c.dylib                   0x00007fff8744c181 thread_start + 13
 m30999| Tue Mar  5 11:59:40.596 [Balancer] Detected bad connection created at 1362513544578032 microSec, clearing pool for localhost:30000
 m30999| Tue Mar  5 11:59:40.596 [Balancer] scoped connection to localhost:30000 not being returned to the pool
 m30999| Tue Mar  5 11:59:40.596 [Balancer] caught exception while doing balance: couldn't get updated shard list from config server
 m30999| Tue Mar  5 11:59:40.596 [Balancer] *** End of balancing round
^CTue Mar  5 11:59:45.437 got signal 2 (Interrupt: 2), will terminate after current cmd ends
Tue Mar  5 11:59:45.437 [interruptThread] now exiting
Tue Mar  5 11:59:45.437 dbexit: 
Tue Mar  5 11:59:45.437 [interruptThread] shutdown: going to close listening sockets...
Tue Mar  5 11:59:45.437 [interruptThread] closing listening socket: 9
Tue Mar  5 11:59:45.437 [interruptThread] closing listening socket: 10
Tue Mar  5 11:59:45.437 [interruptThread] closing listening socket: 11
Tue Mar  5 11:59:45.438 [interruptThread] removing socket file: /tmp/mongodb-27999.sock
Tue Mar  5 11:59:45.438 [interruptThread] shutdown: going to flush diaglog...



 Comments   
Comment by Scott Hernandez (Inactive) [ 05/Mar/13 ]

Currently the test is started this way:

s = new ShardingTest( "features3" , 2 , 1 , 1 );  // 2 shards, logLevel 1, 1 mongos server

If we want an isolated config db server, it would be started this way:

var s = new ShardingTest( {shards:2, verbose:1, separateConfig:true } );  // 2 shards, logLevel 1, [1 mongos server,] 1 config server

Comment by Scott Hernandez (Inactive) [ 05/Mar/13 ]

By default, sharding tests host the config db on one of the shards unless you use the separateConfig option to get a separate config server.
