[SERVER-12251] After movePrimary on an unsharded database the MongoDB cluster behaves incorrectly Created: 04/Jan/14  Updated: 10/Dec/14  Resolved: 10/Feb/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.8
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Yauhen Artsiukhou Assignee: David Hows
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File all.logs.tar.gz    
Operating System: ALL
Participants:

 Description   

I completed the following steps:

  1. Start a sharded cluster with 3 config servers, two shards (sh000 and sh001), and 3 routers.
  2. Start the application (it creates 4 databases and inserts a small amount of initial data).
  3. Stop the application.
  4. Move one database (cloud_perf_default) to another shard (sh001 -> sh000) using movePrimary, roughly as sketched below.
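
The movePrimary step looked roughly like this (a sketch from memory; the exact session was not kept):

mongos> use admin
switched to db admin
mongos> db.runCommand({ movePrimary: "cloud_perf_default", to: "sh000" })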

In version 2.4.5 the database appeared to have moved only on the mongos where the command was issued; the other routers reported that it had not moved, and restarting those mongos instances helped.
In 2.4.8 this part looks OK and all routers report that the database has been moved.

After some time I decided to drop all of the application databases, and found that my database (cloud_perf_default) cannot be deleted, even though mongos returns ok: 1.
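
The drop was issued through a mongos, roughly like this (a sketch from memory; only the ok: 1 response is certain):

mongos> use cloud_perf_default
switched to db cloud_perf_default
mongos> db.dropDatabase()
{ "ok" : 1 }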

Here is my output:

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "version" : 3,
        "minCompatibleVersion" : 3,
        "currentVersion" : 4,
        "clusterId" : ObjectId("52bd869c18410ba74327fcb3")
}
  shards:
        {  "_id" : "sh000",  "host" : "sh000/E235A:27018" }
        {  "_id" : "sh001",  "host" : "sh001/E242E:27018" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "cluster_test_r14_sys_jms_store",  "partitioned" : false,  "primary" : "sh000" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "sh000" }
        {  "_id" : "db",  "partitioned" : false,  "primary" : "sh000" }
 
mongos> show dbs;
admin   (empty)
cloud_perf_default      5.951171875GB
config  0.046875GB
test    (empty)

The database cluster_test_r14_sys_jms was deleted at the beginning of this story (why is it still there?).
Shard 000: note that there is no cluster_test_r14_sys_jms.

sh000:PRIMARY> show dbs;
local   10.0732421875GB
test    (empty)

Shard 001: the database actually exists, but on the wrong shard, and with data.
How can this be? Are writes going to the old shard while reads go to the new one?

sh001:PRIMARY> show dbs;
cloud_perf_default      5.951171875GB
local   10.0732421875GB

I couldn't find a similar issue, so I decided to create a new one. Possibly this has already been fixed in a newer version.



 Comments   
Comment by David Hows [ 10/Feb/14 ]

Hi Yauhen,

I'm marking this issue as closed as we have not heard back from you.

If there is anything further we can do for you on this issue, please feel free to re-open it with details.

Regards,
David

Comment by David Hows [ 30/Jan/14 ]

Hi Yauhen,

How readily are you able to reproduce this?

I agree with James that this potentially relates to SERVER-8059/SERVER-8870, which would occur when you have multiple MongoS.

Would you be able to test running flushRouterConfig on each of your MongoS after dropping a database, and see whether that stops the issue from recurring as you have seen?
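
Something along these lines, run against every mongos in turn (the hostnames below are just placeholders for your three routers):

$ mongo router1.example.com:27017 --eval 'db.adminCommand({ flushRouterConfig: 1 })'
$ mongo router2.example.com:27017 --eval 'db.adminCommand({ flushRouterConfig: 1 })'
$ mongo router3.example.com:27017 --eval 'db.adminCommand({ flushRouterConfig: 1 })'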

Thanks,
David

Comment by Yauhen Artsiukhou [ 30/Jan/14 ]

Thanks for your reply, James!

So, after removing a database I should run flushRouterConfig before trying to create a new one again? Should I run flushRouterConfig on all routers in the cluster?

Comment by James Blackburn [ 29/Jan/14 ]

We see this quite frequently (SERVER-8870, CS-8109), and to me there appear to be two ways to get the mongos into an inconsistent state:

  1. After movePrimary, many of the mongos will not have noticed the move.
  2. When a database is dropped and recreated, some mongoses will reflect the correct primary shard for the database, and others will show the previous one.

flushRouterConfig appears to fix the split brain for us. So far I've been unable to reproduce it reliably outside our cluster.

Comment by Yauhen Artsiukhou [ 29/Jan/14 ]

Added the all.logs.tar.gz file.
I have filtered some information out of the logs.

Comment by Yauhen Artsiukhou [ 29/Jan/14 ]

Sorry for the long post.

I finally caught this on our dev environment.

I spent a lot of time removing unrelated information from the logs while trying to keep everything relevant.
It now looks like this is not a movePrimary() issue.

The dev setup: 3 shard servers (without replicas) and 3 routers (the application runs on the same host and uses localhost for its MongoDB connection).

I found the point in time when the database was dropped and then recreated 3 minutes later.
Note: nobody connects directly to mongod in our setup (the app uses the mongos on localhost).

Logs from First Router

# No related logs

Logs from Second Router

Sat Jan 11 15:27:30.031 [conn4021] DROP DATABASE: org_dev2_sys_quartz
Sat Jan 11 15:27:30.031 [conn4021] erased database org_dev2_sys_quartz from local registry
Sat Jan 11 15:27:30.033 [conn4021] DBConfig::dropDatabase: org_dev2_sys_quartz
Sat Jan 11 15:27:30.033 [conn4021] about to log metadata event: { _id: "SRV249A-2014-01-11T12:27:30-52d138b2262f902e0d2511b7", server: "SRV249A", clientAddr: "N/A", time: new Date(1389443250033), what: "dropDatabase.start", ns: "org_dev2_sys_quartz", details: {} }
Sat Jan 11 15:27:30.315 [conn4021]    DBConfig::dropDatabase: org_dev2_sys_quartz dropped sharded collections: 0
Sat Jan 11 15:27:30.340 [conn4021] about to log metadata event: { _id: "SRV249A-2014-01-11T12:27:30-52d138b2262f902e0d2511b8", server: "SRV249A", clientAddr: "N/A", time: new Date(1389443250340), what: "dropDatabase", ns: "org_dev2_sys_quartz", details: {} }
Sat Jan 11 15:27:36.735 [conn4021] DROP DATABASE: org_dev2_sys_quartz
Sat Jan 11 15:27:36.735 [conn4021] erased database org_dev2_sys_quartz from local registry
Sat Jan 11 15:27:36.737 [conn4021] DBConfig::dropDatabase: org_dev2_sys_quartz
Sat Jan 11 15:27:36.738 [conn4021] about to log metadata event: { _id: "SRV249A-2014-01-11T12:27:36-52d138b8262f902e0d2511be", server: "SRV249A", clientAddr: "N/A", time: new Date(1389443256737), what: "dropDatabase.start", ns: "org_dev2_sys_quartz", details: {} }
Sat Jan 11 15:27:36.873 [conn4021]    DBConfig::dropDatabase: org_dev2_sys_quartz dropped sharded collections: 0
Sat Jan 11 15:27:36.908 [conn4021] about to log metadata event: { _id: "SRV249A-2014-01-11T12:27:36-52d138b8262f902e0d2511bf", server: "SRV249A", clientAddr: "N/A", time: new Date(1389443256908), what: "dropDatabase", ns: "org_dev2_sys_quartz", details: {} }
Sat Jan 11 15:27:39.134 [Balancer] distributed lock 'balancer/SRV249A:27017:1388738934:1804289383' acquired, ts : 52d138ba262f902e0d2511c0
Sat Jan 11 15:27:39.203 [Balancer] distributed lock 'balancer/SRV249A:27017:1388738934:1804289383' unlocked. 

Logs from Third Router

# No related logs

Logs from Shard 001

Sat Jan 11 15:27:30.403 [conn1351] dropDatabase org_dev2_sys_quartz starting
Sat Jan 11 15:27:30.409 [conn1351] removeJournalFiles
Sat Jan 11 15:27:30.426 [conn1351] dropDatabase org_dev2_sys_quartz finished
...
# Somebody requested the creation of this db? Not me :)
Sat Jan 11 15:30:02.414 [FileAllocator] allocating new datafile /var/lib/mongodb-sh-001/org_dev2_sys_security.ns, filling with zeroes...
Sat Jan 11 15:30:02.849 [FileAllocator] done allocating datafile /var/lib/mongodb-sh-001/org_dev2_sys_security.ns, size: 16MB,  took 0.434 secs
Sat Jan 11 15:30:02.850 [FileAllocator] allocating new datafile /var/lib/mongodb-sh-001/org_dev2_sys_security.0, filling with zeroes...
Sat Jan 11 15:30:02.854 [FileAllocator] done allocating datafile /var/lib/mongodb-sh-001/org_dev2_sys_security.0, size: 64MB,  took 0.003 secs
Sat Jan 11 15:30:02.854 [FileAllocator] allocating new datafile /var/lib/mongodb-sh-001/org_dev2_sys_security.1, filling with zeroes...
Sat Jan 11 15:30:02.860 [conn776] build index org_dev2_sys_security.User { _id: 1 }
Sat Jan 11 15:30:02.863 [conn776] build index done.  scanned 0 total records. 0.002 secs
Sat Jan 11 15:30:02.863 [conn776] info: creating collection org_dev2_sys_security.User on add index
Sat Jan 11 15:30:02.863 [conn776] build index org_dev2_sys_security.User { _e3sId: 1 }
Sat Jan 11 15:30:02.864 [FileAllocator] done allocating datafile /var/lib/mongodb-sh-001/org_dev2_sys_security.1, size: 128MB,  took 0.002 secs
Sat Jan 11 15:30:02.865 [conn776] build index done.  scanned 0 total records. 0.002 secs
Sat Jan 11 15:30:02.866 [conn776] insert org_dev2_sys_security.system.indexes ninserted:1 keyUpdates:0 locks(micros) w:451717 452ms
Sat Jan 11 15:30:02.877 [conn794] build index org_dev2_sys_security.User { securityId: 1 }
Sat Jan 11 15:30:02.878 [conn794] build index done.  scanned 0 total records. 0.001 secs
Sat Jan 11 15:30:02.895 [conn777] build index org_dev2_sys_security.User { group: 1 }
Sat Jan 11 15:30:02.897 [conn777] build index done.  scanned 0 total records. 0.002 secs
Sat Jan 11 15:30:02.904 [conn777] build index org_dev2_sys_security.User { dynamic.group: 1 }
Sat Jan 11 15:30:02.905 [conn777] build index done.  scanned 0 total records. 0.001 secs

Logs from Shard 002

Sat Jan 11 15:27:37.565 [conn5496] dropDatabase org_dev2_sys_quartz starting
Sat Jan 11 15:27:37.570 [conn5496] removeJournalFiles
Sat Jan 11 15:27:37.597 [conn5496] dropDatabase org_dev2_sys_quartz finished

At "Sat Jan 11" I have only one record and shard is 002:

Putting database quartz on router 249a

Sat Jan 11 15:27:36.735 [conn4021]       put [org_dev2_sys_quartz] on: sh002:sh002/SRV24A2:27018
Tue Jan 14 12:36:22.087 [conn4790]       put [org_dev2_sys_quartz] on: sh002:sh002/SRV24A2:27018
Tue Jan 14 12:40:14.066 [conn4790]       put [org_dev2_sys_quartz] on: sh002:sh002/SRV24A2:27018

Putting database quartz on router 248f

Fri Jan 17 17:33:24.491 [conn1253]       put [org_dev2_sys_quartz] on: sh002:sh002/SRV24A2:27018
Fri Jan 10 16:10:26.491 [conn8]          put [org_dev2_sys_quartz] on: sh001:sh001/SRV24A1:27018
Fri Jan  3 12:41:33.642 [conn59]         put [org_dev2_sys_quartz] on: sh002:sh002/SRV24A2:27018

According to the logs, the database was created on sh001 on the 10th of January, then dropped and re-created on sh002 on the 11th of January.
But the result is that the database is marked as created on sh002, while it was actually created on sh001.

Comment by Yauhen Artsiukhou [ 21/Jan/14 ]

This is our staging environment and the logs contain some sensitive data, but I'll try to grep them and send them to you.
I'm now working on reproducing this bug in an isolated environment (Vagrant).
I'll also double-check my installation steps against the official documentation (I want to be sure the installation is correct).

Comment by Eliot Horowitz (Inactive) [ 21/Jan/14 ]

Can you send the mongos logs?

Comment by Yauhen Artsiukhou [ 20/Jan/14 ]

Update: please note that the databases aren't sharded.
Also, this is now a BLOCKER for me! Here is the scenario:

  1. In the described environment, start the application.
  2. The application creates a few databases.
  3. Stop the application.
  4. Drop the databases - first issue: unable to drop a database, even though they are empty.
  5. Start the application.
  6. The application crashes.

My investigation:

  1. sh.status() shows that the database is located on shard 000, but it is actually located on shard 001.
  2. Sometimes show dbs displays results that differ from sh.status().
  3. Reading through different routers gives different results: one goes to shard 000 and returns an empty set; another shows the same output as sh.status() but actually uses shard 001.
  4. The correct location is shard 000 and the wrong location is shard 001, yet shard 001 contains the data (a sketch of how I compared the metadata with the actual data is below).
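
A rough sketch of how I compared the config metadata with where the data actually lives (using cloud_perf_default as a stand-in for one of the affected databases):

// what the cluster metadata says (run on any mongos)
mongos> db.getSiblingDB("config").databases.find({ _id : "cloud_perf_default" })

// what each shard actually holds (connected directly to each shard's primary)
sh000:PRIMARY> db.getSiblingDB("cloud_perf_default").stats()
sh001:PRIMARY> db.getSiblingDB("cloud_perf_default").stats()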