[SERVER-9125] Unable to upgrade config metadata from v3 to v4 - 13127 getMore: cursor didn't exist on server, possible restart or timeout? Created: 25/Mar/13 Updated: 11/Jul/16 Resolved: 02/Apr/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.4.0, 2.4.1 |
| Fix Version/s: | 2.4.2, 2.5.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nimi Wariboko Jr. | Assignee: | Alberto Lerner |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Config Server 1: Config Server 2: Config Server 3: |
||
| Issue Links: |
|
||||
| Operating System: | ALL | ||||
| Steps To Reproduce: | 1.) Install mongos 2.4.1 on a machine |
||||
| Participants: | |||||
| Description |
|
When trying to perform an upgrade on the config servers, we get a timeout issue. [code] Our database consists of 18312, and of those, 16790 belong to a single collection. I have attempted to repeat the upgrade many times, and the issue continues to occur. |
| Comments |
| Comment by Alberto Lerner [ 13/Apr/13 ] |
|
Nimi, thanks for the feedback. Could you also report the size of the chunks and collections collections in config, and how long the config migration process took? Both should be in the log of the mongos you ran with --upgrade. Alberto |
| Comment by Nimi Wariboko Jr. [ 12/Apr/13 ] |
|
Successfully upgraded to v4 with 2.4.2 rc0 |
| Comment by auto [ 09/Apr/13 ] |
|
Author: {u'date': u'2013-04-09T16:49:06Z', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}Message: |
| Comment by auto [ 09/Apr/13 ] |
|
Author: {u'date': u'2013-04-09T16:28:03Z', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}Message: |
| Comment by auto [ 08/Apr/13 ] |
|
Author: {u'date': u'2013-04-08T23:32:36Z', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}Message: |
| Comment by auto [ 02/Apr/13 ] |
|
Author: {u'date': u'2013-04-01T18:11:19Z', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}Message: |
| Comment by auto [ 01/Apr/13 ] |
|
Author: {u'date': u'2013-04-01T18:11:19Z', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}Message: |
| Comment by Alberto Lerner [ 27/Mar/13 ] |
|
TL;DR: a fix is upcoming, expected within the 2.4.2 time frame.

Here's a bit more clarification. There's a special internal protocol for writing to config servers. Part of this protocol involves checks after writing every document. We want these checks: they are what allows a cluster to continue taking reads and writes if one of the config servers is down, because the checks guarantee that the config servers always agree in content.

The checks have a cost, though. For all the operations against the config servers we've done so far, that cost was not an issue. For the config upgrade procedure, though, we make two copies (a working copy and a backup) of each collection that we're changing. (Recall that the 2.4 config collection layout is slightly different from 2.2's. The upgrade process is what converts one layout into the other.) The checks end up being too heavy for an entire collection copy, especially for the chunks collection, so the upgrade takes a long time as each collection gets copied a single document at a time.

The timeout occurs because, in some cases, the upgrade process (run at the start of a 2.4 mongos with --upgrade) takes longer to issue the next getMore on a cursor than it takes for that cursor to time out.

The upcoming fix will keep the special checks that a config write incurs, but will batch documents when copying a collection so that the checks are executed once per batch rather than once per document. |
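The difference between checking once per document and once per batch can be illustrated with a small, self-contained Python sketch. This is not the mongos implementation: `copy_with_checks`, `batches`, the `write` and `check` callbacks, and the batch size of 500 are all hypothetical names and values chosen for illustration, and the consistency check is reduced to a counter.

```python
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive groups of at most batch_size documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def copy_with_checks(
    docs: Iterable[dict],
    write: Callable[[dict], None],
    check: Callable[[], None],
    batch_size: int = 1,
) -> int:
    """Copy docs via write(), calling check() once after each batch.

    batch_size=1 models the per-document behaviour described above;
    a larger batch_size models the batched fix. Returns the number of
    checks that were executed.
    """
    checks = 0
    for batch in batches(docs, batch_size):
        for doc in batch:
            write(doc)
        check()  # stand-in for the (expensive) config consistency check
        checks += 1
    return checks


# Illustrative collection size; not taken from the report.
source = [{"_id": i} for i in range(10_000)]

per_doc_copy: List[dict] = []
per_doc_checks = copy_with_checks(source, per_doc_copy.append, lambda: None, batch_size=1)

batched_copy: List[dict] = []
batched_checks = copy_with_checks(source, batched_copy.append, lambda: None, batch_size=500)

print(per_doc_checks, batched_checks)
```

Under these assumptions both runs copy the same documents, but the batched run performs one check per 500 documents instead of one per document, which is what shortens the gap between successive reads from the source cursor.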
| Comment by Alberto Lerner [ 26/Mar/13 ] |
|
We have identified the problem. The fix is coming shortly, and so is an explanation of what's causing what you are observing. |
| Comment by Nimi Wariboko Jr. [ 26/Mar/13 ] |
|
Sorry, I thought I had posted it. |
| Comment by Gianfranco Palumbo [ 26/Mar/13 ] |
|
Can you please upload the full log of the mongos? |