[SERVER-5164] When syncing a secondary to a primary (replica sets), the mapping to memory may freeze until you do a query on the database. Created: 01/Mar/12 Updated: 15/Aug/12 Resolved: 22/Mar/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Shane Reustle | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | replicaset | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 10.04, 64 GB RAM, AWS, 2800 GB database. |
||
| Attachments: |
|
| Operating System: | Linux |
| Participants: |
| Description |
|
We often snapshot the primary database and spin up a new secondary to start replicating from it. Since the primary is so large (roughly 3 TB), it takes about 45 minutes to start up. We do this every once in a while and have noticed two problems that keep recurring.
1) When the new secondary is turned on for the first time and you watch mongostat, MAPPED and VSIZE climb (they should eventually reach 2800 GB) but stop at 196 GB. To get it to continue, you need to run the mongo shell on that machine and run "show dbs" to get the database to think again. From then on it is fine.
2) A different problem, which doesn't happen at the same time as the one above, is that mongostat shows mongo mapping up to 2800 GB properly, but the rs status never switches from "UNK" to "SEC". I had to manually go into a database and run "show collections" to, again, get it to think. Once I did that it switched over to SEC and was fine.
Any input is appreciated, thanks! |
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 02/Mar/12 ] |
|
It looks like it just took that long to find where to sync from in the oplog. Running listDatabases is going to slow startup down: the secondary needs to load pieces of the local database and so it's competing with listDatabases for disk IO. Can you also attach the log from the primary from 17:00 to 18:30? |
| Comment by Shane Reustle [ 02/Mar/12 ] |
|
I've attached the secondary log from when it first started to when it was finished mapping and started answering heartbeats. |
| Comment by Eliot Horowitz (Inactive) [ 02/Mar/12 ] |
|
1) is normal: the mapped figure only counts databases that have been opened
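For anyone who wants to watch both symptoms without reading mongostat by eye, here is a minimal Python sketch. It assumes you already have the reply documents of the `replSetGetStatus` and `serverStatus` commands as plain dicts (for example from a driver), and it relies on the `myState` and `mem.mapped` fields of those replies. The function names, the state table subset, and the sample values are illustrative assumptions, not anything taken from this ticket's logs.

```python
# State codes reported in the "myState" field of replSetGetStatus.
# Only a subset is listed here; anything else is reported as "OTHER".
STATE_NAMES = {
    0: "STARTUP",
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    5: "STARTUP2",
    6: "UNKNOWN",
}


def summarize(repl_status, server_status):
    """Return (state_name, mapped_gb) from the two command replies."""
    state = STATE_NAMES.get(repl_status.get("myState"), "OTHER")
    # serverStatus reports mem.mapped in megabytes; convert to gigabytes.
    mapped_mb = server_status.get("mem", {}).get("mapped", 0)
    return state, mapped_mb / 1024.0


def is_secondary(repl_status):
    """True once this member reports itself as SECONDARY (state 2)."""
    return repl_status.get("myState") == 2


# Hypothetical sample replies (made up for illustration).
sample_repl = {"set": "rs0", "myState": 5}
sample_server = {"mem": {"mapped": 200704, "virtual": 401408}}

state, mapped_gb = summarize(sample_repl, sample_server)
print(state, mapped_gb)
```

Polling these two values on an interval would show whether the member is still in a startup state and how much is mapped so far. Per the comment above, a low mapped number on its own does not indicate a stall, since only opened databases are counted.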