[SERVER-11159] Secondary stuck during index building Created: 12/Oct/13  Updated: 10/Dec/14  Resolved: 14/Feb/14

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Dharshan Rangegowda Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongod.log    
Operating System: Linux
Participants:

 Description   

2.4.1 replica set
The customer built an index on the primary in the background.
The secondary built the index and then logged the following error:
DR102 too much data written uncommitted 314.9MB

After this, the secondary was at 100% CPU and not reachable by the other members.
I deleted the secondary's data and resynced it, but hit the same issue.
Then I took the server out of the replica set and dropped the newly added index. After that the server was fine. The logs from the secondary are attached.



 Comments   
Comment by Ranjay Krishna [ 05/Nov/13 ]

Hi Dharshan,

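bgsync refers to rsBackgroundSync, the secondary's background sync thread, which fetches oplog entries from its sync source and buffers them until they are applied.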

It appears that the secondary is under heavy load because it is building the index in the foreground, which is a blocking operation. When the bgsync buffer is full, the secondary stalls and becomes unresponsive to the other members.

Comment by Dharshan Rangegowda [ 04/Nov/13 ]

Hi Ranjay,

That's what I ended up doing. When you say "bgsync buffer", do you mean the oplog? Also, if it was not able to sync, shouldn't the server have gone into the RECOVERING state? I noticed that the process got stuck at 100% CPU usage and ended up unresponsive to the other servers. The user shouldn't be able to do this to the server.

Comment by Ranjay Krishna [ 04/Nov/13 ]

When you build an index on the primary with background=true, the secondaries still build it in the foreground. Because a background build on the primary takes a long time, a large amount of data accumulates that the secondary then has to sync. Since the bgsync buffer is capped at 256 MB and the data that needs to be resynced is 314.9MB, the OS ends up filling its memory and swap space.
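To illustrate why a full buffer stalls the secondary: a buffer capped by total bytes, rather than by number of entries, blocks the thread that fills it once the cap is reached, so fetching stops until the applier drains entries. The Python sketch below is a hypothetical, simplified model (the class and method names are invented for illustration); it is not MongoDB's BackgroundSync code, and a small max_bytes stands in for the 256 MB cap.

```python
import threading
from collections import deque


class ByteCappedBuffer:
    """Hypothetical FIFO buffer capped by total payload size, not item count.

    try_push() waits while adding an item would exceed the cap; pop() waits
    while the buffer is empty. Illustration only, not MongoDB's implementation.
    """

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.items = deque()
        self.cond = threading.Condition()

    def try_push(self, payload, timeout=None):
        """Append payload; return False if no room appeared within timeout."""
        size = len(payload)
        with self.cond:
            # Allow an oversized item into an empty buffer so it cannot deadlock.
            has_room = self.cond.wait_for(
                lambda: self.used == 0 or self.used + size <= self.max_bytes,
                timeout=timeout,
            )
            if not has_room:
                return False  # the producer (fetcher) is stalled
            self.items.append(payload)
            self.used += size
            self.cond.notify_all()
            return True

    def pop(self):
        """Remove and return the oldest payload, freeing its bytes."""
        with self.cond:
            self.cond.wait_for(lambda: len(self.items) > 0)
            payload = self.items.popleft()
            self.used -= len(payload)
            self.cond.notify_all()
            return payload
```

For example, with max_bytes=10, pushing 6 bytes succeeds, a second 6-byte push times out (the fetcher would be stuck there), and it only succeeds after pop() frees space. If the consumer is itself blocked, as with a foreground index build, the producer never makes progress.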

Note that a foreground index build is a blocking operation on the secondary.
Also, DR102 is a warning, not an error. It is issued in dur_commitjob.cpp. I believe that for journal sync it can go as high as 512MB, at which point mongo will abort.

I would recommend taking the secondaries down one at a time, building the index on each, and adding it back to the replica set, then finally doing the same for the primary. This is described here
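The recommended order can be sketched as a plan generator. This is a hypothetical Python helper that only produces the list of steps; it does not connect to MongoDB. The Member type and the step wording are invented for illustration, and stepping the primary down first (so another member takes over before it leaves the set) is an assumption about how "doing the same to the primary" is usually carried out.

```python
from dataclasses import dataclass


@dataclass
class Member:
    host: str
    state: str  # simplified: only "PRIMARY" and "SECONDARY" are considered


def _member_steps(host):
    """Hypothetical per-member steps for building the index outside the set."""
    return [
        f"{host}: shut down and restart as a standalone (outside the replica set)",
        f"{host}: build the index",
        f"{host}: restart as a replica set member and let it catch up",
    ]


def rolling_index_build_plan(members):
    """Return ordered steps: each secondary in turn, then the primary last."""
    primaries = [m for m in members if m.state == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one PRIMARY")
    steps = []
    for m in members:
        if m.state == "SECONDARY":
            steps += _member_steps(m.host)
    primary = primaries[0]
    # Assumed step: hand off the primary role before taking this member out.
    steps.append(f"{primary.host}: step down so a secondary becomes primary")
    steps += _member_steps(primary.host)
    return steps
```

Building on one member at a time keeps a majority of the set available, and doing the primary last means only one failover is needed.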
