[SERVER-32398] Primary freezes during background index build Created: 19/Dec/17  Updated: 05/Feb/18  Resolved: 21/Dec/17

Status: Closed
Project: Core Server
Component/s: Index Maintenance, WiredTiger
Affects Version/s: 3.4.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Michael Smith Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: RF
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 16.04 Server


Attachments: Text File gdb-mongod-2.txt     Text File unique_threads.txt    
Issue Links:
Duplicate
duplicates SERVER-21307 Replicated DDL (catalog) operation du... Closed
Related
is related to SERVER-32652 mongod killed by OOM while one second... Closed
Operating System: ALL
Participants:

 Description   

mongod 3.4.10 on Ubuntu 16.04 in a replica set with 3 nodes. The primary was building an index and the index build stopped:

Dec 18 22:05:57 hostname mongod.27017[1250]: [conn5562]   Index Build (background): 4027500/10026608 40%
Dec 18 22:06:00 hostname mongod.27017[1250]: [conn5562]   Index Build (background): 4062300/10026608 40%
Dec 18 22:06:03 hostname mongod.27017[1250]: [conn5562]   Index Build (background): 4096100/10026608 40%
Dec 18 22:06:06 hostname mongod.27017[1250]: [conn5562]   Index Build (background): 4127300/10026608 41%
Dec 18 22:06:09 hostname mongod.27017[1250]: [conn5562]   Index Build (background): 4152100/10026608 41%
Dec 18 22:06:12 hostname mongod.27017[1250]: [conn5562]   Index Build (background): 4189100/10026608 41%
Dec 18 22:06:15 hostname mongod.27017[1250]: [conn5562]   Index Build (background): 4241600/10026608 42%

At this point the build stalled: no more "Index Build" lines were printed until I restarted mongod several hours later. From then on I could connect to the admin database as the root database user, but connections as any other user would simply hang. Logs below in case they're relevant (I suspect they're not).

Dec 18 22:06:38 hostname monit[1400]: 'mongo_index_size' '/usr/local/bin/mongo_index_size.sh' failed with exit status (1) -- Mongo indexes are consuming 14900 MB, higher than threshold of 12034 MB
Dec 18 22:06:38 hostname mongod.27017[1250]: [thread1] connection accepted from 127.0.0.1:45244 #5605 (66 connections now open)
Dec 18 22:06:38 hostname mongod.27017[1250]: [thread1] connection accepted from 127.0.0.1:45246 #5606 (67 connections now open)
Dec 18 22:06:38 hostname mongod.27017[1250]: [conn5606] received client metadata from 127.0.0.1:45246 conn5606: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.4.10" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }
Dec 18 22:06:38 hostname mongod.27017[1250]: [conn5605] received client metadata from 127.0.0.1:45244 conn5605: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.4.10" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "16.04" } }

Backtrace attached.



 Comments   
Comment by Kelsey Schubert [ 21/Dec/17 ]

Hi Mike,

Due to the issue described in SERVER-21307 you may encounter the following behavior:

  • an index build runs on the primary
  • after it completes on the primary it is replicated to the secondaries and begins building there
  • while the index build is occurring on the secondaries, replication halts due to SERVER-21307; replication lag builds up, which prevents updates occurring on the primary from being committed to a majority of the replica set
  • because majority read concern is enabled, the primary must keep an old copy of the data to satisfy majority read concern reads should they occur
  • this places additional pressure on the cache on the primary, until it reaches a limit and operations stall entirely

In summary, you should avoid operations that create replication lag while running with majority read concern enabled. For index builds in particular, you can avoid this issue by using rolling index builds, where you build the index on each secondary one at a time before building it on the primary.
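As a rough illustration of the rolling approach (the database name, collection name, and index spec below are placeholders; adapt them to your deployment):

// On each secondary in turn: restart it as a standalone mongod (outside the
// replica set, e.g. on a different port), build the index there in the
// foreground, then restart it as a replica set member and let it catch up.
db.getSiblingDB("mydb").mycoll.createIndex({ description: "text" })

// Once both secondaries have the index, step down the primary so it becomes a
// secondary, and repeat the standalone build on the former primary.
rs.stepDown()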

To address your questions specifically: old pages are kept in cache on the primary until the corresponding data has been written to a majority of members in the replica set. Consequently, in your case, both secondaries would need to be lagging for the issue to manifest. If both secondaries are lagging, I would expect "tracked dirty bytes in the cache" to increase, and the "freeze" to occur once it reaches 20% of the maximum configured cache size.
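If you want to keep an eye on this, something along these lines in the shell gives a rough view of dirty-cache pressure relative to the configured cache size (a sketch only; the 20% figure is the trigger described above, not a tunable):

// Compare dirty cache bytes against the configured WiredTiger cache size.
var c = db.serverStatus().wiredTiger.cache;
var dirty = c["tracked dirty bytes in the cache"];
var max = c["maximum bytes configured"];
print("dirty: " + (dirty / 1024 / 1024).toFixed(1) + " MB (" +
      (100 * dirty / max).toFixed(1) + "% of configured cache)");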

Kind regards,
Kelsey

Comment by Michael Smith [ 21/Dec/17 ]

Hi Kelsey,

What metric did you use to identify the cache pressure? Would it be something like db.serverStatus().wiredTiger.cache['tracked dirty bytes in the cache']?

What is blocking pages from being evicted from the cache? Is it waiting until one secondary provides feedback that it's applied the log (i.e. majority write) or is it waiting for ALL secondaries?

I hit this again tonight when a burst of inserts on the primary overlapped with the nightly mongodump on both secondaries. Once both secondaries lagged by about 1000-1500 seconds, simple find queries on the primary started taking 30 seconds. I recovered by killing the mongodumps and restarting mongod on both secondaries.
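For reference, a quick way to watch the secondaries' lag from the shell looks roughly like this (just a sketch):

// Summarise each secondary's lag behind the primary.
rs.printSlaveReplicationInfo()

// Or compute it from rs.status() directly:
var s = rs.status();
var primary = s.members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
s.members.forEach(function (m) {
  if (m.stateStr === "SECONDARY") {
    print(m.name + " lag (s): " + (primary.optimeDate - m.optimeDate) / 1000);
  }
});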

Thanks,
Mike

Comment by Michael Smith [ 19/Dec/17 ]

Thanks Kelsey.

Is there anywhere I could read up on how readConcernMajority, replication lag, and cache pressure interact? A couple of weeks ago I had a primary in another cluster fail:

  • during a delete operation removing ~4 million rows from a table with ~12-15m rows
  • ... while mongodump was running on the secondaries
  • ... while one of the secondaries was in Azure and hitting what I now understand is SERVER-31215 / WT-3461
  • ... on 3.4.6

Is the issue that something can't be evicted from the WiredTiger cache until one (all?) replicas have acknowledged some portion of the oplog, and with the cache full, no operations can make progress?
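As a rough check, comparing the applied optime against the majority commit point from the shell looks something like this (going from memory of the 3.4 rs.status() output, so treat the field names as an assumption):

// Compare what the primary has applied against what a majority has acknowledged.
var ot = rs.status().optimes;
printjson({
  applied: ot.appliedOpTime,
  majorityCommitted: ot.readConcernMajorityOpTime
});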

Comment by Kelsey Schubert [ 19/Dec/17 ]

Hi Mike,

Thanks for providing the logs and diagnostic.data for the secondary. After examining these files, our working hypothesis is that the lag on the secondary was caused by SERVER-21307. Because readConcernMajority is enabled, this lag caused cache pressure, eventually leading to the observed "freeze". We are working on a reproduction to confirm this hypothesis and determine mitigation strategies, and we'll continue to update this ticket as our investigation continues.

Kind regards,
Kelsey

Comment by Michael Smith [ 19/Dec/17 ]

Hi Kelsey,

I've uploaded the secondaries' logs and diagnostic.data to the portal.

Best,
Mike

Comment by Kelsey Schubert [ 19/Dec/17 ]

Hi Mike,

Thanks for providing the diagnostic.data and logs for the affected mongod; it's very helpful. Would you please provide the same diagnostics for one of the secondaries?

I see that the two secondaries begin lagging when the text index starts being built, eventually leading to cache pressure that slows the system. The logs and diagnostic.data from a secondary will give us a clearer picture of the state of the cluster as a whole preceding this issue.

Thanks,
Kelsey

Comment by Michael Smith [ 19/Dec/17 ]

Uploaded. Thanks!

Comment by Kelsey Schubert [ 19/Dec/17 ]

Hi Mike,

I've created a secure upload portal for you to use. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time.

Thanks for your help,
Kelsey

Comment by Michael Smith [ 19/Dec/17 ]

Hi @kelsey.schubert,

Happy to upload diagnostic.data and logs to a secure portal if they can stay private.

Best,
Mike

Comment by Kelsey Schubert [ 19/Dec/17 ]

Hi mzs,

Thank you for reporting this issue and providing stacks; we're still looking through them. It may help our investigation if we could review the diagnostic.data and complete mongod logs. Would you please upload an archive of the diagnostic.data directory and the complete mongod logs covering this event?

If needed, I'd be happy to provide a secure portal.

Thank you,
Kelsey

Comment by Michael Smith [ 19/Dec/17 ]

Forgot to mention: during the stall, CPU was pegged at 400% (on a 4-processor machine).
