[SERVER-25365] MongoDB 3.2.1 crash during resync Created: 30/Jul/16  Updated: 30/Jul/16  Resolved: 30/Jul/16

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tinco Andringa Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-22390 RangeDeleter crashes primary node due... Closed
Operating System: ALL
Participants:

 Description   

Hi,

One of our MongoDB shards last replica just crashed while resyncing to its replica which had crashed due to a known 3.2.1 bug. This caused us some downtime, crashing during a resync is rather painful..

It's rebooting now, and will be rebooting in to 3.2.3, but booting takes a long time (it's a fairly large wiredtiger db) so we'll be down for a while.

Please let me know if you need more info or if it's a dupe or something.

Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]: 2016-07-30T11:56:45.944+0000 I NETWORK  [conn7617] end connection 10.100.0.0:56419 (110 connections now open)
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]: 2016-07-30T11:56:46.013+0000 I CONTROL  [migrateThread]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  0x12d56a2 0x1271304 0x125d592 0xdf116e 0xdf21fa 0xf24447 0xf2720e 0xf276a0 0x1a99020 0x7f354a432b50 0x7f354a17c95d
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]: ----- BEGIN BACKTRACE -----
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]: {"backtrace":[{"b":"400000","o":"ED56A2"},{"b":"400000","o":"E71304"},{"b":"400000","o":"E5D592"},{"b":"400000","o":"9F116E"},{"b":"400000","o":"9F21FA"},{"b":"400000","o":"B24447"},{"b":"400000","o":"B2720E"},{"b":"400000","o":"B276A0"},{"b":"400000","o":"1699020"},{"b":"7F354A42C000","o":"6B50"},{"b":"7F354A0A1000","o":"DB95D"}],"processInfo":{ "mongodbVersion" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-54-generic", "version" : "#91-Ubuntu SMP Tue May 26 19:15:08 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "3B3C1850090787FE4C41A37E61546193A0F0C0EC" }, { "b" : "7FFE2FAFD000", "elfType" : 3, "buildId" : "1C0D0A18FF043EED9EE11DB5E5E90A3F74729341" }, { "b" : "7F354B5EB000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "B3CDB575971D8B696AC6A6F8872620CAC9D2D273" }, { "b" : "7F354B1F3000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "E75BB3082373F77A79E8030BDC6C6A280284E759" }, { "b" : "7F354AFEB000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "F58D5DE3E7A2989E915422BA4203FE53DBA449A0" }, { "b" : "7F354ADE7000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "5D1CA3A3D93ED5B6C6462FFA03E787FDBE4013A3" }, { "b" : "7F354AAE0000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "8711429397A5AF8B6269B867D830EDF6E0225B8D" }, { "b" : "7F354A85E000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "7F58D6664571941C86B2D969701A572AD4D7BF1D" }, { "b" : "7F354A648000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "F980B1188708F8D8B5C35D185444AF4CB939AA1E" }, { "b" : "7F354A42C000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "FEF281218797AD6AE726DD5FCEDECADD9E9F51DC" }, { "b" : "7F354A0A1000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "A745EBA2C16BA80AE1EF1A7A7B70740C2CF1B363" }, { "b" : "7F354B84C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9B23F2A44CC8CA6175CBD8D64584B1C7EA5FD18C" }, { "b" : "7F3549E8A000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "1EFEB71FD4999C2307570D673A724EA4E1D85267" } ] }}
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12d56a2]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(_ZN5mongo10logContextEPKc+0x134) [0x1271304]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(_ZN5mongo23fassertFailedWithStatusEiRKNS_6StatusE+0x62) [0x125d592]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(+0x9F116E) [0xdf116e]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(_ZN5mongo12RangeDeleter9deleteNowEPNS_16OperationContextERKNS_19RangeDeleterOptionsEPSs+0x4CA) [0xdf21fa]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(_ZN5mongo27MigrationDestinationManager14_migrateDriverEPNS_16OperationContextERKSsRKNS_7BSONObjES7_S7_S4_RKNS_3OIDERKNS_19WriteConcernOptionsE+0xB87) [0xf24447]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(_ZN5mongo27MigrationDestinationManager14_migrateThreadESsNS_7BSONObjES1_S1_SsNS_3OIDENS_19WriteConcernOptionsE+0xAE) [0xf2720e]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(+0xB276A0) [0xf276a0]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  mongod(+0x1699020) [0x1a99020]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  libpthread.so.0(+0x6B50) [0x7f354a432b50]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:  libc.so.6(clone+0x6D) [0x7f354a17c95d]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]: -----  END BACKTRACE  -----
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]: 2016-07-30T11:56:46.013+0000 I -        [migrateThread]
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]: ***aborting after fassert() failure
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:
Jul 30 13:58:32 us-mongo-3....com docker/mongo-blob-4[1062]:
Jul 30 13:58:33 us-mongo-3....com docker/mongo-blob-4-manager[1062]: ++++++ Error: Worker exited with exit code: 14
Jul 30 13:58:33 us-mongo-3....com docker/mongo-blob-4-manager[1062]: ++++++ Attempting to recover worker
Jul 30 13:58:33 us-mongo-3....com docker/mongo-blob-4-manager[1062]: ++++++ Restarting worker



 Comments   
Comment by Tinco Andringa [ 30/Jul/16 ]

Thanks! We'll be rolling our cluster over to 3.2.3 asap.

Comment by Kelsey Schubert [ 30/Jul/16 ]

Hi tinco,

Thank you for uploading the complete logs for the log. After review, this issue appears to be a duplicate of SERVER-22390. SERVER-22390 includes a fix for MongoDB 3.2.3.

Kind regards,
Thomas

Comment by Ramon Fernandez Marina [ 30/Jul/16 ]

Yep, the upload worked. Here's the interesting line:

2016-07-30T11:56:45.841+0000 I -        [migrateThread] Fatal assertion 18512 WriteConcernFailed waiting for replication timed out

I don't recall seeing this before, but we'll investigate. Please continue to watch the ticket for updates.

Comment by Tinco Andringa [ 30/Jul/16 ]

Just a note about our database. We have a database per customer, and currently have a collection in that database per day. This is because in the old database engine we used to get much better delete performance this way. We have not tested if delete performance still is better that way now, I've seen WiredTiger has a significantly different file layout, so maybe our architecture is a bit crazy now.

Comment by Tinco Andringa [ 30/Jul/16 ]

Thanks Ramon, I've uploaded the logs, though it didn't go to a 'success' page, did it work?

Comment by Ramon Fernandez Marina [ 30/Jul/16 ]

tinco, can you please upload full logs for this node from the last startup until the fassert()? I think we'll need more than the snippet above to track this down.

I've created a safe, secure upload portal so your logs can only be accessed by us for the purpose of debugging this isse.

Thanks,
Ramón.

Generated at Thu Feb 08 04:08:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.