[SERVER-43663] Shardsrv start failure. Backtrace Created: 26/Sep/19  Updated: 08/Jan/24  Resolved: 24/Oct/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 4.2.0
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: kostiantyn velychkovskyi Assignee: Benjamin Caimano (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongo_err.txt    
Operating System: ALL
Steps To Reproduce:
  1. Install mongod 4.2 in sharded cluster mode with 3 replica sets
  2. Write ~300 GB of data to it at high throughput
  3. One of the replica sets goes down unexpectedly
Participants:

 Description   

My MongoDB replica set, a member of a sharded cluster of 3 replica sets, went down after a few days of active writes to it.

The sharded cluster consists of 3 replica sets.


2019-09-26T16:50:35.607+0200 I REPL [repl-writer-worker-15] applied op: CRUD { ts: Timestamp(1569506086, 1890), t: 2, h: 0, v: 2, op: "d", ns: "buzzguru_master.video", ui: UUID("f80fd760-d9a4-4bd8-871c-0f604c5dcf6b"), fromMigrate: true, wall: new Date(1569506086663), o: { _id: ObjectId('5cb4ab818cd87f60268e7057') } }, took 108ms
2019-09-26T16:50:35.618+0200 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to m4.buzz.guru:27018
2019-09-26T16:50:35.625+0200 I REPL [repl-writer-worker-0] applied op: CRUD { ts: Timestamp(1569506086, 1630), t: 2, h: 0, v: 2, op: "d", ns: "buzzguru_master.video", ui: UUID("f80fd760-d9a4-4bd8-871c-0f604c5dcf6b"), fromMigrate: true, wall: new Date(1569506086565), o: { _id: ObjectId('5cb4ab7f8cd87f60268e628a') } }, took 106ms
2019-09-26T22:00:30.614+0200 F - [initandlisten] Invariant failure _execStatus == ExecutionStatus::NOT_SCHEDULED src/mongo/util/periodic_runner_impl.cpp 64
2019-09-26T22:00:30.614+0200 F - [initandlisten] ***aborting after invariant() failure
2019-09-26T22:00:30.620+0200 F - [initandlisten] Got signal: 6 (Aborted). 0x556b48dfdc81 0x556b48dfd47e 0x556b48dfd516 0x7f7b2c9d75f0 0x7f7b2c630337 0x7f7b2c631a28 0x556b47337916 0x556b47131321 0x556b47921c2c 0x556b473bab16 0x556b473bc82d 0x556b47342ec9 0x7f7b2c61c505 0x556b473b7dde
----- BEGIN BACKTRACE ----- 
{"backtrace":[{"b":"556B4667E000","o":"277FC81","s":"_ZN5mongo15printStackTraceERSo"},{"b":"556B4667E000","o":"277F47E"},{"b":"556B4667E000","o":"277F516"},{"b":"7F7B2C9C8000","o":"F5F0"},{"b":"7F7B2C5FA000","o":"36337","s":"gsignal"},{"b":"7F7B2C5FA000","o":"37A28","s":"abort"},{"b":"556B4667E000","o":"CB9916","s":"_ZN5mongo22invariantFailedWithMsgEPKcRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_j"},{"b":"556B4667E000","o":"AB3321"},{"b":"556B4667E000","o":"12A3C2C","s":"_ZN5mongo18PeriodicRunnerImpl15PeriodicJobImpl5startEv"},{"b":"556B4667E000","o":"D3CB16"},{"b":"556B4667E000","o":"D3E82D"},{"b":"556B4667E000","o":"CC4EC9"},{"b":"7F7B2C5FA000","o":"22505","s":"__libc_start_main"},{"b":"556B4667E000","o":"D39DDE"}],"processInfo":{ "mongodbVersion" : "4.2.0", "gitVersion" : "a4b751dcf51dd249c5865812b390cfd1c0129c30", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.19.43-300.el7.x86_64", "version" : "#1 SMP Fri May 17 00:06:07 UTC 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "556B4667E000", "elfType" : 3, "buildId" : "E8D75D13E92279CB6AF8104353A95729FD262FAB" }, { "b" : "7FFEEAEEC000", "elfType" : 3, "buildId" : "83D4E2FD2DC72D673299472CF30C61012C60FDE4" }, { "b" : "7F7B2DDF6000", "path" : "/lib64/libcurl.so.4", "elfType" : 3, "buildId" : "7C71A471444AD18F73AFAEA3EB42431A6DA96534" }, { "b" : "7F7B2DBDD000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "3009B26B33156EAAF99787AA3DA0C6AE99649755" }, { "b" : "7F7B2D77A000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "4CF1939F660008CFA869D8364651F31AACD2C1C4" }, { "b" : "7F7B2D508000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "3B305C3BA17FE394862E749763F2956C9C890C2E" }, { "b" : "7F7B2D304000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "18113E6E83D8E981B8E8D808F7F3DBB23F950A1D" }, { "b" : "7F7B2D0FC000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "4749697BF078337576C4629F0D30B296A0939779" }, { "b" : "7F7B2CDFA000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5681C054FDABCF789F4DDA66E94F1F6ED1747327" }, { "b" : "7F7B2CBE4000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "DAC0179F4555AEFEC9E97476201802FD20C03EC5" }, { "b" : "7F7B2C9C8000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "8B33F7F8C86F8D544C63C5541A8E42B3DDFEF8B1" }, { "b" : "7F7B2C5FA000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "398944D32CF16A67AF51067A326E6C0CC14F90ED" }, { "b" : "7F7B2E060000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5CC1A53B747A7E4D21198723C2B633E54F3C06D9" }, { "b" : "7F7B2C3C7000", "path" : "/lib64/libidn.so.11", "elfType" : 3, "buildId" : "2B77BBEFFF65E94F3E0B71A4E89BEB68C4B476C5" }, { "b" : "7F7B2C19A000", "path" : "/lib64/libssh2.so.1", "elfType" : 3, "buildId" : "1AF123CADB2F2910E89CBD540A06D3B33692F95E" }, { "b" : "7F7B2BF41000", "path" : "/lib64/libssl3.so", "elfType" : 3, "buildId" : "B6321C434B5C7386B144B925CEE2798D269FDDF5" }, { "b" : "7F7B2BD19000", "path" : "/lib64/libsmime3.so", "elfType" : 3, "buildId" : "BDA454441F59F41D2DA36E13CEA1FC4CE95B2BBB" }, { "b" : "7F7B2B9EA000", "path" : "/lib64/libnss3.so", "elfType" : 3, "buildId" : "D61EB90C9F32CA6E81E7FAC437F2C496438C8D9E" }, { "b" : "7F7B2B7BA000", "path" : "/lib64/libnssutil3.so", "elfType" : 3, "buildId" : "1E366A2153AD7488EE72E989D9AD6BD458BE8EDE" }, { "b" : "7F7B2B5B6000", "path" : "/lib64/libplds4.so", "elfType" : 3, "buildId" : "325B8CE57A776DE0B24B362A7E0C90E903B1A4B8" }, { "b" : "7F7B2B3B1000", "path" : "/lib64/libplc4.so", "elfType" : 3, "buildId" : "0460FF10A3C63749113D380C40E10DFCF066C76E" }, { "b" : "7F7B2B173000", "path" : "/lib64/libnspr4.so", "elfType" : 3, "buildId" : "8840B019EDB66B0CFBD2F77EF196440F7928106E" }, { "b" : "7F7B2AF26000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "E2AA8CA3D3164E7DBEC293BFA0B55D2B10DAC05D" }, { "b" : "7F7B2AC3D000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "3EE7267AF7BFD3B132E6A222D997DA09C96C90DD" }, { "b" : "7F7B2AA0A000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "82E28CACB60C27CD6F14A6D2268F0CFF621664D0" }, { "b" : "7F7B2A806000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "67E935BFABA2C914C01156B88947DD515EA51170" }, { "b" : "7F7B2A5F7000", "path" : "/lib64/liblber-2.4.so.2", "elfType" : 3, "buildId" : "3192C56CD451E18EB9F29CB045432BA9C738DD29" }, { "b" : "7F7B2A3A2000", "path" : "/lib64/libldap-2.4.so.2", "elfType" : 3, "buildId" : "F1FADDDE0D21D5F4E2DCADEDD3B85B6E7AAC9883" }, { "b" : "7F7B2A18C000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" }, { "b" : "7F7B29F7C000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "4F5FBB2087BE132892467C4E7A46A3D07E5DA40B" }, { "b" : "7F7B29D78000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F7B29B5B000", "path" : "/lib64/libsasl2.so.3", "elfType" : 3, "buildId" : "E2F2017F821DD1B9D307DA1A9B8014F2941AEB7B" }, { "b" : "7F7B29934000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "D2DD4DA3FDE1477D25BFFF80F3A25FDB541A8179" }, { "b" : "7F7B296FD000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "84467C988F41D853C58353BEB247670E15DA8BAD" }, { "b" : "7F7B2949B000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" }, { "b" : "7F7B29298000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "197680DAE6538245CB99723E57447C4EF2E98362" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x556b48dfdc81]
 mongod(+0x277F47E) [0x556b48dfd47e]
 mongod(+0x277F516) [0x556b48dfd516]
 libpthread.so.0(+0xF5F0) [0x7f7b2c9d75f0]
 libc.so.6(gsignal+0x37) [0x7f7b2c630337]
 libc.so.6(abort+0x148) [0x7f7b2c631a28]
 mongod(_ZN5mongo22invariantFailedWithMsgEPKcRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES1_j+0x0) [0x556b47337916]
 mongod(+0xAB3321) [0x556b47131321]
 mongod(_ZN5mongo18PeriodicRunnerImpl15PeriodicJobImpl5startEv+0xCC) [0x556b47921c2c]
 mongod(+0xD3CB16) [0x556b473bab16]
 mongod(+0xD3E82D) [0x556b473bc82d]
 mongod(+0xCC4EC9) [0x556b47342ec9]
 libc.so.6(__libc_start_main+0xF5) [0x7f7b2c61c505]
 mongod(+0xD39DDE) [0x556b473b7dde]
----- END BACKTRACE -----
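
As a reading aid for the backtrace above: each JSON frame carries a module base address ("b") and an offset ("o"), and their sum appears to match the raw addresses listed on the "Got signal" line. A minimal Python sketch of that arithmetic, using three frames copied from this report (the helper name `absolute_address` is made up for illustration):

```python
# Three frames copied from the "backtrace" array in this ticket:
# "b" is the module base address, "o" the offset, "s" the symbol (if any).
frames = [
    {"b": "556B4667E000", "o": "277FC81", "s": "_ZN5mongo15printStackTraceERSo"},
    {"b": "7F7B2C5FA000", "o": "36337", "s": "gsignal"},
    {"b": "556B4667E000", "o": "12A3C2C",
     "s": "_ZN5mongo18PeriodicRunnerImpl15PeriodicJobImpl5startEv"},
]


def absolute_address(frame):
    """Return base + offset for one frame, both given as hex strings."""
    return int(frame["b"], 16) + int(frame["o"], 16)


for frame in frames:
    # Print the absolute address next to the (possibly mangled) symbol name.
    print(hex(absolute_address(frame)), frame.get("s", "<no symbol>"))
```

The third frame resolves to the address printed next to `PeriodicJobImpl5startEv+0xCC` in the symbolized list, which is the frame directly below the invariant failure.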

 



 Comments   
Comment by Benjamin Caimano (Inactive) [ 24/Oct/19 ]

Thanks kostiantyn.velychkovskyi@datawebglobal.com, apologies again on the slow response. Please do reopen this ticket if it reoccurs and let us know if you encounter other crashes.

Comment by kostiantyn velychkovskyi [ 24/Oct/19 ]

Unfortunately, these logs have been rotated, so I can't provide the relevant information in this case.

If you can't reproduce the bug, feel free to close this issue.

I will reopen it if it happens again.

Comment by Benjamin Caimano (Inactive) [ 23/Oct/19 ]

kostiantyn.velychkovskyi@datawebglobal.com, apologies on the delay. I notice that your invariant happens on a user connection labeled "[conn410395]". Would you please upload all of the lines of your log file that are tagged with "[conn410395]"? I suspect that there will be at least one more line with "SHARDING" in the metadata. It would also be interesting to know the length of time between the first line of your "aggregate" cmd and the invariant failure.
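
The request above (collect every line tagged with one connection, then measure the time from the first "aggregate" line to the invariant failure) could be scripted roughly as follows. This is a sketch that assumes each log line starts with a timestamp in the same format as the excerpt in the description; the function names are hypothetical.

```python
from datetime import datetime

# Timestamp layout seen in the log excerpt, e.g. 2019-09-26T16:50:35.607+0200
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def lines_for_context(lines, context):
    """Keep only the lines tagged with the given context, e.g. 'conn410395'."""
    tag = f"[{context}]"
    return [line for line in lines if tag in line]


def parse_timestamp(line):
    """Parse the leading timestamp (first space-separated field) of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)


def seconds_until_invariant(lines, context):
    """Seconds between the first 'aggregate' line and the first
    'Invariant failure' line for one context, or None if either is missing."""
    tagged = lines_for_context(lines, context)
    first_cmd = next((l for l in tagged if "aggregate" in l), None)
    failure = next((l for l in tagged if "Invariant failure" in l), None)
    if first_cmd is None or failure is None:
        return None
    return (parse_timestamp(failure) - parse_timestamp(first_cmd)).total_seconds()
```

Passing an open log file as `lines` works, since iterating over a file yields one line at a time; the plain substring match on "aggregate" is deliberately loose and may need tightening for real logs.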

Comment by kostiantyn velychkovskyi [ 27/Sep/19 ]

Hi! Thank you for your assistance. I've extracted the part of the logs from the moment when mongod crashed for the first time.

These are the logs from the primary member of the replica set:

mongo_err.txt

Comment by Mira Carey [ 26/Sep/19 ]

Also, can you provide the full logs for the node that crashed? It would be very helpful to see more context around what was happening in the system when the crash occurred.

Comment by Mira Carey [ 26/Sep/19 ]

kostiantyn.velychkovskyi@datawebglobal.com, just to check in, are you saying that one of your mongod's crashed after successfully coming up and running for several days?

The reason why I ask is that the backtrace you've provided would usually only be reachable at startup (it indicates a failure before we begin accepting client connections).

I'd like to confirm if this is a node crashing on startup, or an error you observed after a period of time on what otherwise looked like a healthy node.
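
For context on the failed check itself: the message `_execStatus == ExecutionStatus::NOT_SCHEDULED` together with the `PeriodicJobImpl::start` frame suggests that a periodic job may only be started while it is still in its initial state. The following Python sketch illustrates that kind of guard. It is not the server's code; the `RUNNING` state, the class names, and the exception type are assumptions made for illustration.

```python
from enum import Enum, auto


class ExecutionStatus(Enum):
    NOT_SCHEDULED = auto()  # name taken from the invariant message
    RUNNING = auto()        # assumed name for "already started"


class InvariantFailure(Exception):
    """Stand-in for the process abort that a failed invariant triggers."""


class PeriodicJob:
    def __init__(self, name):
        self.name = name
        self._exec_status = ExecutionStatus.NOT_SCHEDULED

    def start(self):
        # Guard: starting is only legal from the initial state.
        if self._exec_status is not ExecutionStatus.NOT_SCHEDULED:
            raise InvariantFailure("_execStatus == ExecutionStatus::NOT_SCHEDULED")
        self._exec_status = ExecutionStatus.RUNNING


job = PeriodicJob("example-job")
job.start()          # first start succeeds
try:
    job.start()      # second start trips the guard
except InvariantFailure as exc:
    print("invariant failed:", exc)
```

Under this reading, the crash would correspond to `start` being called on a job that had already left its initial state.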

Generated at Thu Feb 08 05:03:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.