[SERVER-20725] Mongo 3.0.6 secondary crashes when while replicating map-reduce with a background index Created: 01/Oct/15  Updated: 09/Jun/16  Resolved: 04/Dec/15

Status: Closed
Project: Core Server
Component/s: Index Maintenance, MapReduce
Affects Version/s: 3.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Plamen Parvanov Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-19128 Fatal assertion during secondary inde... Closed
Operating System: ALL
Steps To Reproduce:

1. Setup replica set with at least 2-3 servers (adjust voting to be odd) where the secondary is at least 100-300 ms away from primary.
2. Create 'source' collection with some data (try 100 records)
3. Create 'output' collection with background index on one field (in our case composite index).
4. Perform map-reduce with 'reduce' output to the output collection every 10 seconds (in our case 15 minute).

Participants:

 Description   

There seems to be a bug in mongo when using background indexes on collections that are output of map-reduce. In some cases secondary mongos crash.

Details:

Our setup is 1 primary with 3 secondaries. Each server is in different geographical region (NA, Asia, Europe) so there's some network lag.

A map-reduce with 'reduce' to a collection is performed periodically. Occasionally some secondaries crash while building a background index on a temporary collection (created by server for map-reduce) when the collection gets dropped before the index is built.

Map reduce creates multiple temporary collections with same indexes as the main collection where it stores intermediary data while computing and reducing/merging results.

These temp collections (unnecessary) get replicated on all secondaries. The indexes are also created in the background as the original index. In some cases (depending on network lag) the temp collection is processed and deleted on the primary before the secondary has finished creating the index and as a result it crashes.

Or, because of network lag, the create collection, create index in background and drop collection commands come at the same time and the secondary crashes on building index after the collection was already dropped.

Work-around:

Making the map-reduce collection index in foreground forces the temp collection indexes to be built in foreground and thus block any operations on the collection while the index is created. This hopefully would block the drop command till the index is created. These collections are always empty when the index is being created so there's no performance impact.

So far we haven't had any crashes after changing the indexes to foreground.

While this happens while performing map-reduce there might be cases where a script can perform similar create/drop collection very fast and reproduce the problem.

It would be also nice if the temp collections created by map-reduce process are recognized as temp and not replicated to secondaries.



 Comments   
Comment by Kelsey Schubert [ 04/Dec/15 ]

Hi plamen,

Sorry for the long delay getting back you. We were able to identify this ticket as a duplicate of SERVER-19128. This issue has been fixed in our development branch and is included in 3.2.0-rc6. Please watch SERVER-19128 for updates on the requested backport to 3.0. Feel free to comment on it or vote for it.

Thank you,
Thomas

Comment by Ramon Fernandez Marina [ 01/Oct/15 ]

Thanks for your report plamen. As per SERVER-13981 temporary mapReduce collections should not be replicated to secondaries, so I wonder if this is a regression. We'll investigate further and get back to you.

Cheers,
Ramón.

Comment by Plamen Parvanov [ 01/Oct/15 ]

Here's relevant log from one of the secondaries:

2015-09-26T23:25:22.937+0000 I INDEX    [repl index builder 8409] build index on: mbss.tmp.mr.devices_6072 properties: { v: 1, key: { _id.p: 1, _id.m: 1, _id.t: 1 }, name: "_id.p_1__id.m_1__id.t_1", ns: "mbss.tmp.mr.devices_6072", background: true }
2015-09-26T23:25:22.937+0000 I INDEX    [repl index builder 8409] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:22.956+0000 I INDEX    [repl index builder 8410] build index on: mbss.tmp.mr.devices_6072 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.devices_6072", background: true }
2015-09-26T23:25:22.986+0000 I INDEX    [repl index builder 8410] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:23.860+0000 I COMMAND  [repl writer worker 9] CMD: drop mbss.tmp.mr.devices_6072
2015-09-26T23:25:24.122+0000 I INDEX    [repl index builder 8411] build index on: mbss.tmp.mr.StatModels_6073 properties: { v: 1, key: { _id.p: 1, _id.m: 1, _id.t: 1 }, name: "_id.p_1__id.m_1__id.t_1", ns: "mbss.tmp.mr.StatModels_6073", background: true }
2015-09-26T23:25:24.122+0000 I INDEX    [repl index builder 8411] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:24.133+0000 I INDEX    [repl index builder 8412] build index on: mbss.tmp.mr.StatModels_6073 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.StatModels_6073", background: true }
2015-09-26T23:25:24.134+0000 I INDEX    [repl index builder 8412] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:26.529+0000 I COMMAND  [repl writer worker 11] CMD: drop mbss.tmp.mr.StatModels_6073
2015-09-26T23:25:26.905+0000 I INDEX    [repl index builder 8413] build index on: mbss.tmp.mr.StatModels_6074 properties: { v: 1, key: { _id.p: 1, _id.m: 1, _id.t: 1 }, name: "_id.p_1__id.m_1__id.t_1", ns: "mbss.tmp.mr.StatModels_6074", background: true }
2015-09-26T23:25:26.905+0000 I INDEX    [repl index builder 8413] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:26.917+0000 I INDEX    [repl index builder 8414] build index on: mbss.tmp.mr.StatModels_6074 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.StatModels_6074", background: true }
2015-09-26T23:25:26.917+0000 I INDEX    [repl index builder 8414] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:28.629+0000 I NETWORK  [conn41978] end connection 52.8.113.159:50298 (106 connections now open)
2015-09-26T23:25:28.802+0000 I NETWORK  [initandlisten] connection accepted from 52.8.113.159:50308 #41982 (107 connections now open)
2015-09-26T23:25:30.071+0000 I COMMAND  [repl writer worker 0] CMD: drop mbss.tmp.mr.StatModels_6074
2015-09-26T23:25:30.300+0000 I INDEX    [repl index builder 8415] build index on: mbss.tmp.mr.devices_6077 properties: { v: 1, key: { _id.p: 1, _id.co: 1, _id.t: 1 }, name: "_id.p_1__id.co_1__id.t_1", ns: "mbss.tmp.mr.devices_6077", background: true }
2015-09-26T23:25:30.300+0000 I INDEX    [repl index builder 8415] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.308+0000 I INDEX    [repl index builder 8416] build index on: mbss.tmp.mr.devices_6077 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.devices_6077", background: true }
2015-09-26T23:25:30.310+0000 I INDEX    [repl index builder 8416] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.460+0000 I INDEX    [repl index builder 8417] build index on: mbss.tmp.mr.installations_6076 properties: { v: 1, key: { _id.p: 1, _id.co: 1, _id.t: 1 }, name: "_id.p_1__id.co_1__id.t_1", ns: "mbss.tmp.mr.installations_6076", background: true }
2015-09-26T23:25:30.461+0000 I INDEX    [repl index builder 8417] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.469+0000 I INDEX    [repl index builder 8418] build index on: mbss.tmp.mr.installations_6076 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.installations_6076", background: true }
2015-09-26T23:25:30.470+0000 I INDEX    [repl index builder 8418] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.537+0000 I INDEX    [repl index builder 8419] build index on: mbss.tmp.mr.notify_jobs_6075 properties: { v: 1, key: { _id.p: 1, _id.co: 1, _id.t: 1 }, name: "_id.p_1__id.co_1__id.t_1", ns: "mbss.tmp.mr.notify_jobs_6075", background: true }
2015-09-26T23:25:30.537+0000 I INDEX    [repl index builder 8419] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.565+0000 I INDEX    [repl index builder 8420] build index on: mbss.tmp.mr.notify_jobs_6075 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.notify_jobs_6075", background: true }
2015-09-26T23:25:30.565+0000 I INDEX    [repl index builder 8420] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.684+0000 I INDEX    [repl index builder 8421] build index on: mbss.tmp.mr.devices_6078 properties: { v: 1, key: { _id.p: 1, _id.co: 1, _id.t: 1 }, name: "_id.p_1__id.co_1__id.t_1", ns: "mbss.tmp.mr.devices_6078", background: true }
2015-09-26T23:25:30.685+0000 I INDEX    [repl index builder 8421] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.695+0000 I INDEX    [repl index builder 8422] build index on: mbss.tmp.mr.devices_6078 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.devices_6078", background: true }
2015-09-26T23:25:30.695+0000 I INDEX    [repl index builder 8422] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:30.881+0000 I COMMAND  [repl writer worker 12] CMD: drop mbss.tmp.mr.devices_6077
2015-09-26T23:25:31.201+0000 I COMMAND  [repl writer worker 13] CMD: drop mbss.tmp.mr.installations_6076
2015-09-26T23:25:32.566+0000 I COMMAND  [repl writer worker 5] CMD: drop mbss.tmp.mr.devices_6078
2015-09-26T23:25:32.799+0000 I NETWORK  [conn41979] end connection 52.68.20.75:33911 (106 connections now open)
2015-09-26T23:25:32.948+0000 I COMMAND  [repl writer worker 3] CMD: drop mbss.tmp.mr.notify_jobs_6075
2015-09-26T23:25:33.013+0000 I NETWORK  [initandlisten] connection accepted from 52.68.20.75:33920 #41983 (107 connections now open)
2015-09-26T23:25:33.829+0000 I INDEX    [repl index builder 8423] build index on: mbss.tmp.mr.StatAnalytics_6079 properties: { v: 1, key: { _id.p: 1, _id.co: 1, _id.t: 1 }, name: "_id.p_1__id.co_1__id.t_1", ns: "mbss.tmp.mr.StatAnalytics_6079", background: true }
2015-09-26T23:25:33.829+0000 I INDEX    [repl index builder 8423] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:33.848+0000 I INDEX    [repl index builder 8424] build index on: mbss.tmp.mr.StatAnalytics_6079 properties: { v: 1, key: { _id.p: 1, _id.t: 1 }, name: "_id.p_1__id.t_1", ns: "mbss.tmp.mr.StatAnalytics_6079", background: true }
2015-09-26T23:25:33.900+0000 I INDEX    [repl index builder 8423] build index on: mbss.tmp.mr.StatAnalytics_6079 properties: { v: 1, key: { _id.p: 1, _id.co: 1, _id.t: 1 }, name: "_id.p_1__id.co_1__id.t_1", ns: "mbss.tmp.mr.StatAnalytics_6079", background: true }
2015-09-26T23:25:33.901+0000 I INDEX    [repl index builder 8423] build index done.  scanned 0 total records. 0 secs
2015-09-26T23:25:33.970+0000 E INDEX    [repl index builder 8424] IndexBuilder could not build index: Location28550 Unable to complete index build as the collection is no longer readable
2015-09-26T23:25:33.970+0000 I -        [repl index builder 8424] Fatal Assertion 28555
2015-09-26T23:25:34.003+0000 I CONTROL  [repl index builder 8424] 
 0xf78219 0xf170e1 0xefbc61 0xac28ed 0xefe8a0 0xfc53d4 0x7effcbf20df3 0x7effca9d4bcd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B78219"},{"b":"400000","o":"B170E1"},{"b":"400000","o":"AFBC61"},{"b":"400000","o":"6C28ED"},{"b":"400000","o":"AFE8A0"},{"b":"400000","o":"BC53D4"},{"b":"7EFFCBF19000","o":"7DF3"},{"b":"7EFFCA8DE000","o":"F6BCD"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" : { "sysname" : "Linux", "release" : "3.14.42-31.38.amzn1.x86_64", "version" : "#1 SMP Wed May 13 20:33:05 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "505B8032AB7F3CE0B7DA283775E7F50560651F14" }, { "b" : "7FFF52E95000", "elfType" : 3, "buildId" : "C7F24184C312347AAE8F28C92AC21288D3975482" }, { "b" : "7EFFCBF19000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7EFFCBCAC000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "22480480235F3B1C6C2E5E5953949728676D3796" }, { "b" : "7EFFCB8C7000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F1C146B78505646930DD9003AA2B3622C5226D1B" }, { "b" : "7EFFCB6BF000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7EFFCB4BB000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7EFFCB1B7000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7EFFCAEB5000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7EFFCAC9F000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7EFFCA8DE000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8BC02D100DFD8DDD136043AC1830506CE6F19602" }, { "b" : "7EFFCC135000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7EFFCA692000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "FF843C37C38E5BFFD57F7BCCAE05FDADC6390C8F" }, { "b" : "7EFFCA3AF000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "0BB150CC29DB5B0E039879EFC00152A75E3B00B9" }, { "b" : "7EFFCA1AC000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "622F315EB5CB2F791E9B64020692EBA98195D06D" }, { "b" : "7EFFC9F7A000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "1485992B0E5CDBA0A34817FC8C6A4C45E82CD1A9" }, { "b" : "7EFFC9D64000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7EFFC9B55000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "A75A81EC50E9E0164A12B59D9987AD61AC7576C8" }, { "b" : "7EFFC9952000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7EFFC9738000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7EFFC9517000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf78219]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf170e1]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xefbc61]
 mongod(_ZN5mongo12IndexBuilder3runEv+0x57D) [0xac28ed]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x120) [0xefe8a0]
 mongod(+0xBC53D4) [0xfc53d4]
 libpthread.so.0(+0x7DF3) [0x7effcbf20df3]
 libc.so.6(clone+0x6D) [0x7effca9d4bcd]
-----  END BACKTRACE  -----
2015-09-26T23:25:34.003+0000 I -        [repl index builder 8424] 
 
***aborting after fassert() failure

Generated at Thu Feb 08 03:55:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.