[SERVER-5144] increased memory usage, crashes when upgrading from 1.8.1 to 2.0.3 Created: 29/Feb/12 Updated: 15/Aug/12 Resolved: 13/Jul/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Javier Frias | Assignee: | Mathias Stearn |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Operating System: | Solaris | ||||
| Participants: | |||||
| Description |
|
We just had failed upgrades on two of our mongodb replica set clusters. Both are setup as such. virtual/mapped memory went from 3-7GB, to immediate 34GB one one replica set, and 16GB on another. Both sets of boxes have 4GB memory and 8GB swap on version 1.8.1, same data, with active traffic. 18180 root 4 45 0 3522M 1999M cpu/13 4:02 6.25% mongod on version 2.0.3, same data, same node, with active traffic PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND all are running under solaris zones ( joyent ), but are just regular joyent slices. after about 30 minutes of traffic, mongod's would crash. and replica fail over would continue, back and forth. We were upgrading to 2.0.3 to try to address the constant fail over issue. Performance is fine, so that's why we've kept the boxes at 4GB, our active data set is low. ( on disk data is about 10GB on both). this is in 64bit |
| Comments |
| Comment by Ian Whalen (Inactive) [ 13/Jul/12 ] |
|
@javier I'm closing this now, but please reopen if you can attach further logs or provide any further info from the sun debugger. |
| Comment by Dwight Merriman [ 03/Mar/12 ] |
|
can you post the end of a log with a little more detail. if there is more. i see the ' terminate() called, printing stack:' above but nothing else. so wondering what is up. this is almost surely sun specific. we will fix but could take a little time without more detail. running in the sun debugger would be interesting, or chatting with joyent support too. |
| Comment by Javier Frias [ 02/Mar/12 ] |
|
also, a few of the crashes, this has been the text preceding the "starting mongo. – |
| Comment by Javier Frias [ 02/Mar/12 ] |
|
we've added an extra 4GB to the solaris zone running this instance, and the problem still persists. We are also going to change our front end, so that these very large queries don't happen any more, and see if that stops the crashes. |
| Comment by Javier Frias [ 02/Mar/12 ] |
|
It doesnt seem to log the crash. Basically, the scenario is mongodb gets a series of very large result queries, and it exits/crashes. Here's the log from svc( solaris process watcher ) for that time. I cut away the query right before the "MongoDB starting" as its pretty large, but can send you a gzipped version of the log with everything if you'd like ( its going to be a few hundred megs ). Just to give you an idea of how often it happens, heres the last two days ( there's a slight discrepancy, since some the times there was manual intervention ). >grep "MongoDB starting" mongo.log | gegrep -B 1 '(Mar| Feb 29)' >grep "Stopping because all processes in service exited" /var/svc/log/mongo | egrep '(Mar| Feb 29)' Just so you see one of the long queries that crash it. >ggrep -B 1 "MongoDB starting" mongo.log | gegrep -B 1 '(Mar| Feb 29)' | tail -2 Fri Mar 2 16:18:28 [conn128] getmore inventory.inventory cid:59281348262697 getMore: { bookingdate: { $gte: 20120302, $lt: 20120401 }, orderModification: 0, price: { $gt: 0 }, propertyId: { $in: [ 8, 451, 614, 361, 77, 57, 597, 509, 66, 178, 18, 568, 285, 234, 2, 408, 794, 245, 358, 53, 322, 292, 220, 521, 183, 65, 157, 49, 400, 764, 70, 172, 608, 219, 674, 478, 24, 37, 223, 266, 1454, 511, 770, 1143, 64, 1177, 456, 472, 363, 1415, 519, 111, 304, 110, 409, 396, 1294, 4, 182, 43, 81, 810, 417, 218, 1130, 555, 572, 123, 384, 825, 367, 1087, 835, 659, 733, 1344, 517, 696, 167, 582, 1043, 41, 1160, 645, 845, 732, 330, 76, 139, 40, 254, 545, 604, 1944, 99, 457, 17, 530, 819, 141, 198, 1076, 256, 269, 428, 1446, 497, 3, 633, 31, 636, 296, 997, 670, 323, 370, 295, 482, 603, 288, 898, 341, 29, 790, 1684, 62, 1212, 529, 283, 744, 1209, 74, 113, 979, 1329, 686, 660, 1068, 960, 615, 1804, 233, 974, 171, 273, 1132, 137, 207, 108, 446, 402, 484, 1270, 91, 214, 847, 803, 978, 279, 1020, 250, 829, 2205, 383, 1009, 100, 999, 2259, 487, 246, 35, 368, 189, 425, 512, 903, 224, 536, 5, 376, 226, 116, 73, 578, 345, 439, 662, 1003, 461, 991, 1363, 1357, 1128, 1059, 538, 1048, 87, 2394, 1398, 751, 274, 2078, 389, 546, 85, 342, 1404, 1862, 596, 1179, 523, 773, 284, 500, 2119, 757, 1080, 900, 880, 317, 442, 2530, 2316, 741, 294, 543, 481, 426, 1351, 420, 1850, 184, 1004, 548, 600, 326, 1793, 704, 808, 432, 593, 694, 1369, 174, 471, 1284, 1109, 303, 605, 721, 827, 2061, 526, 319, 1250, 573, 1106, 190, 1439, 72, 1846, 506, 1455, 730, 867, 494, 126, 895, 360, 448, 318, 574, 1376, 2479, 571, 1947, 874, 1300, 398, 737, 763, 201, 2176, 2411, 646, 598, 449, 1845, 1222, 243, 962, 828, 973, 554, 1307, 1368, 203, 1355, 2441, 943, 2190, 1085, 966, 2414, 1320, 2247, 2091, 2046, 1999, 693, 1982, 938, 585, 1302, 338, 774, 882, 655, 34, 918, 963, 477, 1831, 240, 1044, 26, 1553, 616, 2152, 987, 1453, 407, 850, 589, 621, 2056, 499, 1129, 1276, 156, 1267, 533, 873, 752, 740, 736, 1105, 1759, 1633, 39, 643, 2215, 1269, 2223, 1598, 180, 397, 2447, 1008, 492, 1409, 258, 112, 2240, 964, 1054, 152, 1133, 739, 1554, 1022, 1371, 726, 1216, 893, 1170, 800, 823, 1183, 915, 375, 359, 1950, 666, 985, 806, 881, 1809, 1345, 1146, 1736, 1532, 748, 941, 1245, 1533, 1046, 624, 1752, 926, 956, 899, 807, 382, 2166, 1422, 986, 1889, 1382, 1934, 1597, 1865, 1901, 1113, 679, 1690, 669, 2264, 766, 902, 1888, 1035, 651, 1223, 193, 2299, 1111, 1155, 2125, 1233, 746, 1247, 884, 532, 1722, 1217, 894, 772, 2127, 1186, 227, 2513, 1626, 1056, 1322, 709, 2131, 1900, 1936, 1391, 1741, 1167, 1635, 2035, 1271, 1711, 2458, 131, 2285, 2037, 1408, 1285, 1308, 431, 2137, 1784, 2117, 2203, 1568, 452, 866, 1883, 2096, 1244, 2160, 1760, 629, 769, 1769, 1195, 1165, 2043, 1305, 2068, 1010, 458, 870, 528, 2104, 1951, 1045, 2172, 1740, 1338, 1410, 601, 2225, 628, 1636, 1192, 2468, 1396, 1279, 313, 2098, 186, 1681, 2350, 2103, 872, 1595, 804, 768, 2384, 754, 2136, 968, 2139, 635, 2220, 1851, 1818, 2025, 503, 1512, 1147, 2107, 1051, 1328, 2274, 1618, 1963, 1958, 2006, 1265, 1311, 1583, 2189, 1569, 1523, 1406, 1938, 1211, 1152, 1019, 1435, 1112, 1842, 1430, 2087, 1277, 1783, 1807, 1530, 796, 1492, 1705, 975, 2138, 1166, 1287, 1746, 205, 1266, 1835, 993, 2275, 1646, 1826, 1770, 1733, 1683, 2086, 2252, 1927, 1587, 2038, 1940, 2159, 1380, 706, 2374, 2404, 1115, 1942, 814, 443, 1844, 430, 1377, 2507, 1319, 1582, 1282, 2175, 1330, 1825, 2003, 1256, 1843, 1451, 2022, 2286, 2398, 1498, 1905, 889, 1169, 1813, 1935, 1899, 1894, 1962, 980, 1653, 1827, 1047, 1937, 2013, 782, 1941, 939, 657, 767, 547, 1041, 1808, 1527, 61, 904, 1994, 1885, 1668, 1378, 1745, 1872, 919, 1306, 2012, 1788, 2128, 1061, 2327, 2417, 1425, 1904, 1949, 1707, 1251, 2135, 1939, 1427, 1180, 1786, 2261, 1849, 2047, 1778, 1063, 1724, 2422, 1012, 2209, 2421, 1062, 747, 892, 1649, 1274, 1879, 2289, 1754, 977, 2171, 2101, 1550, 1487, 475, 2194, 2089, 1246, 1975, 2196, 1503, 1747, 2088, 1301, 1806, 1589, 1995, 1437, 1110, 1984, 2195, 1413, 2036, 2102, 2375, 2343, 1588, 1812, 1299, 2164, 1918, 2323, 2238, 1431, 1657, 2510, 2301, 687, 1753, 2443, 749, 2465, 1570, 2122, 2231, 2123, 2069, 1450, 2076, 1392, 2167, 1847, 1185, 1848, 1961, 2224, 140, 801, 864, 877, 936, 1018, 1038, 1039, 1040, 1066, 1148, 1151, 1164, 1303, 1314, 1370, 1447, 1448, 1449, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1472, 1473, 1474, 1475, 1476, 1477, 1478, 1486, 1488, 1493, 1494, 1496, 1497, 1499, 1500, 1501, 1502, 1505, 1506, 1507, 1508, 1509, 1510, 1511, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1522, 1524, 1525, 1526, 1528, 1529, 1531, 1534, 1535, 1536, 1537, 1538, 1539, 1543, 1547, 1548, 1549, 1551, 1552, 1574, 1576, 1578, 1581, 1585, 1590, 1591, 1593, 1596, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1608, 1609, 1610, 1613, 1614, 1615, 1617, 1624, 1625, 1627, 1641, 1643, 1645, 1647, 1648, 1650, 1651, 1652, 1654, 1655, 1656, 1658, 1659, 1660, 1661, 1666, 1671, 1673, 1674, 1678, 1680, 1696, 1697, 1699, 1700, 1701, 1702, 1704, 1708, 1713, 1714, 1715, 1716, 1717, 1723, 1728, 1732, 1737, 1738, 1739, 1742, 1749, 1756, 1757, 1762, 1764, 1766, 1767, 1768, 1776, 1780, 1782, 1789, 1790, 1791, 1794, 1803, 1852, 1853, 1858, 1861, 1868, 1869, 1870, 1871, 1882, 1890, 1897, 1898, 1903, 1906, 1907, 1908, 1909, 1914, 1915, 1916, 1917, 1930, 1957, 1964, 1979, 1981, 1996, 2001, 2002, 2004, 2007, 2008, 2009, 2011, 2017, 2018, 2026, 2028, 2029, 2030, 2052, 2053, 2054, 2062, 2063, 2065, 2066, 2067, 2079, 2080, 2081, 2082, 2083, 2111, 2146, 2147, 2148, 2149, 2153, 2154, 2157, 2168, 2170, 2177, 2178, 2187, 2197, 2198, 2199, 2200, 2201, 2202, 2208, 2210, 2216, 2217, 2221, 2233, 2241, 2260, 2262, 2269, 2270, 2271, 2279, 2280, 2282, 2290, 2291, 2292, 2293, 2305, 2334, 2336, 2337, 2345, 2363, 2368, 2370, 2371, 2379, 2385, 2387, 2388, 2389, 2418, 2419, 2420, 2432, 2438, 2446, 2452, 2482, 2483, 2484, 2485, 2488, 2489, 2508 ] } } bytes:4194692 nreturned:6446 184ms |
| Comment by Eliot Horowitz (Inactive) [ 02/Mar/12 ] |
|
Is there something else missing from the logs? |
| Comment by Javier Frias [ 01/Mar/12 ] |
|
Would there be any way to troubleshoot the constant mongo crashes and failovers? The problem, while always there, was greatly exacerbate when we tried the upgrade. This is the log as the crash happens... The process, as you can see, just crashes when it gets a series of very large result queries. |
| Comment by Eliot Horowitz (Inactive) [ 01/Mar/12 ] |
|
2.0 has journalling enabled by default which doubles virtual memory space even though actual ram requirements are the same |