[SERVER-25868] Incomplete FTDC files on Windows Created: 30/Aug/16 Updated: 28/Apr/17 Resolved: 07/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | 3.2.0, 3.2.9, 3.3.9, 3.3.10, 3.3.11 |
| Fix Version/s: | 3.2.13, 3.3.14 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Mark Benvenuto |
| Resolution: | Done | Votes: | 0 |
| Labels: | bkp | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Backport Requested: |
v3.2
|
||||||||||||
| Sprint: | Platforms 2016-09-19 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
Using the test case from
This did not occur in 3.3.8:
Since the $group operation in the test case from Also note that while the FTDC stall only occurs in 3.3.9 and later, the slow $group operation on Windows from FTDC data for 3.3.8 and 3.3.9 is attached. |
| Comments |
| Comment by Githook User [ 17/Apr/17 ] | |||||||||||
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: (cherry picked from commit f73ae5650e6fb65268f26c8eeb9dc6aa0a2f474f) | |||||||||||
| Comment by Githook User [ 17/Apr/17 ] | |||||||||||
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: (cherry picked from commit b5277068ff66e7fe2b2a98144aa6993ca6e17f5c) | |||||||||||
| Comment by Githook User [ 07/Sep/16 ] | |||||||||||
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: | |||||||||||
| Comment by Githook User [ 07/Sep/16 ] | |||||||||||
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: | |||||||||||
| Comment by Bruce Lucas (Inactive) [ 30/Aug/16 ] | |||||||||||
|
It looks like the problem of incomplete metrics.* files was present in 3.2 as well - confirmed on 3.2.9 and 3.2.0. | |||||||||||
| Comment by Bruce Lucas (Inactive) [ 30/Aug/16 ] | |||||||||||
|
Good catch; looks like the 3.3.8 metrics.2016-* file (but not metrics.interim file) is also short (last BSON document incomplete). The lengths of both files are multiples of 4 kB, so I suspect an incomplete write somewhere. I imagine the difference between 3.3.8 and 3.3.9 in my test relates to the exact chunk structure of the FTDC file, and the alignment with the $group command relates to that chunk structure as well, since execution of a command could change the serverStatus schema and require starting a new chunk. In other words, looks like a more general issue with incomplete FTDC files, and goes back prior to 3.3.9. | |||||||||||
| Comment by Mark Benvenuto [ 30/Aug/16 ] | |||||||||||
|
I have not been able to repro this on 6 core x64, 32 GB RAM Windows 10 with either 3.3.9 or 3.3.12. BTW, the 3.3.9 diagnostic data file: metrics.2016-08-30T14-03-24Z-00000 is missing data as it comes up short for the last BSON document. Here is the test I used:
| |||||||||||
| Comment by Bruce Lucas (Inactive) [ 30/Aug/16 ] | |||||||||||
|
On Windows. On Linux the $group operation completes very quickly so we can't tell (from this test case) whether it also affects Linux. | |||||||||||
| Comment by Daniel Pasette (Inactive) [ 30/Aug/16 ] | |||||||||||
|
was this taken on windows or linux? | |||||||||||
| Comment by Bruce Lucas (Inactive) [ 30/Aug/16 ] | |||||||||||
|
Periodic stack traces captured on Windows during the test run show the slow $group operation, marked by the blue lines below. However during this period, when no FTDC samples were captured, we don't see the FTDC thread doing anything unusual - as expected, we never see it doing anything but sleeping waiting to produce the next sample. Maybe for some reason the sleep isn't waking up at the expected time during the slow $group operation?
|