[CDRIVER-2985] Investigate flaky test: /Client/last_write_date/pooled Created: 28/Feb/19 Updated: 28/Oct/23 Resolved: 29/Apr/20 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc, tests |
| Affects Version/s: | None |
| Fix Version/s: | 1.17.0-beta2, 1.17.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kevin Albertson | Assignee: | Kevin Albertson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | flaky-tests | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Epic Link: | CDRIVER-2733 | ||||||||||||
| Description |
|
It failed on Windows VS 2015:
On this patch build. The test does the following:
But the error indicates the server description was not updated in the second server selection attempt. It's unclear to me why this could have failed, unless the sleep really wasn't sleeping for 2 seconds, or if there's some other time dependant bug. |
| Comments |
| Comment by Githook User [ 31/May/20 ] | |||||
|
Author: {'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}Message: | |||||
| Comment by Githook User [ 25/May/20 ] | |||||
|
Author: {'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}Message: | |||||
| Comment by Githook User [ 29/Apr/20 ] | |||||
|
Author: {'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}Message: | |||||
| Comment by Kevin Albertson [ 28/Apr/20 ] | |||||
|
Confirmed the hypothesis by downloading the mongodb directories attached to the patch build and inspecting the oplog just before the crash. This appears to be due to a noop write in the oplog shortly before second insert in the test:
This appears to be due to a recent change in mongo orchestration to start the periodic noop writer with an interval of 1s instead of 10s: https://github.com/10gen/mongo-orchestration/commit/066ec58c817947ca136c25690a705fd4e4db2a7b Though, this does beg the question whether there was a different reason this test was failing before. | |||||
| Comment by Kevin Albertson [ 28/Apr/20 ] | |||||
|
This test comes from the max stalenes spec, the "Parse lastWriteDate" test: https://github.com/mongodb/specifications/blob/master/source/max-staleness/max-staleness-tests.rst#parse-lastwritedate Observations
A timeline: Both server descriptions s0 and s1 have lastWriteDate of 1587642725000 in the end. This is first received two seconds after the test starts. Hypothesis: If there was a write from something external (e.g. from the oplog noop writer, a session being deleted, etc.) to the oplog just before the first server selection, then the insert may be happening within the same lastWriteDate second. I can confirm this by dumping the oplog. |