[JAVA-628] Read after WriteConcern.SAFE does not guarantee to be able to get data? Created: 22/Aug/12 Updated: 16/Nov/21 Resolved: 30/Aug/12 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | API |
| Affects Version/s: | 2.7.3 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor - P4 |
| Reporter: | Solomon Wu | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
MongoDB 2.0.3 |
||
| Attachments: |
|
| Description |
|
3 nodes are involved
steps from our understanding of the document, Writeconcern.safe is equivalent of calling getLastError(). the subsequent read should be able to read the data just written. if we write with writeconern.normal, the subsequent read is not guaranteed to be able to read the data just written. but if we write with writeconern.safe, we should be able to read the data. |
| Comments |
| Comment by Jeffrey Yemin [ 30/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Please open a new case if you find a different issue. | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tihomir Slavkov [ 30/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks Jeff, | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 30/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
I didn't realize you were using count. I believe that ntoreturn is always 1 for count, regardless of the return value, because it's a command. But I think this is enough to diagnose for now. At least in these logs, the query for the number 39710019 is for a different field than the update. The query looks like:
while the update looks like:
So I don't see how this works at all. The only place in the logs I see an update to a messageId field is here:
| |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tihomir Slavkov [ 30/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
I attached new file, update_1.txt. The id of the object, which could not be found is 39710019. I'm not sure do i read MongoDB log file properly, but it seems to me that it reports the value was found. However, reading the value with code below fails sometimes: DBCursor mtDbCursor = <DBCollection>.find(mtDBDbObject); if (mtDbCursor != null && mtDbCursor.count() > 0) { // do something } else { | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 28/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
It's true there is a difference between two separate physical nodes and two threads, but I don't see how it's a significant difference so long as the two threads are using two different sockets. | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 28/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
No problem on the delay. What's the id of the document that the fineOne failed to find? | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tihomir Slavkov [ 28/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry for the delay Jeff, i just attached a log snippet from execution, where insert was replaced with update as per instructions. Initially all went fine and then the problem appeared again. I waited ~10 seconds and then stopped collecting log data. Looking at the code snippet which you posted on 23-th, there is one more difference - in our case we have 2 separate physical nodes, one is performing insert and the second is performing read. Perhaps we might need to look in this direction? | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tihomir Slavkov [ 28/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
This call | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
jeff.yemin@10gen.com | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Solomon Wu [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Jeff, | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
The logs don't show enough I wrote a test to demonstrate what you're doing, but all in one VM:
Only difference is that I'm using update instead of insert because updates are logged more fully than inserts on the server. Using WriteConcern.SAFE, I haven't seen the reading thread ever fail to find the document that it takes from the blocking queue. There's nothing fundamentally different about this test than your situation, so far as I can see. Due to the way the connection pool works, the two threads are guaranteed to use different sockets. Can you paste the code you're using for the insert? | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tihomir Slavkov [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Jeff, please find attached 2 snippets - ok.txt is from when there was no error in application and problem.txt is from where there was an "not found" error in our application. The logs are rolling pretty fast, so i hope i was able to capture what is needed. The db collection in problem is "messageThread" | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
I think -vv will do it, but if not, keep turning it up by oner until you see them. | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Solomon Wu [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Jeff, | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Can you confirm in the mongod server logs that the query from the worker node is logged after the getlasterror command from the appserver node? The only way I can see this happening is if the getlasterror does not complete first or is not happening at all, setting up this race condition. | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Tihomir Slavkov [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Jeff, Yes, we see the issue on various testbeds, the only difference being the frequency. For example, on our automation tests environment it is reproducible >90% of the times, and on production environment it happens <0.1% of the times. The code which reads from MongoDB from the second node looks like: DBObject mtDBDbObject = new BasicDBObject(); DBCursor mtDbCursor = <DBCollection>.find(mtDBDbObject); while (mtDbCursor.hasNext()) { DBObject object = mtDbCursor.next(); // do something with }The same code when invoked on first node right after the write code execution, works correctly 100% of the times. I didn't have time today to test scenario with read from different thread on the same node. Generally, our observation is that there is a small period of time T in order for the second node to be able to retrieve correctly what the first node was written. If the read is attempted before that period T, then it fails; if it is attempted later than that period T, then it succeeds. | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Are you seeing the problem both in the testbed and in production? | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Solomon Wu [ 23/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Tihomir, Do you want to comment on some of these questions? Hi Jeff, DBCollection messageThread = // get collection. Where mtMetaDataDBObject is a BasicDBObject. We can probably provide the code we use for inserting and reading the Two more things I may forget to mention:
| |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 22/Aug/12 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Your understanding is correct. What does the insert/update from appserver node look like and what does the query from worker node look like? Does the exact same query work from appserver node? What if the query is done on a different thread than the update? |