[SERVER-45511] Data loss following machine PowerOff with writeConcernMajorityJournalDefault true Created: 12/Jan/20  Updated: 15/Jan/20  Resolved: 15/Jan/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mark Berg Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-45509 Losing data with writeConcernMajority... Closed
is duplicated by SERVER-45510 Losing data with writeConcernMajority... Closed
Operating System: ALL
Participants:

 Description   

Background:

  • I'm using MongoDB 4.2.0.
  • I have deployed a sharded cluster containing 5 config servers, 3 query routers (mongos), and 3 shards. Each shard consists of 4 replicas and 1 arbiter.
  • All members run on VMs.
  • Each replica set's writeConcernMajorityJournalDefault flag is true (see the check sketched after this list).
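As a sanity check of that last flag, something like this sketch can read the replica set configuration (the member address is a placeholder; replSetGetConfig is a replica set command, so it has to run against a shard member directly rather than through mongos):

from pymongo import MongoClient

# Connect directly to one replica set member (hypothetical address);
# a single seed with no replicaSet option gives a direct connection.
member = MongoClient("mongodb://shard0-node0:27018")

# replSetGetConfig returns the active replica set configuration.
cfg = member.admin.command("replSetGetConfig")["config"]
print(cfg.get("writeConcernMajorityJournalDefault"))  # expected: True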

The Test:

Before deploying the cluster to production, I conducted several stress tests. I wrote a simple script that performs many inserts against the cluster and returns the number of successful inserts.
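A minimal sketch of the kind of loop the script runs (host, database, and collection names here are placeholders, not the real ones):

from pymongo import MongoClient
from pymongo.errors import PyMongoError

client = MongoClient("mongodb://mongos-host:27017")  # hypothetical mongos address
coll = client.test.stress

insert_count = 0
try:
    for i in range(100000):
        # insert_one blocks until the write is acknowledged according to
        # the collection's write concern; count only acknowledged writes.
        result = coll.insert_one({"seq": i})
        if result.acknowledged:
            insert_count += 1
except PyMongoError as exc:
    print("stopped after error:", exc)

print("acknowledged inserts:", insert_count)
# count_documents requires PyMongo 3.7+; on older versions use coll.count().
print("documents in collection:", coll.count_documents({}))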

When I run and stop the script, everything is fine: the script's insert count is identical to the number of documents in the collection.

BUT, when I run the script and then power off the primary member, I hit a problem: my script's insert count is larger (by 10-20) than the number of documents in my collection. I assume I'm losing data.

I got successful insert acknowledgements even though my replica sets have writeConcernMajorityJournalDefault set to true.

Bringing the primary back up doesn't recover the lost data.

I think the data was still in memory!

Conclusion:

I believe that there is some malfunction with the journaling setting.

P.S:

I tried to insert with the {w: "majority", j: true, wtimeout: 5000} write concern parameters. Same results.

Regards,
Mark Berg



 Comments   
Comment by Danny Hatcher (Inactive) [ 15/Jan/20 ]

I'm glad you were able to discover the problem. I'll close this ticket.

Comment by Mark Berg [ 15/Jan/20 ]

Issue solved!
I tried looping inserts into my DB directly from the mongos shell, and everything was fine.
Then I checked the issue again with my Python script and realised that the write_concern was not being applied at all.

A bit of Googling turned up this page: https://api.mongodb.com/python/current/migrate-to-pymongo3.html#the-write-concern-attribute-is-immutable.

Bottom line: I was using PyMongo 3.4.0 while still setting write_concern the way older versions did.
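For anyone who hits the same thing, a rough sketch of the difference (address and collection names are placeholders):

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://mongos-host:27017")  # hypothetical address

# PyMongo 2.x allowed mutating the attribute in place; in PyMongo 3 the
# write_concern attribute is immutable, so the old pattern no longer works:
# client.test.stress.write_concern = {"w": "majority", "j": True}

# PyMongo 3 way: ask for a Collection that is configured with the desired
# write concern up front.
coll = client.test.get_collection(
    "stress",
    write_concern=WriteConcern(w="majority", j=True, wtimeout=5000),
)
coll.insert_one({"seq": 0})  # now acknowledged under majority + journal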

Comment by Danny Hatcher (Inactive) [ 13/Jan/20 ]

While it is possible that there is a bug, the example you described is a very common use case.

Do the inserts you are performing have a monotonically increasing field? That is, do they have a field that increases by 1 for each individual insert? If so, you should be able to tell whether there are any gaps in the expected result set actually present on the nodes (see the sketch below).
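For example, if each document carried a monotonically increasing seq field, a sketch along these lines would reveal the gaps (names and the reported count are placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # hypothetical address
coll = client.test.stress

insert_count = 100000  # the acknowledged count your script reported
expected = set(range(insert_count))
present = {doc["seq"] for doc in coll.find({}, {"seq": 1, "_id": 0})}
print("missing seq values:", sorted(expected - present))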

Are you inserting across the shards or are all the inserts going to one shard? Do different nodes in a given shard have a different document count?

If you can provide the full script you are running along with the results I can take a look.
