[SERVER-53015] Inserted data intermittently not seen in sharded cluster when database was just dropped Created: 21/Nov/20  Updated: 21/Dec/20  Resolved: 21/Dec/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Oleg Pudeyev (Inactive) Assignee: Jaume Moragues (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to RUBY-2447 Fix intermittent transaction examples... Closed
Operating System: ALL
Sprint: Sharding 2020-12-14, Sharding 2020-12-28
Participants:

 Description   

I made the following test program which drops the database, inserts a document and performs countdocuments on the database in question. Eventually this program will see the document count stay at 0 for several seconds after a successful insert, after which the count increases to 1.

To make this situation reproducible, I insert some more data prior to the drop to generate extra load, and I drop on only ~ 10% of the iterations.

require 'mongo'
require 'byebug'
 
Mongo::Logger.logger.level = 1
 
client = Mongo::Client.new(['localhost:14240', 'localhost:14241']).use('xxx')
db1 = client[:db1]
 
loop do
  15.times do
    db1.insert_many([{a: rand}]*5)
  end
 
  check = rand >= 0.9
 
  if check
    db1.database.drop
  end
 
  db1.insert_one(a: 1)
 
  if check
    v = db1.count_documents
    if v != 1
      p v
      #byebug
      1
      loop do
        v = db1.count_documents
        if v != 1
          p v
          sleep 1
        else
          byebug
          1
        end
      end
    end
  end
 
  print('.')
end

Result:

serene% ruby test.rb
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................0
0
0
0
0
 
[30, 39] in /home/w/apps/tests/shard-phantom/test.rb
   30:         if v != 1
   31:           p v
   32:           sleep 1
   33:         else
   34:           byebug
=> 35:           1
   36:         end
   37:       end
   38:     end
   39:   end
(byebug) 

This issue originally appeared in the transactions example tests for the Ruby driver, where instead of doing a count a find was issued, and that find returned no results after the document to be found was inserted.

I tried setting read and write concerns to majority, which didn't fix the problem:

client = Mongo::Client.new(['localhost:14240', 'localhost:14241']).use('xxx').with(
  read_concern: {level: :majority}, write_concern: {w: :majority})

In the Ruby driver test suite this condition persists for much longer than this reproduce case shows (potentially indefinitely). To get out of it I can insert another document into the collection in question. Note in the following example how the count jumps from 0 to 2 even though I only inserted one document:

(byebug) hr[:employees].count
D, [2020-11-21T04:45:16.374462 #427] DEBUG -- : MONGODB | localhost:27018 req:79 conn:1:1 sconn:22 | hr.count | STARTED | {"count"=>"employees", "query"=>{}, "readConcern"=>{"level"=>"majority"}, "$db"=>"hr", "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00007fbe5c090200 @seconds=1605933879, @increment=1>, "signature"=>{"hash"=><BSON::Binary:0x6960 type=generic dat...
D, [2020-11-21T04:45:16.379245 #427] DEBUG -- : MONGODB | localhost:27018 req:79 | hr.count | SUCCEEDED | 0.004s
0
(byebug) hr[:employees].insert_one(a:1)
D, [2020-11-21T04:45:20.533725 #427] DEBUG -- : MONGODB | localhost:27018 req:80 conn:1:1 sconn:22 | hr.insert | STARTED | {"insert"=>"employees", "ordered"=>true, "writeConcern"=>{"w"=>"majority"}, "documents"=>[{"a"=>1, "_id"=>BSON::ObjectId('5fb89b6094fbce01ab1bea0e')}], "$db"=>"hr", "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00007fbe5c0bea88 @seconds=16059339...
D, [2020-11-21T04:45:20.540557 #427] DEBUG -- : MONGODB | localhost:27018 req:80 | hr.insert | SUCCEEDED | 0.007s
#<Mongo::Operation::Insert::Result:0x7120 documents=[{"n"=>1, "ok"=>1.0, "operationTime"=>#<BSON::Timestamp:0x00007fbe5c0c4af0 @seconds=1605933920, @increment=4>, "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00007fbe5c0c4a00 @seconds=1605933920, @increment=4>, "signature"=>{"hash"=><BSON::Binary:0x7140 type=generic data=0x0000000000000000...>, "keyId"=>0}}}]>
(byebug) hr[:employees].count
D, [2020-11-21T04:45:21.510076 #427] DEBUG -- : MONGODB | localhost:27017 req:81 conn:1:1 sconn:28 | hr.count | STARTED | {"count"=>"employees", "query"=>{}, "readConcern"=>{"level"=>"majority"}, "$db"=>"hr", "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00007fbe5c0c4a00 @seconds=1605933920, @increment=4>, "signature"=>{"hash"=><BSON::Binary:0x7140 type=generic dat...
D, [2020-11-21T04:45:21.516207 #427] DEBUG -- : MONGODB | localhost:27017 req:81 | hr.count | SUCCEEDED | 0.005s
2

Note that even though in the above output the counts were sent to different mongoses, prior to insert both mongoses returned 0 and after the insert both returned 2.

This issue seems to only affect 4.2 sharded clusters.



 Comments   
Comment by Oleg Pudeyev (Inactive) [ 16/Dec/20 ]

Reproduced also on 4.2.3 on ubuntu1804 using an official MongoDB server build with the count test. Find test either does not reproduce on this configuration or takes much longer.

I thought the count test used estimated count but it uses countDocuments so it should be as accurate as the find test. I suggest running the count test.

Server launched with:

Running mlaunch --replicaset --nodes 2 --sharded 2 --name ruby-driver-rs --mongos 2 --dir /mnt/zram/mongodb/4.2-scm --setParameter diagnosticDataCollectionEnabled=false --wiredTigerCacheSizeGB 0.25 --setParameter honorSystemUmask=true --wiredTigerEngineConfigString 'log=(prealloc=false,file_max=20MB),cache_size=256M' --networkMessageCompressors zstd,snappy,zlib --setParameter enableTestCommands=1 --filePermissions 0666 --binarypath /usr/local/m/versions/4.2 --port 14240

Test was performed immediately after the launch, there was no other activity happening in this deployment.

Comment by Oleg Pudeyev (Inactive) [ 14/Dec/20 ]

Original test with count: https://github.com/p-mongo/tests/blob/master/shard-phantom/test-count.rb

Revised test using find instead of count: https://github.com/p-mongo/tests/blob/master/shard-phantom/test-find.rb

Fails eventually as well but seemingly takes a bit longer.

Comment by Githook User [ 27/Nov/20 ]

Author:

{'name': 'Oleg Pudeyev', 'email': '39304720+p-mongo@users.noreply.github.com', 'username': 'p-mongo'}

Message: RUBY-2447 Fix intermittent transaction examples test failures (#2135)

  • run tx examples first
  • add diagnostics to transaction aborts
  • try majority read/write concerns

Co-authored-by: Oleg Pudeyev <oleg@bsdpower.com>
Branch: 2.14-stable
https://github.com/mongodb/mongo-ruby-driver/commit/a761c45a027e6156b36405b330c9d74636bf0284

Comment by Githook User [ 27/Nov/20 ]

Author:

{'name': 'Oleg Pudeyev', 'email': '39304720+p-mongo@users.noreply.github.com', 'username': 'p-mongo'}

Message: RUBY-2447 Fix intermittent transaction examples test failures (#2135)

  • run tx examples first
  • add diagnostics to transaction aborts
  • try majority read/write concerns

Co-authored-by: Oleg Pudeyev <oleg@bsdpower.com>
Branch: master
https://github.com/mongodb/mongo-ruby-driver/commit/ae1c05aa2c5355ea9248d3fe587e2cae66d538fd

Generated at Thu Feb 08 05:29:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.