Core Server / SERVER-29446

$sample stage could not find a non-duplicate document while using a random cursor

    • Storage Engines
    • ALL
    • v4.2

      Taken from Jesse's comment on SERVER-20385.

      With MongoDB 3.4.4 on Mac OS X, I can reproduce this. First do "python -m pip install pymongo pytz", then:

      from datetime import datetime, timedelta
       
      import pytz
      from bson import ObjectId
      from pymongo import MongoClient
      from pymongo.errors import OperationFailure
       
      CHUNKS = 20
       
      collection = MongoClient().db.test
      collection.delete_many({})
       
      start = datetime(2000, 1, 1, tzinfo=pytz.UTC)
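      for hour in range(10000):
          # insert_one replaces Collection.insert, which is deprecated in PyMongo 3 and removed in PyMongo 4.
          collection.insert_one(
              {'_id': ObjectId.from_datetime(start + timedelta(hours=hour)), 'x': 1})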
       
      for _ in range(10):
          try:
              docs = list(collection.aggregate([{
                  "$sample": {"size": CHUNKS}
              }, {
                  "$sort": {"_id": 1}
              }]))
          except OperationFailure as exc:
              if exc.code == 28799:
                  # Work around https://jira.mongodb.org/browse/SERVER-20385
                  print("retry")
                  continue
       
              raise
       
          for d in docs:
              print(d['_id'].generation_time)
       
          break
      else:
          raise OperationFailure("$sample failed")
      

      As often as not, the sample fails ten times in a row with error code 28799 and the message: "$sample stage could not find a non-duplicate document after 100 while using a random cursor. This is likely a sporadic failure, please try again."
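
      The retry loop in the script can also be factored into a small helper. The following is only a sketch under stated assumptions: `retry_on_code`, `make_flaky`, and `FakeOperationFailure` are hypothetical names introduced here for illustration, and `FakeOperationFailure` is a stand-in so the example runs without a server. Against a real deployment one would presumably pass `error_type=OperationFailure` from `pymongo.errors`, whose instances carry the same `code` attribute used in the script above.

```python
SAMPLE_DUPLICATE_CODE = 28799  # error code quoted in this report


class FakeOperationFailure(Exception):
    """Hypothetical stand-in for pymongo.errors.OperationFailure."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


def retry_on_code(operation, code=SAMPLE_DUPLICATE_CODE, attempts=10,
                  error_type=FakeOperationFailure):
    """Call operation() up to `attempts` times, retrying only on `code`.

    Returns (result, number_of_failed_attempts). Errors with any other code
    are re-raised immediately; the last matching error is re-raised once
    all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation(), attempt
        except error_type as exc:
            if exc.code != code:
                raise
            last_error = exc
    raise last_error


def make_flaky(failures):
    """Build a callable that fails `failures` times with code 28799, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise FakeOperationFailure(
                "$sample stage could not find a non-duplicate document",
                code=SAMPLE_DUPLICATE_CODE)
        return ["doc"] * 20

    return operation


if __name__ == "__main__":
    docs, retries = retry_on_code(make_flaky(2))
    print(len(docs), retries)
```

      As the report notes, ten attempts are often not enough in practice, so a helper like this only hides the failure part of the time; it does not address the underlying issue.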

    • Storage 2017-07-31, Storage Engines 2019-08-12, Storage Engines 2019-08-26, StorEng - 2023-08-08
    • 19

      The error originally reported in SERVER-20385 is marked as fixed in 3.1.9, but we can still reproduce it against version 3.4.4.

            Assignee: backlog-server-storage-engines ([DO NOT USE] Backlog - Storage Engines Team)
            Reporter: charlie.swanson@mongodb.com (Charlie Swanson)
            Votes: 2
            Watchers: 30
