  1. Core Server
  2. SERVER-29446

$sample stage could not find a non-duplicate document while using a random cursor

    Details

    • Operating System:
      ALL
    • Backport Requested:
      v4.2
    • Steps To Reproduce:
      Taken from Jesse's comment on SERVER-20385.

      With MongoDB 3.4.4 on Mac OS X, I can reproduce this. First do "python -m pip install pymongo pytz", then:

      from datetime import datetime, timedelta
       
      import pytz
      from bson import ObjectId
      from pymongo import MongoClient
      from pymongo.errors import OperationFailure
       
      CHUNKS = 20
       
      collection = MongoClient().db.test
      collection.delete_many({})
       
      start = datetime(2000, 1, 1, tzinfo=pytz.UTC)
      for hour in range(10000):
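          # insert() is deprecated in PyMongo 3.x and removed in 4.x; insert_one() is the equivalent call
          collection.insert_one(
              {'_id': ObjectId.from_datetime(start + timedelta(hours=hour)), 'x': 1})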
       
      for _ in range(10):
          try:
              docs = list(collection.aggregate([{
                  "$sample": {"size": CHUNKS}
              }, {
                  "$sort": {"_id": 1}
              }]))
          except OperationFailure as exc:
              if exc.code == 28799:
                  # Work around https://jira.mongodb.org/browse/SERVER-20385
                  print("retry")
                  continue
       
              raise
       
          for d in docs:
              print(d['_id'].generation_time)
       
          break
      else:
          raise OperationFailure("$sample failed")
      

      As often as not, the sample fails ten times in a row with error code 28799 and the message: "$sample stage could not find a non-duplicate document after 100 while using a random cursor. This is likely a sporadic failure, please try again."
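
The retry loop in the script above could be factored into a small helper. This is only a sketch of that client-side workaround, not a fix for the server issue: `retry_on_code` and its parameters are hypothetical names, and it assumes only that the raised exception carries the server error code in a `.code` attribute, as PyMongo's `OperationFailure` does.

```python
def retry_on_code(operation, code, attempts=10):
    """Call operation() until it succeeds, retrying only on a matching error code.

    Hypothetical helper: any exception whose .code differs from `code`
    (or that has no .code at all) is re-raised immediately.
    """
    last_exc = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # PyMongo's OperationFailure exposes the server error code as .code
            if getattr(exc, "code", None) != code:
                raise
            last_exc = exc
    # Every attempt failed with the matching code; surface the last error.
    raise last_exc
```

With the collection from the script above, the call would look like `docs = retry_on_code(lambda: list(collection.aggregate(pipeline)), 28799)`, where `pipeline` is the `$sample` / `$sort` list. Since the report says the sample often fails ten times in a row, this narrows what gets retried but is not expected to make the failure go away.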

    • Sprint:
      Storage 2017-07-31, Storage Engines 2019-08-12, Storage Engines 2019-08-26, Storage - Ra 2022-01-24
    • Case:
    • Linked BF Score:
      19
    • Story Points:
      8

      Description

      The error originally reported in SERVER-20385 is marked as fixed in 3.1.9, but we can still reproduce it against version 3.4.4.

              People

              Assignee:
              sulabh.mahajan Sulabh Mahajan
              Reporter:
              charlie.swanson Charlie Swanson
              Participants:
              Votes:
              2
              Watchers:
              26

                Dates

                Created:
                Updated: