Suppose you have a 2 shards db and a sharded collection with a shard key 'origin'. 2.6 M documents.
a sh.status() gives:
So, sorted by 'origin', half the records are in one shard the the next half is in the other shard.
Now suppose we want to get all the records, sorted by 'origin' and every 1000 records, make a process that takes 1 seconds... Getting half of the records takes more than 10mn (the default timeout for cursors).
If we don't specify a batch_size, we don't even arrive to 1.182 records.
If we specify a batch size of 1000, we get a "RecordNotFound" error when, after having returned half the records + the records from the first get from the 2nd shard, mongos tries to get more data from the 2nd shard.
That's what we wanted to show up because whatever the batch_size you specify, you won't be able to get all the records. Using a small batch size is often described as the way to solve cursors timeout. In a multi sharded db it is not the case.
Of course, this example has been crafted to reproduce the issue. It is not a real use case but I have real ones where I have 10 shards, process of each record takes more than 1 second and cursorTimeoutMillis: "7200000" (2 hours) and still get CursorNotFound errors.
no_cursor_timeout=True is not an option because it can lead to ghost cursors in the db