[CDRIVER-4231] Duplicates in ObjectID when inserting in parallel Created: 17/Nov/21 Updated: 25/Jan/24 Resolved: 25/Jan/22 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 1.21.0 |
| Type: | Task | Priority: | Unknown |
| Reporter: | Seamus Riley | Assignee: | Colby Pike |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Case: | (copied to CRM) | ||||||||||||||||||||
| Description |
|
Prior to driver version 1.14, ObjectID included 3 bytes identifying the machine and 2 bytes denoting the PID. This made duplicate IDs impossible. Following 1.14 (and Error from Mongo database on operation 'bulk operation execute': E11000 duplicate key error collection: gfxdb.Rpt_AggregatedExposure_HU index: id dup key: { : ObjectId('60ab881bb7b6e778e76e34a2') }
The new algorithm includes a 5-byte random value generated once per process. This random value is meant to be unique to the machine and process. My testing shows that different PIDs are capable of creating the same 5-byte random value. An additional mechanism for avoiding duplicates is the 3-byte counter at the end of the ObjectID, which starts at a random number. In the cases where the 5-byte value is duplicated, the counter also starts at the same number.
To demonstrate this behavior, I wrote a program that inserts 1 document per process into a Mongo collection, and ran it 5000 times. Each ObjectID is then parsed into its constituent parts. Here I show those parts for a number of 5-byte random number duplicates:
Testing against a 1.6 C Driver, I see the 5 bytes (3 for machine and 2 for PID) identical in the case of processes that happen to have identical PIDs (running at different times), but in those cases the counter bytes are randomly distributed, e.g.:
Given the roughly 1-in-1-trillion odds of two 5-byte random values colliding, combined with the 1-in-8-million odds of the counter values colliding (all within the same second), the modern algorithm would not be a problem if the code matched the spec. |
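For readers following along, here is a minimal Python sketch of the two steps discussed in the description: splitting an ObjectID into its parts, and estimating how likely a 5-byte collision would be if the values were truly random. It is not the reporter's program and not driver code. The names `ObjectIdParts`, `parse_object_id`, and `birthday_collision_probability` are hypothetical, the layout assumed is the 1.14+ one (4-byte timestamp, 5-byte random value, 3-byte counter), and the probability uses the standard birthday approximation.

```python
import math
from datetime import datetime, timezone
from typing import NamedTuple


class ObjectIdParts(NamedTuple):
    # Hypothetical container; assumes the 1.14+ layout described above.
    timestamp: datetime   # 4 bytes, big-endian seconds since the Unix epoch
    random_value: bytes   # 5 bytes, generated once per process
    counter: int          # 3 bytes, big-endian


def parse_object_id(hex_oid: str) -> ObjectIdParts:
    """Split a 24-character hex ObjectID into its three parts."""
    raw = bytes.fromhex(hex_oid)
    if len(raw) != 12:
        raise ValueError("an ObjectID is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")
    return ObjectIdParts(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        random_value=raw[4:9],
        counter=int.from_bytes(raw[9:12], "big"),
    )


def birthday_collision_probability(samples: int, space: int) -> float:
    """Approximate chance that at least two of `samples` uniformly random
    values drawn from `space` possibilities coincide (birthday bound)."""
    exponent = samples * (samples - 1) / (2 * space)
    return -math.expm1(-exponent)


RANDOM_SPACE = 2 ** 40    # number of distinct 5-byte values
COUNTER_SPACE = 2 ** 24   # number of distinct 3-byte values

# The ObjectID from the duplicate key error quoted in the description.
parts = parse_object_id("60ab881bb7b6e778e76e34a2")
print(parts.timestamp.isoformat(), parts.random_value.hex(), hex(parts.counter))

# Chance of any repeated 5-byte value across 5000 independent processes,
# assuming the values really are uniformly random.
print(birthday_collision_probability(5000, RANDOM_SPACE))
```

Under these assumptions, a repeated 5-byte value across 5000 runs should be very rare (on the order of 1 in 100,000), which is consistent with the reporter's point that observing several duplicates suggests the values are not behaving as random. Note that 2^24 is about 16.8 million, so a full-range 3-byte counter would give somewhat better odds than the 8 million figure quoted.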
| Comments |
| Comment by Colby Pike [ 25/Jan/22 ] |
|
A new method of generating unique pseudorandomness has been introduced in OID generation. |
| Comment by Githook User [ 25/Jan/22 ] |
|
Author: {'name': 'vector-of-bool', 'email': 'vectorofbool@gmail.com', 'username': 'vector-of-bool'} Message:
|
| Comment by Seamus Riley [ 22/Nov/21 ] |
|
I saw this behavior with the 1.14.0 C driver on RHEL release 6.9, although I have heard reports that the behavior is consistent in more recent versions (including 1.17). In my testing the C compiler identification is GNU 4.9.4. Output of cmake (I shortened some of the paths with $BASE_PATH for readability):
/ab/tools/bin/cmake . -G"Unix Makefiles" --no-warn-unused-cli -DCMAKE_MODULE_PATH=$BASE_PATH/build/cmake-modules -DCMAKE_PROGRAM_PATH=$BASE_PATH/bin -DCMAKE_LIBRARY_PATH=$BASE_PATH/lib -DCMAKE_INCLUDE_PATH=$BASE_PATH/include -DCMAKE_INSTALL_PREFIX=$BASE_PATH -DCMAKE_INSTALL_LIBDIR=$BASE_PATH/lib -DCMAKE_INSTALL_INCLUDEDIR=$BASE_PATH/include -DCMAKE_INSTALL_OLDINCLUDEDIR=$BASE_PATH/include -DCMAKE_INSTALL_SYSCONFDIR=$BASE_PATH/etc -DCMAKE_C_FLAGS="-m64 -DXXI_MODELBITS=64 -DXXI_LINUX=1 -DXXI_X86=1 -DXXI_LINUX_VARIANT="REDHAT_7_7" -Werror=return-type -Werror=uninitialized -Werror=implicit-function-declaration -Werror=maybe-uninitialized" -DCMAKE_CXX_FLAGS="-m64 -DXXI_MODELBITS=64 -DXXI_LINUX=1 -DXXI_X86=1 -DXXI_LINUX_VARIANT="REDHAT_7_7" -Werror=return-type -Werror=uninitialized -Werror=maybe-uninitialized" -DCMAKE_C_FLAGS_DEBUG="-O0 -g2" -DCMAKE_C_FLAGS_RELEASE="-O2" -DCMAKE_C_FLAGS_RELWITHDEBINFO="-O2 -g2" -DCMAKE_CXX_FLAGS_DEBUG="-O0 -g2" -DCMAKE_CXX_FLAGS_RELEASE="-O2" -DCMAKE_CXX_FLAGS_RELWITHDEBINFO="-O2 -g2" -DCMAKE_BUILD_TYPE=DEBUG -DENABLE_AUTOMATIC_INIT_AND_CLEANUP=OFF -DCMAKE_BUILD_TYPE=Release -DENABLE_SSL=OPENSSL -DENABLE_SASL=OFF -DENABLE_ICU=OFF -DENABLE_SNAPPY=OFF -DENABLE_ZLIB=BUNDLED -DENABLE_SRV=ON -DENABLE_BSON=ON -DENABLE_STATIC=OFF -DENABLE_TESTS=OFF -DENABLE_EXAMPLES=OFF -DOPENSSL_ROOT_DIR=../openssl-1.1.1k -DOPENSSL_LIBRARIES=../lib |
| Comment by Kevin Albertson [ 18/Nov/21 ] |
|
seamus.riley@gmail.com, thank you for the report! We will look into this soon and attempt to reproduce it. If possible, could you provide the following information? It may help us reproduce the issue in an equivalent environment:
|