  1. Core Server
  2. SERVER-17303

concurrent findAndModify ops with upsert: true can cause a fatal logOp() rollback


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-rc8
    • Fix Version/s: 3.0.0-rc9, 3.1.0
    • Component/s: Concurrency, Write Ops
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:

      On that box:

      $ cd /home/buzz
      $ sh runt stateEvent --port 37017 -d test5 --numevents 2000000 --posn 100000 --threads 32 --drop --payload 8192

      It will fail 90% of the time, somewhere between 3000 and 7000 turns of the crank.


      Description

      This is the big IBM X6 box. High perf SSDs on /data/[1-4]:

      [buzz@IDF-IBM-Test-1 ~]$ uname -a
      Linux IDF-IBM-Test-1.10gen.cc 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
      [buzz@IDF-IBM-Test-1 ~]$ cat /etc/redhat-release 
      CentOS release 6.6 (Final)
      [buzz@IDF-IBM-Test-1 ~]$ df
      Filesystem           1K-blocks     Used Available Use% Mounted on
      /dev/mapper/vg_idfibmtest1-lv_root
                            51606140 11926496  37058204  25% /
      tmpfs                396812172        0 396812172   0% /dev/shm
      /dev/sda1               495844    96335    373909  21% /boot
      /dev/mapper/vg_idfibmtest1-lv_home
                           231167100   832892 218591592   1% /home
      /dev/md1             768924576 20997468 708868020   3% /data/1
      /dev/md2             768924576 10687204 719178284   2% /data/2
      /dev/md3             768924576   201440 729664048   1% /data/3
      /dev/md4             768924576  3242900 726622588   1% /data/4

      Using rc8 with WiredTiger. No special startup options:

      numactl --interleave=all /home/buzz/3.0.0-rc8/bin/mongod --storageEngine=wiredTiger --port 37017 --dbpath /data/4/data0/db0 --logpath /tmp/mongo0.log --fork

      The test program starts 32 threads. Each thread picks a random "position" Pn where 0 <= n < 10000 (e.g. P433) in the currentPos collection and uses findAndModify to logically reserve the item. A small event record is inserted into the events collection, the fetched item is "copied" to the historicPos collection, and then currentPos is findAndModify()d with the updated info. We'll call this find-insert-insert-update sequence a turn of the crank.
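
For readers without access to the test box, one turn of the crank might look roughly like the sketch below. This is not the actual `runt stateEvent` program, whose source is not included in this report. The collection names (currentPos, events, historicPos) and the use of upsert come from the ticket; the field names (`reserved`, `turns`, `payload`, `pos`, `type`), the helper names `turn_of_crank` and `run`, and the choice of a pymongo-style client are assumptions made for illustration.

```python
import random
import threading


def turn_of_crank(db, num_positions, payload):
    """One find-insert-insert-update cycle against a pymongo-style database.

    Field names here are hypothetical; only the collection names and the
    overall sequence come from the bug report.
    """
    key = "P%d" % random.randrange(num_positions)

    # 1. Logically reserve the position. upsert=True creates the document
    #    if it does not exist yet. By default the pre-update document is
    #    returned, which is None when the upsert inserted a new document.
    before = db.currentPos.find_one_and_update(
        {"_id": key},
        {"$set": {"reserved": True}},
        upsert=True,
    )

    # 2. Insert a small event record.
    db.events.insert_one({"pos": key, "type": "stateEvent"})

    # 3. "Copy" the fetched item into the history collection. The _id is
    #    moved to another field so repeated copies do not collide.
    snapshot = dict(before or {"_id": key})
    snapshot["pos"] = snapshot.pop("_id")
    db.historicPos.insert_one(snapshot)

    # 4. Write the updated info back and release the reservation.
    db.currentPos.find_one_and_update(
        {"_id": key},
        {"$set": {"reserved": False, "payload": payload}, "$inc": {"turns": 1}},
        upsert=True,
    )
    return key


def run(db, threads=32, turns_per_thread=1000, num_positions=10000,
        payload_size=8192):
    """Drive the workload from several threads, as the report describes."""
    payload = "x" * payload_size

    def worker():
        for _ in range(turns_per_thread):
            turn_of_crank(db, num_positions, payload)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
```

With pymongo installed and a mongod listening on the port used above, something like `run(pymongo.MongoClient(port=37017).test5)` would exercise the same pattern of concurrent upserting findAndModify calls. Whether it reproduces the failure depends on the server version and hardware.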

      Trouble starts

        Attachments

        1. mongo0.log
          1.72 MB
        2. rc9.log
          357 kB
