Core Server / SERVER-17303

concurrent findAndModify ops with upsert: true can cause a fatal logOp() rollback

    • Fully Compatible
    • ALL

      On that box:

      $ cd /home/buzz
      $ sh runt stateEvent --port 37017 -d test5 --numevents 2000000 --posn 100000 --threads 32 --drop --payload 8192
      

      It will fail 90% of the time, somewhere between 3000 and 7000 turns of the crank.

      This is the big IBM X6 box. High perf SSDs on /data/[1-4]:

      [buzz@IDF-IBM-Test-1 ~]$ uname -a
      Linux IDF-IBM-Test-1.10gen.cc 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
      [buzz@IDF-IBM-Test-1 ~]$ cat /etc/redhat-release 
      CentOS release 6.6 (Final)
      [buzz@IDF-IBM-Test-1 ~]$ df
      Filesystem           1K-blocks     Used Available Use% Mounted on
      /dev/mapper/vg_idfibmtest1-lv_root
                            51606140 11926496  37058204  25% /
      tmpfs                396812172        0 396812172   0% /dev/shm
      /dev/sda1               495844    96335    373909  21% /boot
      /dev/mapper/vg_idfibmtest1-lv_home
                           231167100   832892 218591592   1% /home
      /dev/md1             768924576 20997468 708868020   3% /data/1
      /dev/md2             768924576 10687204 719178284   2% /data/2
      /dev/md3             768924576   201440 729664048   1% /data/3
      /dev/md4             768924576  3242900 726622588   1% /data/4
      

      Using rc8 with WiredTiger. No special startup options:

      numactl --interleave=all /home/buzz/3.0.0-rc8/bin/mongod --storageEngine=wiredTiger --port 37017 --dbpath /data/4/data0/db0 --logpath /tmp/mongo0.log --fork
      

      Test program starts 32 threads. Each thread randomly picks a "position" Pn, where 0 <= n < 10000 (e.g. P433), in the currentPos collection. findAndModify is used to logically reserve the item. A small event record is inserted into the events collection, the fetched item is "copied" to the historicPos collection, and then currentPos is findAndModify()d with updated info. We'll call this find-insert-insert-update sequence a turn of the crank.
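
      For readers without access to the test program, one turn of the crank might look roughly like the sketch below. This is not the actual stateEvent code: the function names, the field names (reserved, turns, pos, ts) and the exact update documents are assumptions made for illustration. The collection arguments are expected to behave like pymongo Collection objects (find_one_and_update, insert_one).

```python
import random
import threading
import time


def turn_of_the_crank(current_pos, events, historic_pos, n_positions, rng):
    """Run one find-insert-insert-update sequence (hypothetical field names)."""
    key = "P%d" % rng.randrange(n_positions)  # e.g. "P433"

    # 1. find: findAndModify with upsert logically reserves the position,
    #    creating it if it does not exist yet. Returns the pre-update
    #    document, or None if the upsert inserted a new one.
    before = current_pos.find_one_and_update(
        {"_id": key},
        {"$set": {"reserved": True}},
        upsert=True,
    )

    # 2. insert: record a small event.
    events.insert_one({"pos": key, "ts": time.time()})

    # 3. insert: "copy" the fetched item to the history collection.
    if before is not None:
        snapshot = dict(before)
        snapshot["pos"] = snapshot.pop("_id")  # history rows get their own _id
        historic_pos.insert_one(snapshot)

    # 4. update: findAndModify again with updated info.
    current_pos.find_one_and_update(
        {"_id": key},
        {"$set": {"reserved": False}, "$inc": {"turns": 1}},
        upsert=True,
    )
    return key


def run_threads(db, n_threads=32, n_positions=10000, turns_per_thread=1000):
    """Start n_threads workers that each crank turns_per_thread times."""
    def worker(seed):
        rng = random.Random(seed)
        for _ in range(turns_per_thread):
            turn_of_the_crank(db["currentPos"], db["events"],
                              db["historicPos"], n_positions, rng)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

      With pymongo installed, db could be obtained as pymongo.MongoClient(port=37017)["test5"] to match the mongod instance and database used in this report. Whether this sketch triggers the failure is untested; it only illustrates the access pattern of many threads issuing upserting findAndModify ops against a small set of keys.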

      Trouble starts

        1. rc9.log
          357 kB
        2. mongo0.log
          1.72 MB

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            buzz.moschetti Buzz Moschetti
