WiredTiger / WT-3017

Hazard pointer race with page replace causes error

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.0, 3.4.0-rc4, 3.2.12
    • Labels:
      None
    • # Replies:
      8
    • Last comment by Customer:
      true
    • Sprint:
      Storage 2016-11-21

      Description

      Setting a hazard pointer consists of:

      1. reading ref->page and setting hp->page
      2. full barrier
      3. reading ref->page checking it is still equal to hp->page
      4. checking ref->state == WT_REF_MEM.

      However, if eviction cannot evict all of the changes on a page and decides to replace the page, there is a window where ref->page == NULL. If steps 1 and 3 both read NULL inside that window and the page replacement then completes between steps 3 and 4, step 4 sees WT_REF_MEM, so the call to __wt_hazard_set reports success but publishes a NULL hazard pointer.
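
      The interleaving described above can be reproduced deterministically in a small single-threaded model. This is only a sketch under stated assumptions: the types, hazard_set, finish_replace, the race hook and the reject_null flag are hypothetical stand-ins, not WiredTiger code, and the NULL check is merely one plausible reading of the commit title "Don't set NULL hazard pointers", not the actual patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for WT_REF, WT_PAGE and WT_HAZARD. */
enum ref_state { REF_LOCKED, REF_MEM };

struct page { int id; };

struct ref {
    _Atomic(struct page *) page;
    _Atomic int state;
};

struct hazard { struct page *page; };

/* Hook run between steps 3 and 4 to force the problematic interleaving. */
typedef void (*race_hook)(struct ref *);

static struct page new_page = { 2 };

/* Simulated evictor: swap the new page into place, then publish the state. */
static void
finish_replace(struct ref *ref)
{
    atomic_store(&ref->page, &new_page);
    atomic_store(&ref->state, REF_MEM);
}

/*
 * Steps 1-4 from the description. With reject_null set, a NULL page is
 * never accepted (an assumed mitigation, not the real fix).
 */
static bool
hazard_set(struct ref *ref, struct hazard *hp, race_hook hook, bool reject_null)
{
    hp->page = atomic_load(&ref->page);                  /* step 1 */
    atomic_thread_fence(memory_order_seq_cst);           /* step 2 */
    bool same = (atomic_load(&ref->page) == hp->page);   /* step 3 */
    if (hook != NULL)
        hook(ref);              /* page replacement completes here */
    bool in_mem = (atomic_load(&ref->state) == REF_MEM); /* step 4 */

    if (same && in_mem && (!reject_null || hp->page != NULL))
        return true;
    hp->page = NULL;            /* give up the slot; the caller retries */
    return false;
}

/* True if hazard_set reported success while the hazard pointer is NULL. */
static bool
null_hazard_published(bool reject_null)
{
    struct ref ref;
    struct hazard hp = { NULL };

    atomic_init(&ref.page, NULL);        /* inside the replacement window */
    atomic_init(&ref.state, REF_LOCKED);
    bool ok = hazard_set(&ref, &hp, finish_replace, reject_null);
    return ok && hp.page == NULL;
}
```

      In this model, null_hazard_published(false) returns true, which is the reported failure, while null_hazard_published(true) returns false because the NULL pointer is rejected and the caller would retry.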

        Activity

        keith.bostic Keith Bostic added a comment -

        Michael Cahill, nice catch!

        Just to confirm, we're talking about this code setting WT_REF.page to NULL as part of discarding the original page?

        /*
         * Discard the original page.
         *
         * Pages with unresolved changes are not marked clean during
         * reconciliation, do it now.
         */
        __wt_page_modify_clear(session, page);
        __wt_ref_out(session, ref);
                
        /* Swap the new page into place. */
        ref->page = new->page;
        WT_PUBLISH(ref->state, WT_REF_MEM);
        

        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill <michael.cahill@mongodb.com> (michaelcahill)

        Message: WT-3017 Don't set NULL hazard pointers. (#3141)
        Branch: develop
        https://github.com/wiredtiger/wiredtiger/commit/370154159bc7e3b06488cbdf7deb0585baf841b6

        michael.cahill Michael Cahill added a comment -

        Keith Bostic, that is the case we saw in testing, but any change to ref->page could potentially race with the original __wt_hazard_set code.

        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill <michael.cahill@mongodb.com> (michaelcahill)

        Message: WT-3017 Don't set NULL hazard pointers. (#3141)
        Branch: mongodb-3.4
        https://github.com/wiredtiger/wiredtiger/commit/370154159bc7e3b06488cbdf7deb0585baf841b6

        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill <michael.cahill@mongodb.com> (michaelcahill)

        Message: WT-3017 Don't set NULL hazard pointers. (#3141)
        Branch: mongodb-3.2
        https://github.com/wiredtiger/wiredtiger/commit/370154159bc7e3b06488cbdf7deb0585baf841b6

        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill <michael.cahill@mongodb.com> (michaelcahill)

        Message: Import wiredtiger: ca6eee06ffdacc8e191987e64b3791740dad21e1 from branch mongodb-3.4

        ref: 74430da40c..ca6eee06ff
        for: 3.4.0

        WT-2962 Provide a way to configure builtin extensions
        WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
        WT-3000 Missing log records in recovery when crashing after a log file switch
        WT-3002 Allow applications to exempt threads from eviction.
        WT-3004 lint: declare functions that don't return a value as void
        WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
        WT-3012 Test format hanging on LSM configurations
        WT-3015 Test format stuck with 2mb cache
        WT-3016 Tests needed for systems without ftruncate
        WT-3017 Hazard pointer race with page replace causes error
        WT-3018 lint
        WT-3020 LSM primary changes impact parallel-pop-lsm load time
        WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
        WT-3023 Test format hang on zSeries
        WT-3024 wtperf medium-lsm-compact test can hang
        Branch: master
        https://github.com/mongodb/mongo/commit/fb4ae3792065e98696e391ac1c4602216b8502cb

        xgen-internal-githook Githook User added a comment -

        Author: Michael Cahill <michael.cahill@mongodb.com> (michaelcahill)

        Message: Import wiredtiger: 040e3d6f764c0fb626cb47fede54469f57d0c6e0 from branch mongodb-3.2

        ref: 187707a5c1..040e3d6f76
        for: 3.2.12

        WT-2962 Provide a way to configure builtin extensions
        WT-2984 Search of metadata for recently created collection gets WT_NOTFOUND
        WT-3000 Missing log records in recovery when crashing after a log file switch
        WT-3002 Allow applications to exempt threads from eviction.
        WT-3004 lint: declare functions that don't return a value as void
        WT-3011 __wt_curjoin_open() saves the wrong URI in the cursor.
        WT-3012 Test format hanging on LSM configurations
        WT-3015 Test format stuck with 2mb cache
        WT-3016 Tests needed for systems without ftruncate
        WT-3017 Hazard pointer race with page replace causes error
        WT-3018 lint
        WT-3020 LSM primary changes impact parallel-pop-lsm load time
        WT-3022 LSM operations get stuck in __wt_clsm_await_switch waiting for switch on tree to complete
        WT-3023 Test format hang on zSeries
        WT-3024 wtperf medium-lsm-compact test can hang
        Branch: v3.2
        https://github.com/mongodb/mongo/commit/c586934f7212f6a9a2087cbaf9a8fcd7d7ce9abf

