WiredTiger / WT-535

posix_madvise returns EBADF

    • Type: Task
    • Resolution: Done
    • Fix Version/s: WT1.6.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      YCSB is getting the following failure:

      file:usertable-000042.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000412.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000414.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000421.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000425.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000426.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000427.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000429.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000430.lsm: posix_madvise will need: Bad file descriptor
      file:usertable-000433.lsm: posix_madvise will need: Bad file descriptor
      

      We also saw this in Sue's runs of test3, and at that point, Michael speculated:

      > My best guess for the EBADF returns is that we have to map the pointer and size to block boundaries, and that calculation may be overflowing. Some of the LSM files are over 2GB, so that doesn't seem impossible. Apparently madvise can return EBADF if the address range isn't a mapped file... I'll look into that. We don't really care if readahead is unsuccessful, but I suspect this is telling us something real.
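      Michael's hypothesis concerns rounding the pointer and size out to page boundaries. Below is a minimal sketch of that rounding, assuming a power-of-two page size; `page_range` and `align_to_pages` are hypothetical names, not WiredTiger's code. The same arithmetic done in a 32-bit `int` would overflow once offsets pass 2GB, which is the failure mode he suspected; keeping everything in `uint64_t` avoids it.

```c
#include <assert.h>
#include <stdint.h>

/* A page-aligned range covering [addr, addr + len). */
typedef struct {
    uint64_t start; /* addr rounded down to a page boundary */
    uint64_t len;   /* whole pages covering the original range */
} page_range;

/*
 * Hypothetical helper: round an (address, length) pair out to page
 * boundaries. All arithmetic is 64-bit, so offsets past 2GB (or 4GB)
 * do not wrap. pagesize must be a power of two, and addr + len + pagesize
 * is assumed to fit in 64 bits.
 */
static page_range align_to_pages(uint64_t addr, uint64_t len, uint64_t pagesize)
{
    uint64_t mask = pagesize - 1;
    uint64_t end = addr + len;
    page_range r;

    r.start = addr & ~mask;                   /* round start down */
    r.len = ((end + mask) & ~mask) - r.start; /* round end up, then measure */
    return r;
}
```

      For example, `align_to_pages(0x1234, 100, 4096)` yields a start of 0x1000 and a length of 4096.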

      I've been looking at this for a little while, and I don't see anything obvious.

      Michael, we aren't currently aligning the size to anything in particular; maybe it needs to be a multiple of 4KB?
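      One way to test the 4KB question directly is to call posix_madvise on a real file mapping. This is a minimal sketch assuming Linux and glibc; `map_temp_file` and `will_need` are hypothetical helpers. posix_madvise returns an error number rather than -1 with errno set. On Linux the kernel rounds the length up to a whole number of pages itself, and a misaligned start address fails with EINVAL rather than EBADF, which suggests size alignment alone would not produce these errors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an unlinked temporary file of `len` bytes
 * and map it read-only. Returns the mapping, or NULL on failure.
 */
static void *map_temp_file(size_t len)
{
    char path[] = "/tmp/madvise-demo-XXXXXX";
    int fd = mkstemp(path);
    void *p;

    if (fd == -1)
        return NULL;
    unlink(path);                 /* the mapping keeps the file alive */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                    /* closing the fd does not unmap */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical wrapper: request readahead; returns 0 or an error number. */
static int will_need(void *addr, size_t len)
{
    return posix_madvise(addr, len, POSIX_MADV_WILLNEED);
}
```

      On Linux, `will_need(p, pagesize + 123)` on such a mapping returns 0 even though the length is not page-aligned, while `will_need(p + 1, 100)` returns EINVAL.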

      The Linux kernel has this code in madvise.c, which explains the EBADF entry in the madvise documentation, but I can't imagine this is what we're hitting.

      /*
       * Schedule all required I/O operations.  Do not wait for completion.
       */
      static long madvise_willneed(struct vm_area_struct * vma,
      			     struct vm_area_struct ** prev,
      			     unsigned long start, unsigned long end)
      {
      	struct file *file = vma->vm_file;
      
      #ifdef CONFIG_SWAP
      	if (!file || mapping_cap_swap_backed(file->f_mapping)) {
      		*prev = vma;
      		if (!file)
      			force_swapin_readahead(vma, start, end);
      		else
      			force_shm_swapin_readahead(vma, start, end,
      						file->f_mapping);
      		return 0;
      	}
      #endif
      
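      	/* A NULL file only gets here when CONFIG_SWAP is not set. */
      	if (!file)
      		return -EBADF;
      
      	/* ... rest of madvise_willneed() elided ... */
      }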
      
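      Read literally, the excerpt returns EBADF only for a VMA with no backing file, and only in kernels built without CONFIG_SWAP; with CONFIG_SWAP the anonymous case gets swap readahead and returns 0. The following is a hypothetical user-space model of just these branches for a single VMA (`willneed_model` and its parameters are invented for illustration, not kernel code), assuming the file-backed path that continues past the excerpt returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the early branches of madvise_willneed():
 * returns 0 or a negative errno, as the kernel function would.
 */
static long willneed_model(bool has_file, bool swap_backed, bool config_swap)
{
    if (config_swap && (!has_file || swap_backed))
        return 0;       /* anonymous or shmem: swap readahead, success */
    if (!has_file)
        return -EBADF;  /* the "Bad file descriptor" case */
    return 0;           /* assumed: file readahead scheduled, success */
}
```

      Under this model a mapping of a regular file never yields EBADF; the error implies the range landed on a VMA without a file behind it.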

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Votes:
            0
            Watchers:
            0

              Created:
              Updated:
              Resolved: