Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor - P4
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: planned but not scheduled
    • Component/s: Storage
    • Labels:
      None
    • # Replies:
      57
    • Last comment by Customer:
      true

      Description

      When storing textual data (and having more CPU than IO capacity) it'd be nice to have an option to have the data stored gzip compressed on disk.
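As a rough illustration of the trade-off described above (spending CPU to save disk I/O), the sketch below gzip-compresses a batch of repetitive textual records with Python's standard `gzip` module and reports the size ratio. The sample records and the helper name `compression_factor` are made up for illustration; this says nothing about how MongoDB itself lays out data on disk.

```python
import gzip
import json


def compression_factor(raw: bytes, level: int = 6) -> float:
    """Return the original size divided by the gzip-compressed size."""
    compressed = gzip.compress(raw, compresslevel=level)
    return len(raw) / len(compressed)


# Hypothetical textual records, loosely shaped like JSON documents.
records = [
    {"user": f"user{i}", "status": "active", "comment": "hello world " * 5}
    for i in range(1000)
]
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")

factor = compression_factor(raw)
print(f"raw={len(raw)} bytes, factor={factor:.1f}x")
```

The factor depends heavily on how repetitive the text is, so real data may compress much better or much worse than this synthetic sample.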

        Issue Links

          Activity

          James Blackburn
          added a comment -

          We've got a large MongoDB instance running on top of ZFS on Linux using lz4 compression. It seems to work well - at least we haven't had any problems related to the filesystem.

          Ben McCann
          added a comment -

          James, did you benchmark performance with compressed ZFS at all?

          James Blackburn
          added a comment - edited

          We've done some benchmarking, yes.

          We're running ZFS on Linux 0.6.2 on RHEL6 and this setup is very new here. Throughput, with an I/O bound workload, is nearly identical to a replica set backed by ext4. Though this should be taken with a pinch of salt as:

          1. The application we're running is fairly data intensive, so we already do in-app compression with lz4
          2. The databases are very new and the traffic is mostly write-only

          Point 1 gives us end-to-end I/O savings, including network traffic and memory load on the MongoDB servers. With point 2, it's not clear how, or whether, performance will degrade as the DB ages and ZFS's copy-on-write (COW) nature leads to fragmentation. Given that MongoDB's access pattern is essentially random read I/O anyway, I'm hoping it won't be too bad, but time will tell.

          As we already do compression in the app, ZFS gives us a compression factor of only ~1.1x on these MongoDB databases. For normal databases (e.g. the configdb) and home directories we get a 2x - 10x compression factor.
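The low second-pass factor is what one would generally expect: data that has already been compressed has little redundancy left for a filesystem-level compressor to find. A minimal sketch of that effect follows, using Python's standard `zlib` as a stand-in for lz4 (which is not in the standard library) and made-up sample text; the exact numbers will differ from the ZFS figures quoted above.

```python
import zlib

# Hypothetical repetitive text standing in for application data.
text = b"".join(
    b"sensor=%d value=%d status=ok\n" % (i % 50, i) for i in range(20000)
)

# First pass: compression done "in the app" (zlib stands in for lz4 here).
app_compressed = zlib.compress(text)
# Second pass: roughly what a compressing filesystem would be handed.
fs_compressed = zlib.compress(app_compressed)

first_factor = len(text) / len(app_compressed)
second_factor = len(app_compressed) / len(fs_compressed)
print(f"first pass:  {first_factor:.1f}x")
print(f"second pass: {second_factor:.2f}x")
```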

          Edit: although the setup is new, we've put >8TB of data into it, and soak tested full I/O bound reads for a few days with nothing blowing up.

          Ben McCann
          added a comment -

          Thanks for sharing! It's great to see numbers around it. To share some of our benchmarking, we've been testing TokuMX and have found that we get a 5x compression factor and 3x write throughput, which is a pretty awesome win.

          agahd
          added a comment -

          Using TokuMX 1.4 we get a 10x compression factor and 5x write throughput.


            Dates

            • Created:
              Updated:
              Days since reply:
              3 weeks, 2 days ago
              Date of 1st Reply: