Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 0.42
    • Fix Version/s: 0.701.4
    • Component/s: Perl driver
    • Labels:
      None
    • Environment:
      debian on a xeon E5405 (8 core) with 16G of memory and SSD disk
    • # Replies:
      10
    • Last comment by Customer:
      true

      Description

      The Perl driver is very, very slow compared to the PHP one...

      Some tests with perl 5.10.1 and perl 5.12.3 show that PHP is between 2.8 and more than 3.2 times faster for basic insertions (with or without threads enabled).

      In the PHP case, mongod reaches 100% CPU while PHP stays around 57%. 10M insertions are done in ~100 seconds.

      In the Perl case, mongod stays around 76% while Perl is at 100% CPU! 10M insertions take ~280 seconds (custom threaded perl) or ~320 seconds (original Debian threaded perl).

      I attach the two sample scripts and a small patch that avoids calling sprintf() 12 times for each insertion. In the test case, 10% of execution time was spent in sprintf(). With the patch we gain 10% of total time; not much, but better than nothing...
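The actual patch is attached to the ticket; as an illustration only, the kind of change described (one call instead of twelve per-byte sprintf() calls when hex-encoding a 12-byte OID) can be sketched like this:

```perl
use strict;
use warnings;

# Stand-in for the 12 raw bytes of an OID; real values come from the driver.
my @bytes = map { int rand 256 } 1 .. 12;

# Per-byte formatting: sprintf() is called 12 times per OID.
my $slow = join '', map { sprintf '%02x', $_ } @bytes;

# Single call: pack the bytes once, hex-encode them with one unpack().
my $fast = unpack 'H*', pack 'C12', @bytes;

die "hex encodings differ" unless $slow eq $fast;
print "$fast\n";    # 24 hex characters either way
```

Both forms produce the same 24-character hex string; the second simply avoids the per-byte function-call overhead, which is consistent with the ~10% gain reported above.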

      1. insert.php
        0.4 kB
        Maxime Soulé
      2. insert.pl
        0.6 kB
        Maxime Soulé
      3. without-sprintf.patch
        0.7 kB
        Maxime Soulé

        Activity

        tdmdfever Feng Deng added a comment -

        I recently updated the MongoDB driver from 0.702.2 to 0.704.2.0. In the latest version, I noticed quite a slowdown for the same query I am running. Below are the results, averaged over 50 runs, against the two versions. Any suggestions? Thanks.

        Query:
        $collection->find({
            'run date' => { '$gt' => 20130928, '$lt' => 20140307 }
        });

        Perl: 5.18.1

        'run date' is indexed but non-unique
        Number of records: 199,955
        In 0.702.2 --> it takes 5.465 sec
        In 0.704.2 --> it takes 9.324 sec
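The 50-run average above can be collected with the core Time::HiRes module. A minimal sketch of that measurement pattern, with a placeholder workload standing in for the find() call and cursor drain:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code block N times and return the mean wall-clock time in seconds.
sub average_runtime {
    my ($runs, $code) = @_;
    my $total = 0;
    for (1 .. $runs) {
        my $t0 = [gettimeofday];
        $code->();
        $total += tv_interval($t0);
    }
    return $total / $runs;
}

# Placeholder workload; in the ticket this would be the range query above.
my $avg = average_runtime(50, sub { my $x = 0; $x += $_ for 1 .. 10_000 });
printf "average over 50 runs: %.6f s\n", $avg;
```

Timing the full cursor iteration (not just the find() call, which is lazy) is what actually exercises BSON deserialization in the driver.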

        david.golden David Golden added a comment -

        BSON serialization changed to use libbson between the 0.702.X series and 0.704.X series (along with many other changes).

        We have plans to examine BSON serialization/deserialization again and will do some benchmarking and profiling to see what can be done to improve performance.

        For reference, can you tell me more about the nature of the documents in the collection? Or can you post a sanitized example document?

        zeptomax Maxime Soulé added a comment -

        Please see the files attached to this ticket.

        tdmdfever Feng Deng added a comment -

        Thanks David.
        Below is a representative document from my collection. The key* names are short descriptive strings, and the string values are mostly short, a few words each. There are 590k documents in this collection, and I expect incremental growth of about 40k rows. Let me know if this helps.

        Hi Maxime,
        I checked the files in this ticket. What am I supposed to do with them? Their timestamps date back to 2011, so I assumed the relevant changes are already in place. Let me know if I missed anything.

        {
        "_id" : ObjectId("xxxxxxxxxxxxxxxxxxxxxxx"),
        "key 1" : "some string",
        "key 2" : "some string",
        "key 3" : "some string",
        "key 4" : "some string",
        "key 5" : "some string",
        "key 6" : "some string",
        "Key 7" : "some string",
        "key 8" : "some string",
        "key 9" : "some string",
        "key 10" : float,
        "key 11" : "some string",
        "key 12" : "some string",
        "key 13" : float,
        "key 14" : "some string",
        "uuid" : "an universal uniq id"
        }

        david.golden David Golden added a comment -

        Thank you, Feng. I can't promise a quick fix, but having a description of your dataset is helpful for our benchmarking.


          People

          • Votes: 8
          • Watchers: 7

            Dates

            • Created:
            • Updated:
            • Resolved:
            • Days since reply: 44 weeks ago
            • Date of 1st Reply: