  1. Spark Connector
  2. SPARK-104

Spark Connector 2.00 (Scala 2.11): unexplained issue

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      1. The easily explained one. I got the error "Cannot infer type for class org.freemind.spark.sql.MongoRating because it is not bean-compliant" when I used
      MongoSpark.load(sparkSession, readConfig, classOf[MongoRating])
      I can still use MongoSpark.load(sparkSession, mrReadConfig).as[MongoRating]. However, it does not look like the data is really cast to MongoRating: ds.show() displays an ObjectId column, but MongoRating does not include that field.
      2. The original idea came from Sam Weaver's article https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/346304/2168141618055109/484361/latest.html. I used the MovieLens 1 million data set and revised and wrote my own MovieLensALS, which works fine. I wrote another one, MovieLensALSMongo, which has the same code except for the data source change plus minor field changes (movie_id instead of movieId, etc.). I used mongoimport to load movie_ratings, personal_ratings and movies. The recommendation results in MovieLensALSMongo are very different and quite off (predictions are as high as 10+).
      I tracked it down and couldn't believe what I found. I used Dataset randomSplit. The sum of all the splits should equal the Dataset count. That is true for every run of MovieLensALS but not for MovieLensALSMongo. The total for movie_ratings is 1M+209. The split totals for MovieLensALSMongo are all over the place: 1M+522, 1M+355, 1M+172, 1M+239, 1M+295. I added a line of code to print the sum. It does not even match what the calculator says.
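
The split-total check described in point 2 can be sketched as follows. This is a minimal sketch, not the code from the attached files: the object name `SplitCountCheck`, the helper `totals`, the weights, and the seed are all hypothetical, and `spark.range` stands in for the MongoDB collection. It also caches the Dataset before splitting. That step is an assumption worth testing rather than a confirmed diagnosis: each split is computed by its own pass over the source, so if the source does not return the same rows in the same partitions on every read, the splits may overlap or drop rows.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object SplitCountCheck {
  // Returns (total row count, sum of the row counts of all splits).
  // Hypothetical helper, not taken from the attached Scala files.
  def totals[T](ds: Dataset[T], weights: Array[Double], seed: Long): (Long, Long) = {
    // Assumption: caching gives every split the same materialized view of the data.
    val cached = ds.cache()
    val total = cached.count()
    val splitTotal = cached.randomSplit(weights, seed).map(_.count()).sum
    (total, splitTotal)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SplitCountCheck")
      .getOrCreate()

    // Stand-in data; the report reads movie_ratings from MongoDB instead.
    val ratings = spark.range(0, 1000).toDF("id")
    val (total, splitTotal) = totals(ratings, Array(0.6, 0.2, 0.2), 42L)
    println(s"total=$total, splitTotal=$splitTotal")

    spark.stop()
  }
}
```

If the two numbers agree with caching and disagree without it, the mismatch comes from re-reading the source rather than from randomSplit itself.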

      I really cannot explain it. To prove I am not out of my mind, I am providing those Scala files as well as my personal ratings file. This is the most bizarre issue of my programming career. I would like to know why. Thanks.
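
On point 1, the ObjectId column showing up after `.as[MongoRating]` matches how Spark documents `as[U]`: it changes the typed view of the rows but does not eagerly project away columns that are missing from the class. The sketch below illustrates this with Spark alone. The `MongoRating` fields (`user_id`, `movie_id`, `rating`), the object name, and the stand-in DataFrame are assumptions for illustration; the real class lives in the attached files, and the real DataFrame comes from `MongoSpark.load`.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical fields; the actual MongoRating is defined in the attached files.
case class MongoRating(user_id: Int, movie_id: Int, rating: Double)

object AsProjectionSketch {
  // Select only the case-class columns before converting, so _id is dropped.
  def project(spark: SparkSession, df: DataFrame): Dataset[MongoRating] = {
    import spark.implicits._
    df.select("user_id", "movie_id", "rating").as[MongoRating]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("AsProjectionSketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the loaded collection: it carries an extra _id column.
    val df = Seq(("abc123", 1, 10, 4.0)).toDF("_id", "user_id", "movie_id", "rating")

    // as[...] alone keeps the underlying columns, including _id.
    println(df.as[MongoRating].columns.mkString(", "))

    // An explicit select leaves only the MongoRating fields.
    println(project(spark, df).columns.mkString(", "))

    spark.stop()
  }
}
```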

        1. MovieLensALS.scala (7 kB, Sonya Ling)
        2. MovieLensALSMongo.scala (7 kB, Sonya Ling)
        3. personalRatings.txt (0.5 kB, Sonya Ling)

            Assignee: Ross Lawley (ross@mongodb.com)
            Reporter: Sonya Ling (threecuptea)
            Votes: 0
            Watchers: 2
