Functional Programming and Collective Intelligence - III

Previously, I went over formulas for calculating the similarity between our items using the Manhattan Distance, Chebyshev Distance and the Jaccard Coefficient.  We were then able to plug a given similarity function into another function that gets the top matches for our preferences.  This time, instead of determining which critic is best suited to my tastes, we’ll rate which items should be viewed.

Before we move on, let’s get caught up to where we are today:

What Shall I Watch Next?

In our previous attempts, we calculated how similar the critics are to each other, but let’s do something more useful to the user: recommend what they should see.  How do we do this?  As the book states, we could just use the totals of the similarity-weighted scores, but then items reviewed by more critics would get an unfair advantage.  Instead, we divide each total by the sum of the similarities of the critics who reviewed that item, which gives a weighted average.  To calculate this, we can implement getRecommendations, which takes the preferences, the person in question and a similarity function such as those we defined in the past two posts in this series.
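Written out, the weighted average looks roughly like the following.  The symbols are my own shorthand rather than anything from the book: p is the person, i is an item that p has not rated, C_i is the set of other critics who rated i and have a positive similarity to p, and r_{o,i} is critic o's rating of i.

```latex
\[
\mathrm{score}(p, i) =
  \frac{\sum_{o \in C_i} \mathrm{sim}(p, o)\, r_{o,i}}
       {\sum_{o \in C_i} \mathrm{sim}(p, o)}
\]

% Worked example with made-up numbers: two critics with similarities
% 0.8 and 0.4 rate the same item 3.0 and 4.5 respectively.
\[
\mathrm{score} = \frac{0.8 \cdot 3.0 + 0.4 \cdot 4.5}{0.8 + 0.4}
               = \frac{4.2}{1.2} = 3.5
\]
```

The more similar critic pulls the result towards their rating of 3.0, whereas a plain average of the two ratings would give 3.75.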

 

// Gets recommendations for a person by using a weighted average
let getRecommendations
  (prefs:Map<string, Map<string, float>>)
   person
  (similarity:Map<string, Map<string, float>> -> string -> string -> float) =
  
  let totals, simSums = 
    prefs
    // If not myself
    |> Map.filter (fun other _ -> person <> other)
    |> Map.fold_left
         (fun acc other items -> 
            let sim = similarity prefs person other
            items
            // Skip critics with a similarity of zero or lower; keep only items I haven't rated
            |> Map.filter(fun item _ -> sim > 0. && 
                                        (not (prefs.[person].ContainsKey item) || 
                                         prefs.[person].[item] = 0.))
            |> Map.fold_left (fun acc' item v  -> 
                 // Similarity * Score
                 let totals = Map.insertWith (+) item (v * sim) (fst acc')
                 // Sum of similarities
                 let simSums = Map.insertWith (+) item sim (snd acc')
                 (totals, simSums)) acc) (Map.empty, Map.empty)
  
  // Create the normalized list                                                               
  seq { for kvp in totals -> (kvp.Value / simSums.[kvp.Key], kvp.Key)}
    |> Seq.sort_by fst
    |> Seq.rev
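The listing above relies on a Map.insertWith helper whose definition is not shown in this post.  As a minimal sketch, assuming it behaves like Haskell's insertWith (add the value, or combine it with the existing one), it could look something like this.  The helper body, the item name "Some Movie" and the similarity and rating numbers are all made up for illustration, and the sketch uses the current Map.add and List.fold names rather than the older ones in the listing.

```fsharp
// Hypothetical stand-in for the Map.insertWith helper used above;
// the original definition is not shown in this post.
module Map =
    /// Adds value under key, or combines it with the existing value via f.
    let insertWith (f: 'v -> 'v -> 'v) (key: 'k) (value: 'v) (m: Map<'k, 'v>) =
        match m.TryFind key with
        | Some existing -> m.Add(key, f value existing)
        | None -> m.Add(key, value)

// Two made-up (similarity, rating) pairs for a single made-up item.
let ratings = [ (0.8, 3.0); (0.4, 4.5) ]

// Accumulate the same two maps that getRecommendations builds:
// totals holds similarity * rating, simSums holds the similarities.
let totals, simSums =
    ratings
    |> List.fold
         (fun (t, s) (sim, score) ->
            (Map.insertWith (+) "Some Movie" (sim * score) t,
             Map.insertWith (+) "Some Movie" sim s))
         (Map.empty, Map.empty)

// Weighted average: (0.8 * 3.0 + 0.4 * 4.5) / (0.8 + 0.4), roughly 3.5
let weighted = totals.["Some Movie"] / simSums.["Some Movie"]
printfn "%f" weighted
```

Keeping the two running sums in separate maps keyed by item is what lets the final step divide one by the other item by item.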

With this defined, we can calculate which items we should see based upon our preferences and the ratings of other users.  Let’s use the Euclidean Distance, Manhattan Distance and Pearson Coefficient to determine which movies I should see:

> getRecommendations critics "Matt" sim_distance;;
val it : seq<float * string>
= seq
    [(3.368738892, "The Night Listener"); (2.778584004, "Lady in the Water");
     (2.160784831, "Just My Luck")]
> getRecommendations critics "Matt" sim_manhattan;;
val it : seq<float * string>
= seq
    [(3.402873644, "The Night Listener"); (2.766828171, "Lady in the Water");
     (2.158576052, "Just My Luck")]
> getRecommendations critics "Matt" sim_pearson;;
val it : seq<float * string>
= seq
    [(3.231859684, "The Night Listener"); (2.832549918, "Lady in the Water");
     (2.2509485, "Just My Luck")]     

All three similarity measures rank the movies in the same order, so the recommendation for what I should see next is consistent.  And there you have it, a complete recommendation system that can be used on any type of preferences.  For example, we could use this to produce MP3 recommendations, an Amazon.com recommended items list, and so on.  But is that all?

Conclusion

In the next part in this series, we’ll look at using the Slope One algorithm, which can combine user opinions such as purchase statistics to recommend items to users.  We’ll also cover how to determine how similar items are to each other.  I hope to get some of this code posted on GitHub before too long, once I can find the time.
