
Testing User-Based Collaborative Filtering Algorithms in AILearning

This test file implements user-based collaborative filtering for recommendation systems. It defines three ways to compute user-to-user similarity and a recommendation function that scores items from the resulting similarity matrix. Despite the test_ prefix, the file contains only function definitions and no assertions.

Test Coverage Overview

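The file covers three user similarity implementations (UserSimilarity1, UserSimilarity2, UserSimilarity3) and a recommendation function, Recommend. UserSimilarity1 compares every pair of users directly, UserSimilarity2 computes the same cosine similarity from an inverted item-to-user table, UserSimilarity3 adds a logarithmic penalty for popular items, and Recommend ranks unseen items by similarity-weighted ratings from the most similar users.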

  • Computes basic co-rated item similarity
  • Builds an inverted item-to-user table
  • Applies a logarithmic penalty to popular items
  • Scores candidate items for recommendation ranking
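Read off the code listing, the three quantities can be written as formulas. The notation is ours, not the repository's: N(u) is the set of items user u interacted with, U(i) the set of users who interacted with item i, S(u, K) the K users most similar to u, and r_vi the value stored in train[v][i]. The first formula is what UserSimilarity1 and UserSimilarity2 compute, the second is UserSimilarity3 (math.log is the natural logarithm), and the third is the score Recommend accumulates for an item.

```latex
w_{uv} = \frac{\lvert N(u) \cap N(v) \rvert}{\sqrt{\lvert N(u) \rvert \, \lvert N(v) \rvert}}

w^{\log}_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \dfrac{1}{\ln\bigl(1 + \lvert U(i) \rvert\bigr)}}{\sqrt{\lvert N(u) \rvert \, \lvert N(v) \rvert}}

p(u, i) = \sum_{v \in S(u, K) \cap U(i)} w_{uv} \, r_{vi}, \qquad i \notin N(u)
```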

Implementation Analysis

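Similarities are stored as a sparse matrix built from nested dictionaries: W[u][v] holds the similarity between users u and v, and UserSimilarity2 and UserSimilarity3 only create entries for pairs that share at least one item. The training data follows the same pattern, mapping each user to a dictionary of {item: rating}, so lookups and membership tests take constant time on average.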
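A minimal sketch of that layout on hypothetical toy data: four users, each mapped to {item: rating}, and a similarity matrix stored as a nested dict that simply has no entry for pairs with nothing in common. The function name cosine_similarity_sparse is ours; it follows the naive pairwise approach of UserSimilarity1.

```python
import math

# Hypothetical toy data in the layout the listing expects:
# each user maps to a dict of {item: rating}.
train = {
    "A": {"a": 1.0, "b": 1.0, "d": 1.0},
    "B": {"a": 1.0, "c": 1.0},
    "C": {"b": 1.0, "e": 1.0},
    "D": {"c": 1.0, "d": 1.0, "e": 1.0},
}


def cosine_similarity_sparse(train):
    """Return W as a nested dict W[u][v], storing only pairs that share an item."""
    W = {}
    for u in train:
        for v in train:
            if u == v:
                continue  # self-similarity is excluded
            # dict key views support set operations, so & gives the co-rated items
            common = train[u].keys() & train[v].keys()
            if not common:
                continue  # keep the matrix sparse: a missing entry means similarity 0
            W.setdefault(u, {})[v] = len(common) / math.sqrt(len(train[u]) * len(train[v]))
    return W


W = cosine_similarity_sparse(train)
print(sorted(W["B"]))  # ['A', 'D'] (B shares no item with C)
```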

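Two refinements go beyond the naive pairwise loop. The inverted item-to-user table in UserSimilarity2 and UserSimilarity3 restricts the computation to user pairs that share at least one item instead of all pairs of users, which is what makes the approach practical for large, sparse datasets. The logarithmic penalty in UserSimilarity3 addresses accuracy rather than speed: each co-rated item contributes 1 / log(1 + number of users who interacted with it), so popular items shared by many users say less about whether two users have similar tastes.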
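The saving from the inverted table can be made concrete with a small sketch under assumed data: 100 users with disjoint items, plus one item shared by two of them. The helper similarity_via_inverted_table is hypothetical; it counts how many ordered user pairs its inner loop touches, to compare with the |U|(|U|-1) pairs a naive all-pairs loop like UserSimilarity1 visits.

```python
import math
from itertools import permutations


def similarity_via_inverted_table(train):
    """Cosine similarity that only visits user pairs sharing at least one item."""
    # Inverted table: item -> set of users who interacted with it
    item_users = {}
    for u, items in train.items():
        for i in items:
            item_users.setdefault(i, set()).add(u)

    C = {}      # C[u][v] = number of items co-rated by u and v
    visits = 0  # ordered (u, v) pairs touched by the loop below
    for users in item_users.values():
        for u, v in permutations(users, 2):  # all ordered pairs with u != v
            visits += 1
            row = C.setdefault(u, {})
            row[v] = row.get(v, 0) + 1

    W = {
        u: {v: c / math.sqrt(len(train[u]) * len(train[v])) for v, c in row.items()}
        for u, row in C.items()
    }
    return W, visits


# Hypothetical data: 100 users with one private item each...
train = {f"user{k}": {f"item{k}": 1.0} for k in range(100)}
# ...and a single item shared by user0 and user1.
train["user0"]["shared"] = 1.0
train["user1"]["shared"] = 1.0

W, visits = similarity_via_inverted_table(train)
print(visits, len(train) * (len(train) - 1))  # 2 9900
```

The work is proportional to the sum, over items, of the square of each item's user count, so the benefit depends on sparsity: an item that nearly every user touched pushes the cost back toward quadratic.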

Technical Details

Key technical components include:

  • Python's math module (sqrt, log) for the similarity formulas
  • Dictionary-based sparse matrix implementation
  • Set operations for efficient intersection calculations
  • operator.itemgetter for sorting neighbors by similarity
  • Logarithmic scaling for popularity penalty
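Two of these components, the logarithmic popularity penalty and itemgetter-based sorting, fit in a few lines. This sketch assumes the natural logarithm, which is what math.log computes with one argument; popularity_weight is a hypothetical helper name and the similarity row is made-up data.

```python
import math
from operator import itemgetter


def popularity_weight(num_users):
    """Contribution of one co-rated item shared by num_users users (as in UserSimilarity3)."""
    return 1 / math.log(1 + num_users)


# A niche item shared by 2 users counts for far more than one shared by 1000 users.
niche = popularity_weight(2)
blockbuster = popularity_weight(1000)
print(round(niche, 3), round(blockbuster, 3))

# Picking the K most similar neighbors, as Recommend does with itemgetter(1).
similarities = {"B": 0.41, "C": 0.29, "D": 0.67}  # hypothetical W[u] row
K = 2
top_k = sorted(similarities.items(), key=itemgetter(1), reverse=True)[:K]
print(top_k)  # [('D', 0.67), ('B', 0.41)]
```

heapq.nlargest(K, similarities.items(), key=itemgetter(1)) gives the same top-K without sorting the whole row, which can matter when a user has many neighbors.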

Best Practices Demonstrated

The test suite demonstrates several recommendation system best practices:

  • Multiple similarity calculation methods for comparison
  • Memory-efficient sparse matrix implementations
  • Proper handling of edge cases (self-similarity exclusion)
  • Scalable approach using an inverted item-to-user table
  • Modular function design for maintainability

apachecn/ailearning

src/py3.x/ml/16.RecommenderSystems/test_基于用户.py

            
import math
from operator import itemgetter

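def UserSimilarity1(train):
    # Naive version: compare every pair of users, O(|U|^2) pairs.
    # train maps each user to {item: rating}; only the item keys matter here.
    W = dict()
    for u in train.keys():
        for v in train.keys():
            if u == v:
                continue  # skip self-similarity
            common = len(set(train[u]) & set(train[v]))
            if common == 0:
                continue  # no co-rated items: leave the pair out of the sparse matrix
            W.setdefault(u, {})[v] = common / math.sqrt(len(train[u]) * len(train[v]))
    return W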


def UserSimilarity2(train):
    # build inverse table: item -> set of users who interacted with it
    item_users = dict()
    for u, items in train.items():
        for i in items.keys():
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)

    # count co-rated items C[u][v] and the number of items N[u] per user
    C = dict()
    N = dict()
    for i, users in item_users.items():
        for u in users:
            N[u] = N.get(u, 0) + 1
            row = C.setdefault(u, {})
            for v in users:
                if u == v:
                    continue
                row[v] = row.get(v, 0) + 1

    # calculate final similarity matrix W (cosine)
    W = dict()
    for u, related_users in C.items():
        for v, cuv in related_users.items():
            W.setdefault(u, {})[v] = cuv / math.sqrt(N[u] * N[v])
    return W


def UserSimilarity3(train):
    # build inverse table: item -> set of users who interacted with it
    item_users = dict()
    for u, items in train.items():
        for i in items.keys():
            if i not in item_users:
                item_users[i] = set()
            item_users[i].add(u)

    # weighted co-rated counts: popular items (many users) contribute less
    C = dict()
    N = dict()
    for i, users in item_users.items():
        for u in users:
            N[u] = N.get(u, 0) + 1
            row = C.setdefault(u, {})
            for v in users:
                if u == v:
                    continue
                row[v] = row.get(v, 0) + 1 / math.log(1 + len(users))

    # calculate final similarity matrix W
    W = dict()
    for u, related_users in C.items():
        for v, cuv in related_users.items():
            W.setdefault(u, {})[v] = cuv / math.sqrt(N[u] * N[v])
    return W


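def Recommend(user, train, W, K):
    # Score items the user has not interacted with, using the K most similar users.
    # K (number of neighbors) is passed in explicitly.
    rank = dict()
    interacted_items = train[user]
    for v, wuv in sorted(W.get(user, {}).items(), key=itemgetter(1), reverse=True)[0:K]:
        for i, rvi in train[v].items():
            if i in interacted_items:
                # filter out items the user has already interacted with
                continue
            rank[i] = rank.get(i, 0) + wuv * rvi
    return rank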
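Recommend returns an unsorted dict of scores. A hypothetical helper, recommend_top_n, sketches the same scoring loop and then sorts the result into a ranked top-N list. K and N are assumed parameters, and W is written by hand here rather than computed by one of the similarity functions.

```python
from operator import itemgetter


def recommend_top_n(user, train, W, K, N):
    """Hypothetical helper: score unseen items from the K nearest users, return the top N."""
    rank = {}
    seen = train[user]
    neighbors = sorted(W.get(user, {}).items(), key=itemgetter(1), reverse=True)[:K]
    for v, wuv in neighbors:
        for i, rvi in train[v].items():
            if i in seen:
                continue  # skip items the user already interacted with
            rank[i] = rank.get(i, 0.0) + wuv * rvi
    return sorted(rank.items(), key=itemgetter(1), reverse=True)[:N]


# Hypothetical toy data; W is written by hand instead of computed.
train = {
    "A": {"a": 1.0, "b": 1.0},
    "B": {"a": 1.0, "c": 1.0, "d": 1.0},
    "C": {"b": 1.0, "d": 1.0},
    "D": {"e": 1.0},
}
W = {"A": {"B": 0.5, "C": 0.25, "D": 0.1}}

print(recommend_top_n("A", train, W, K=2, N=2))  # [('d', 0.75), ('c', 0.5)]
```

Item d scores highest because both neighbors B and C have it; e is never considered because D is not among A's two nearest neighbors.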