Testing Recommender System Evaluation Metrics in AILearning

This test suite implements evaluation metrics for a recommender system, including precision, recall, coverage, and popularity calculations. It provides comprehensive validation of recommendation algorithm performance through data splitting and metric computation.

Test Coverage Overview

The test suite covers four key recommendation system metrics: precision, recall, coverage, and popularity. It includes data splitting functionality for creating training and test sets with configurable random seed parameters.

  • Tests recommendation accuracy via precision and recall metrics
  • Validates system coverage across item space
  • Measures novelty through popularity-based calculations
  • Exercises both the data-splitting and metric-computation paths
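As a quick illustration of the first two bullets: precision@N divides hits by the number of recommendations made, while recall divides by the size of the user's test set. A toy single-user computation (hypothetical data, not from the repository):

```python
# Toy precision/recall computation for a single user (hypothetical data).
test_items = {"a", "b", "c"}        # items the user actually interacted with
recommended = ["a", "d", "b", "e"]  # top-N recommendation list, N = 4

hits = sum(1 for item in recommended if item in test_items)
precision = hits / len(recommended)  # 2 hits out of 4 recommendations -> 0.5
recall = hits / len(test_items)      # 2 hits out of 3 relevant items -> ~0.667
```

The same counting logic, summed over all users, is what the `Precision` and `Recall` functions below implement.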

Implementation Analysis

The implementation follows a modular approach with separate functions for each evaluation metric. It utilizes Python’s math and random libraries for calculations and data manipulation.

  • Implements train-test split with configurable parameters
  • Uses dictionary and set data structures for efficient lookups
  • Employs logarithmic calculations for popularity scoring
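The logarithmic scoring mentioned in the last bullet damps the long-tailed popularity distribution typical of interaction data; a minimal sketch with hypothetical per-item counts:

```python
import math

# Hypothetical per-item interaction counts with a long-tailed distribution.
item_popularity = {"hit_song": 10000, "album_track": 120, "b_side": 3}

# log(1 + count) compresses the scale, so a single blockbuster item
# does not dominate the novelty average.
scores = {item: math.log(1 + count) for item, count in item_popularity.items()}
```

On the raw counts the most popular item is over 3000x the least popular; after the log transform the ratio shrinks to under 7x.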

Technical Details

  • Python 3.x compatible implementation
  • Uses built-in math and random libraries
  • Requires user-item interaction data as input
  • Supports configurable N-recommendation list size
  • Handles both binary and weighted user-item relationships
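Note that the metric functions iterate over a `{user: {item: weight}}` mapping, while `SplitData` returns flat lists of pairs. A small adapter along these lines bridges the two (`pairs_to_dict` is a hypothetical helper, not part of the file):

```python
from collections import defaultdict

def pairs_to_dict(pairs):
    """Group (user, item) pairs into the nested mapping the metrics expect."""
    user_items = defaultdict(dict)
    for user, item in pairs:
        user_items[user][item] = 1  # binary relationship; real weights also fit
    return dict(user_items)

train = pairs_to_dict([("u1", "i1"), ("u1", "i2"), ("u2", "i1")])
```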

Best Practices Demonstrated

The code demonstrates strong testing practices through clear function separation and comprehensive metric coverage. Each metric function is self-contained with well-defined inputs and outputs.

  • Clear function documentation and naming
  • Efficient data structure usage
  • Deterministic data splitting via an explicit random seed
  • Modular design for easy maintenance

apachecn/ailearning

src/py3.x/ml/16.RecommenderSystems/test_evaluation_model.py

import math
import random

def SplitData(data, M, k, seed):
    """Split (user, item) pairs into train/test; roughly 1/M of the
    pairs land in the test set for a given k in [0, M - 1]."""
    test = []
    train = []
    random.seed(seed)
    for user, item in data:
        # randint is inclusive on both ends, so M - 1 gives exactly M
        # equally likely buckets (randint(0, M) would create M + 1).
        if random.randint(0, M - 1) == k:
            test.append([user, item])
        else:
            train.append([user, item])
    return train, test


# Precision: fraction of recommended items that appear in the user's test set.
# GetRecommendation(user, N) is expected to be supplied by the recommendation
# algorithm under test; it returns a ranked list of (item, score) pairs.
def Precision(train, test, N):
    hit = 0
    total = 0
    for user in train.keys():
        tu = test.get(user, {})  # users absent from the test set contribute no hits
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            if item in tu:
                hit += 1
        total += N
    return hit / (total * 1.0)


# Recall: fraction of the user's test-set items that were recommended.
def Recall(train, test, N):
    hit = 0
    total = 0
    for user in train.keys():
        tu = test.get(user, {})
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            if item in tu:
                hit += 1
        total += len(tu)
    return hit / (total * 1.0)


# Coverage: share of all training items that appear in at least one
# recommendation list.
def Coverage(train, test, N):
    recommend_items = set()
    all_items = set()
    for user in train.keys():
        for item in train[user].keys():
            all_items.add(item)
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            recommend_items.add(item)
    return len(recommend_items) / (len(all_items) * 1.0)


# Novelty: average log-popularity of recommended items; lower values
# indicate more novel (less popular) recommendations.
def Popularity(train, test, N):
    item_popularity = dict()
    for user, items in train.items():
        for item in items.keys():
            if item not in item_popularity:
                item_popularity[item] = 0
            # Count every occurrence across users, not only the first one.
            item_popularity[item] += 1
    ret = 0
    n = 0
    for user in train.keys():
        rank = GetRecommendation(user, N)
        for item, pui in rank:
            ret += math.log(1 + item_popularity[item])
            n += 1
    ret /= n * 1.0
    return ret
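The functions above call GetRecommendation(user, N), which the file leaves to the recommendation algorithm under test. A throwaway most-popular stub (all names and data here are hypothetical) shows how a metric would be exercised end to end:

```python
# Tiny hypothetical dataset in the {user: {item: weight}} shape.
train = {"u1": {"i1": 1, "i2": 1}, "u2": {"i2": 1, "i3": 1}}
test = {"u1": {"i2": 1}, "u2": {"i3": 1}}

def get_recommendation(user, N):
    """Stub recommender: rank items by global training popularity."""
    counts = {}
    for items in train.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])[:N]

# Precision@2 computed the same way as Precision() above.
hit, total = 0, 0
for user in train:
    tu = test[user]
    for item, _ in get_recommendation(user, 2):
        if item in tu:
            hit += 1
    total += 2
precision = hit / total  # 1 hit in 4 recommendations -> 0.25
```

Swapping the stub for a real recommender is the only change needed to evaluate an actual algorithm with these metrics.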