
Testing Model Accuracy Metrics Implementation in PaddleOCR

This test module implements metric calculation for PaddleOCR’s model evaluation, focusing on top-1 and top-k accuracy. It also covers averaging these metrics across multiple cards during distributed evaluation.

Test Coverage Overview

The test suite covers accuracy metric calculations for deep learning model evaluation.

  • Tests top-1 and top-k accuracy computations (a minimal sketch follows this list)
  • Validates softmax output processing
  • Handles distributed evaluation scenarios
  • Supports variable class numbers and prediction thresholds
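
As a quick illustration of the first two items above, a smoke test along these lines exercises the top-1/top-k path on dummy data; the shapes, seed, and function name are illustrative and not taken from the actual test file.

import paddle
import paddle.nn.functional as F


def test_topk_accuracy_smoke():
    paddle.seed(0)
    batch_size, classes_num, topk = 8, 10, 5
    logits = paddle.randn([batch_size, classes_num])
    label = paddle.randint(0, classes_num, shape=[batch_size, 1], dtype="int64")

    # softmax normalizes logits into probabilities before accuracy is computed
    probs = F.softmax(logits)
    top1 = paddle.metric.accuracy(probs, label=label, k=1)
    topk_acc = paddle.metric.accuracy(probs, label=label, k=min(topk, classes_num))

    # both metrics are scalar tensors in [0, 1]; top-k is never below top-1
    assert 0.0 <= float(top1) <= 1.0
    assert float(top1) <= float(topk_acc)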

Implementation Analysis

The implementation uses PaddlePaddle’s metric utilities for accuracy calculations.

Key patterns include:
  • Flexible topk parameter handling
  • Distributed computation support using all_reduce operations (sketched after this list)
  • Ordered dictionary result storage
  • Softmax activation processing
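
The all_reduce pattern mentioned above can be sketched in isolation. This is only an illustration of the averaging idea, with a hypothetical helper name; all_reduce is used as an in-place sum across cards, followed by division by the world size.

import paddle
import paddle.distributed as dist


def average_across_cards(metric):
    # hypothetical helper: average a per-card scalar metric over all cards
    world_size = dist.get_world_size()
    if world_size > 1:
        # all_reduce sums the tensor in place across all ranks
        dist.all_reduce(metric, op=dist.ReduceOp.SUM)
        metric = metric / world_size
    return metric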

Technical Details

Testing infrastructure includes:

  • PaddlePaddle metric APIs
  • Distributed evaluation tools
  • OrderedDict for results management
  • F.softmax for probability normalization (illustrated after this list)
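
The last two items can be illustrated with made-up values: F.softmax turns each row of logits into a probability distribution, and OrderedDict preserves the order in which metrics are registered.

import paddle
import paddle.nn.functional as F
from collections import OrderedDict

logits = paddle.to_tensor([[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]])
probs = F.softmax(logits)
print(probs.sum(axis=1))    # each row sums to ~1.0

fetchs = OrderedDict()
fetchs["top1"] = paddle.to_tensor([0.5])
fetchs["top5"] = paddle.to_tensor([0.9])
print(list(fetchs.keys()))  # ['top1', 'top5'], insertion order preserved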

Best Practices Demonstrated

The test implementation showcases robust evaluation practices.

  • Proper handling of distributed environments
  • Flexible parameter configuration (see the clamping example after this list)
  • Clear separation of computation steps
  • Efficient result aggregation
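
One concrete outcome of the flexible parameter configuration is that the requested top-k is clamped to the class count, which also determines the result key name. The helper below is a tiny stand-in of mine that mirrors the k = min(topk, classes_num) line in the source file shown next.

def clamp_topk(topk, classes_num):
    # the requested k can never exceed the number of classes
    k = min(topk, classes_num)
    return k, "top{}".format(k)


assert clamp_topk(5, 1000) == (5, "top5")
assert clamp_topk(5, 3) == (3, "top3")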

paddlepaddle/paddleocr

test_tipc/supplementary/metric.py

import paddle
import paddle.nn.functional as F
from collections import OrderedDict


def create_metric(
    out,
    label,
    architecture=None,
    topk=5,
    classes_num=1000,
    use_distillation=False,
    mode="train",
):
    """
    Create measures of model accuracy, such as top1 and top5

    Args:
        out(variable): model output variable
        label(variable): ground-truth label variable
        architecture(dict): model architecture config (only referenced by the
            commented-out GoogLeNet / distillation branch below)
        topk(int): usually top5
        classes_num(int): num of classes
        use_distillation(bool): whether to use distillation training
        mode(str): mode, train/valid

    Returns:
        fetchs(dict): dict of measures
    """
    # if architecture["name"] == "GoogLeNet":
    #     assert len(out) == 3, "GoogLeNet should have 3 outputs"
    #     out = out[0]
    # else:
    #     # just need student label to get metrics
    #     if use_distillation:
    #         out = out[1]
    softmax_out = F.softmax(out)

    fetchs = OrderedDict()
    # set top1 to fetchs
    top1 = paddle.metric.accuracy(softmax_out, label=label, k=1)
    # set topk to fetchs
    k = min(topk, classes_num)
    topk = paddle.metric.accuracy(softmax_out, label=label, k=k)

    # multi cards' eval: average the per-card metrics across all cards
    if mode != "train" and paddle.distributed.get_world_size() > 1:
        # all_reduce sums the metric tensors in place across cards; dividing
        # by the world size afterwards yields the global mean
        paddle.distributed.all_reduce(top1, op=paddle.distributed.ReduceOp.SUM)
        paddle.distributed.all_reduce(topk, op=paddle.distributed.ReduceOp.SUM)
        top1 = top1 / paddle.distributed.get_world_size()
        topk = topk / paddle.distributed.get_world_size()

    fetchs["top1"] = top1
    topk_name = "top{}".format(k)
    fetchs[topk_name] = topk

    return fetchs
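
A hypothetical single-card usage of create_metric, with random logits and labels standing in for real model outputs (not part of the original file):

import paddle

logits = paddle.randn([16, 1000])
labels = paddle.randint(0, 1000, shape=[16, 1], dtype="int64")
metrics = create_metric(logits, labels, topk=5, classes_num=1000, mode="valid")
for name, value in metrics.items():
    print("{}: {:.4f}".format(name, float(value)))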