
Testing Trie Data Structure Operations in HanLP

This test suite validates the Trie implementation in HanLP’s hanlp_trie plugin, using a small Chinese-English dictionary. The tests cover parsing all dictionary matches in a text, longest-match parsing, item iteration, and length tracking as keys are overwritten, deleted, and re-added.

Test Coverage Overview

The suite contains four test methods (test_parse, test_parse_longest, test_items, and test_len) that cover the Trie implementation’s core operations.

Key areas tested include:
  • Basic trie construction and dictionary operations
  • Text parsing with multiple match possibilities
  • Longest match parsing algorithm
  • Dictionary item management and length operations

Edge cases include overlapping matches (for example, '和' and '和服' both start at offset 2 of the test text) and dictionary modification: overwriting, deleting, and re-adding a key.
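
The difference between the two parsing modes can be sketched with a brute-force scan over a plain dict. This is an illustration only: `parse_all` and `parse_longest` below are hypothetical helper functions, not the hanlp_trie API, and a real trie would avoid the repeated slicing. On the suite's fixture they produce the same tuples that test_parse and test_parse_longest expect; behavior on other inputs is not guaranteed to match HanLP.

```python
def parse_all(text, dictionary):
    """Return (begin, end, value) for every substring that is a dictionary key."""
    results = []
    for begin in range(len(text)):
        for end in range(begin + 1, len(text) + 1):
            word = text[begin:end]
            if word in dictionary:
                results.append((begin, end, dictionary[word]))
    return results


def parse_longest(text, dictionary):
    """Greedy scan: keep the longest key at each position, then resume after it."""
    results = []
    begin = 0
    while begin < len(text):
        longest_end = None
        for end in range(begin + 1, len(text) + 1):
            if text[begin:end] in dictionary:
                longest_end = end  # later hits are longer, so keep overwriting
        if longest_end is None:
            begin += 1  # no key starts here; move one character forward
        else:
            results.append((begin, longest_end, dictionary[text[begin:longest_end]]))
            begin = longest_end
    return results


# Same fixture as build_small_trie in the test file.
small_dict = {'商品': 'goods', '和': 'and', '和服': 'kimono', '服务': 'service', '务': 'business'}
all_matches = parse_all('商品和服务', small_dict)
longest_matches = parse_longest('商品和服务', small_dict)
```

In the longest-match result, 'and' and 'service' disappear: '和服' wins over '和' at offset 2, and the scan resumes at offset 4, where only '务' remains.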

Implementation Analysis

The tests use Python’s unittest framework. The TestTrie class defines two helper methods, build_small_trie and assert_results_valid, and four test methods, one per feature.

Testing patterns include:
  • Setup helper methods for consistent test data
  • Assertion wrapper methods for validation
  • Isolated test cases for each feature
  • Cross-checking of each result span against the input text and the dictionary
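
The patterns listed above can be shown in isolation with a small, self-contained sketch. `DictStore`, `find_keys`, and `TestDictStore` are hypothetical names invented for this example; the class under test is a plain dict subclass standing in for a trie, not anything from hanlp_trie.

```python
import unittest


class DictStore(dict):
    """Hypothetical stand-in for the class under test."""

    def find_keys(self, text):
        # Report (begin, end, value) for every key that occurs in text.
        return [(b, e, self[text[b:e]])
                for b in range(len(text))
                for e in range(b + 1, len(text) + 1)
                if text[b:e] in self]


class TestDictStore(unittest.TestCase):
    def build_store(self):
        # Setup helper: every test builds its own fresh, identical fixture.
        return DictStore({'ab': 1, 'b': 2})

    def assert_results_valid(self, text, results, store):
        # Assertion wrapper: each span must slice back to a key holding that value.
        for begin, end, value in results:
            self.assertEqual(value, store[text[begin:end]])

    def test_find_keys(self):
        store = self.build_store()
        text = 'abc'
        results = store.find_keys(text)
        self.assert_results_valid(text, results, store)
        self.assertEqual([(0, 2, 1), (1, 2, 2)], results)

    def test_len_after_delete(self):
        store = self.build_store()
        del store['b']  # the mutation stays local to this test
        self.assertEqual(len(store), 1)


def run_suite():
    """Load and run the two tests above, returning the unittest result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(TestDictStore)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == '__main__':
    run_suite()
```

Because the fixture is rebuilt inside each test rather than shared, the deletion in one test cannot affect the other, whatever order they run in.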

Technical Details

Testing infrastructure includes:
  • Python unittest framework
  • Custom Trie implementation from hanlp_trie module
  • Helper methods for test data construction
  • Assertion utilities for result validation

All tests share a fixed five-entry dictionary that maps Chinese words to English glosses; build_small_trie constructs it fresh for each test.
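
The length behavior that test_len checks (overwriting a key keeps the size, deleting decrements it, re-adding restores it) could be implemented along the following lines. `SimpleTrie` is a hypothetical minimal character trie written for illustration; it is not the hanlp_trie implementation and omits parsing and iteration.

```python
class SimpleTrie:
    """Hypothetical minimal character trie supporting [], del, and len()."""

    _MISSING = object()  # sentinel: this node does not end a stored key

    def __init__(self, mapping=None):
        self._children = {}
        self._value = SimpleTrie._MISSING
        self._size = 0  # key count, maintained on the root node only
        for key, value in (mapping or {}).items():
            self[key] = value

    def _find(self, key):
        # Walk one character at a time; None means the path does not exist.
        node = self
        for char in key:
            node = node._children.get(char)
            if node is None:
                return None
        return node

    def __setitem__(self, key, value):
        node = self
        for char in key:
            node = node._children.setdefault(char, SimpleTrie())
        if node._value is SimpleTrie._MISSING:
            self._size += 1  # count new keys only, not overwrites
        node._value = value

    def __getitem__(self, key):
        node = self._find(key)
        if node is None or node._value is SimpleTrie._MISSING:
            raise KeyError(key)
        return node._value

    def __delitem__(self, key):
        node = self._find(key)
        if node is None or node._value is SimpleTrie._MISSING:
            raise KeyError(key)
        # Clear the value but keep the node: longer keys may pass through it.
        node._value = SimpleTrie._MISSING
        self._size -= 1

    def __len__(self):
        return self._size


trie = SimpleTrie({'商品': 'goods', '和': 'and', '和服': 'kimono', '服务': 'service', '务': 'business'})
trie['和'] = '&'   # overwrite: length stays 5
del trie['和']      # delete: length drops to 4, and '和服' is still reachable
```

Keeping the emptied node in place is the simplest choice for a sketch; a production trie might prune nodes that no longer lead to any key.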

Best Practices Demonstrated

The test suite exemplifies several testing best practices:

  • Modular test case organization
  • Clear test method naming conventions
  • Reusable setup and validation methods
  • Comprehensive assertion checking
  • Independent test cases
  • Consistent test data management

hankcs/hanlp

plugins/hanlp_trie/tests/test_trie.py

            
import unittest

from hanlp_trie import Trie


class TestTrie(unittest.TestCase):
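    def build_small_trie(self):
        # Shared fixture. '和' is a prefix of '和服' and '务' is a suffix of '服务',
        # so the text '商品和服务' contains overlapping matches.
        return Trie({'商品': 'goods', '和': 'and', '和服': 'kimono', '服务': 'service', '务': 'business'})

    def assert_results_valid(self, text, results, trie):
        # Each result is (begin, end, value): text[begin:end] must be a key mapped to value.
        for begin, end, value in results:
            self.assertEqual(value, trie[text[begin:end]])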

    def test_parse(self):
        # parse() is expected to return every dictionary match, overlapping ones included.
        trie = self.build_small_trie()
        text = '商品和服务'
        parse_result = trie.parse(text)
        self.assert_results_valid(text, parse_result, trie)
        self.assertEqual([(0, 2, 'goods'),
                          (2, 3, 'and'),
                          (2, 4, 'kimono'),
                          (3, 5, 'service'),
                          (4, 5, 'business')],
                         parse_result)

    def test_parse_longest(self):
        # Only the longest match at each position is kept, so 'and' and 'service' drop out.
        trie = self.build_small_trie()
        text = '商品和服务'
        parse_longest_result = trie.parse_longest(text)
        self.assert_results_valid(text, parse_longest_result, trie)
        self.assertEqual([(0, 2, 'goods'), (2, 4, 'kimono'), (4, 5, 'business')],
                         parse_longest_result)

    def test_items(self):
        # For this fixture, items() is expected in the same order as the constructor dict.
        trie = self.build_small_trie()
        items = list(trie.items())
        self.assertEqual([('商品', 'goods'), ('和', 'and'), ('和服', 'kimono'), ('服务', 'service'), ('务', 'business')], items)

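    def test_len(self):
        trie = self.build_small_trie()
        self.assertEqual(len(trie), 5)
        trie['和'] = '&'  # overwriting an existing key must not change the length
        self.assertEqual(len(trie), 5)
        del trie['和']  # deleting a key decrements the length
        self.assertEqual(len(trie), 4)
        trie['和'] = '&'  # re-adding the key restores the length
        self.assertEqual(len(trie), 5)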


if __name__ == '__main__':
    unittest.main()