
Validating Unicode Literal Implementation in youtube-dl

This test suite validates the consistent use of unicode literals across the youtube-dl codebase. It checks that every Python file containing string literals imports unicode_literals from the __future__ module and avoids explicit u prefixes, so that string handling stays consistent between Python 2 and Python 3 and encoding issues are avoided.

Test Coverage Overview

The test suite checks unicode literal usage across the whole repository. It recursively scans every .py file, skipping the directories and files on its ignore lists as well as files that contain no quote characters, and verifies:
  • Presence of a unicode_literals import from __future__
  • Absence of explicit u'' or u"" string prefixes
  • That each file can be decoded as UTF-8, since it is opened with encoding='utf-8'
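As a rough illustration of the first two checks, the sketch below applies the two regular expressions from the test file to small sample strings. The function names and sample inputs are hypothetical, and the use of re.match (anchoring the import pattern at the start of the file) is an assumption about how the assertRegexpMatches helper behaves, not something this page confirms.

```python
import re

# Patterns copied from test/test_unicode_literals.py.
IMPORT_RE = r'(?:(?:#.*?|\s*)\n)*from __future__ import (?:[a-z_]+,\s*)*unicode_literals'
U_PREFIX_RE = r'(?<=\s)u[\'"](?!\)|,|$)'


def has_unicode_literals_import(code):
    # Assumption: the pattern is anchored at the start of the file,
    # so only comments and blank lines may precede the import.
    return re.match(IMPORT_RE, code) is not None


def find_u_prefix(code):
    # Returns a match object for the first u'...' / u"..." prefix, or None.
    return re.search(U_PREFIX_RE, code)


# Hypothetical sample inputs.
good = "# coding: utf-8\nfrom __future__ import unicode_literals\n\nx = 'a'\n"
bad = "x = u'a'\n"

print(has_unicode_literals_import(good), find_u_prefix(good) is None)
print(has_unicode_literals_import(bad), find_u_prefix(bad) is None)
```

The first sample passes both checks; the second fails both, because it lacks the import and contains a u prefix.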

Implementation Analysis

The test is a single unittest test case that walks the repository with os.walk() and checks every .py file it finds. Two configurable lists, IGNORED_DIRS and IGNORED_FILES, control what is skipped; ignored directories are removed from dirnames so that os.walk() does not descend into them. Each remaining file is read in full and checked with regular expressions: one for the __future__ import and one for u-prefixed string literals.
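The traversal with exclusion lists can be sketched in isolation as follows. This is a minimal sketch rather than the project's code: iter_python_files and demo are hypothetical names, the ignore lists are shortened, and pruning uses slice assignment on dirnames instead of the dirnames.remove() call in the test file. Both approaches modify the list in place, which is what stops os.walk() from descending into the ignored directories.

```python
import os
import tempfile

IGNORED_DIRS = ['.git', '.tox']
IGNORED_FILES = ['setup.py']


def iter_python_files(root, ignored_dirs=IGNORED_DIRS, ignored_files=IGNORED_FILES):
    """Yield paths of .py files under root, honoring the ignore lists."""
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place pruning: os.walk() reads dirnames again before recursing.
        dirnames[:] = [d for d in dirnames if d not in ignored_dirs]
        for basename in filenames:
            if basename.endswith('.py') and basename not in ignored_files:
                yield os.path.join(dirpath, basename)


def demo():
    """Build a throwaway tree and list the files the walker would check."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, '.git'))
        os.makedirs(os.path.join(root, 'pkg'))
        for parts in (['.git', 'hook.py'], ['pkg', 'mod.py'],
                      ['setup.py'], ['README.txt']):
            with open(os.path.join(root, *parts), 'w') as f:
                f.write('')
        return sorted(os.path.relpath(p, root) for p in iter_python_files(root))


print(demo())
```

Only pkg/mod.py survives: the .git directory is pruned, setup.py is on the file ignore list, and README.txt is not a Python file.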

Technical Details

Key technical components include:
  • unittest framework for test organization
  • os.walk() and os.path for file system navigation
  • Regular expressions for code analysis
  • UTF-8 file reading through compat_open from youtube_dl.compat
  • Custom assertion helper (assertRegexpMatches) imported from test.helper
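The body of assertRegexpMatches is not shown on this page. Purely as an illustration of what a custom regex assertion helper can look like, here is a hypothetical stand-in, assert_matches_at_start, which fails the given test case with a contextual message when a pattern does not match at the start of the text. It is a sketch under that assumption and should not be read as the project's implementation.

```python
import re
import unittest


def assert_matches_at_start(testcase, text, pattern, msg=None):
    """Hypothetical helper: fail unless pattern matches at the start of text."""
    if re.match(pattern, text) is None:
        note = 'pattern %r did not match' % pattern
        # Append the caller's context (for example, the file name) if given.
        testcase.fail(note if msg is None else '%s: %s' % (note, msg))


class DemoTest(unittest.TestCase):
    def runTest(self):
        assert_matches_at_start(
            self,
            'from __future__ import unicode_literals\n',
            r'from __future__ import',
            'checked in a made-up sample string')
```

Taking the test case as an explicit first argument lets a plain function report failures through unittest, which is the same calling shape the test file uses for assertRegexpMatches(self, code, pattern, message).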

Best Practices Demonstrated

The test implementation showcases several testing best practices:
  • Systematic file traversal with clear exclusion handling
  • Proper error messages with context
  • Modular test organization
  • Robust string pattern matching
  • Clear separation of test configuration and execution

ytdl-org/youtube-dl

test/test_unicode_literals.py

            
from __future__ import unicode_literals

# Allow direct execution
import os
import re
import sys
import unittest

dirn = os.path.dirname

rootDir = dirn(dirn(os.path.abspath(__file__)))

sys.path.insert(0, rootDir)

IGNORED_FILES = [
    'setup.py',  # http://bugs.python.org/issue13943
    'conf.py',
    'buildserver.py',
    'get-pip.py',
]

IGNORED_DIRS = [
    '.git',
    '.tox',
]

from test.helper import assertRegexpMatches
from youtube_dl.compat import compat_open as open


class TestUnicodeLiterals(unittest.TestCase):
    def test_all_files(self):
        for dirpath, dirnames, filenames in os.walk(rootDir):
            for ignore_dir in IGNORED_DIRS:
                if ignore_dir in dirnames:
                    # If we remove the directory from dirnames os.walk won't
                    # recurse into it
                    dirnames.remove(ignore_dir)
            for basename in filenames:
                if not basename.endswith('.py'):
                    continue
                if basename in IGNORED_FILES:
                    continue

                fn = os.path.join(dirpath, basename)
                with open(fn, encoding='utf-8') as inf:
                    code = inf.read()

                # Files without any quote characters have no string literals
                # to check
                if "'" not in code and '"' not in code:
                    continue
                assertRegexpMatches(
                    self,
                    code,
                    r'(?:(?:#.*?|\s*)\n)*from __future__ import (?:[a-z_]+,\s*)*unicode_literals',
                    'unicode_literals import  missing in %s' % fn)

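                # Look for u'...' or u"..." prefixes that follow whitespace;
                # the lookahead skips a quote followed by ')', ',' or the end
                # of the input
                m = re.search(r'(?<=\s)u[\'"](?!\)|,|$)', code)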
                if m is not None:
                    self.assertTrue(
                        m is None,
                        'u present in %s, around %s' % (
                            fn, code[m.start() - 10:m.end() + 10]))


if __name__ == '__main__':
    unittest.main()