Testing Unicode Character Width Calculations in termux-app

This test suite validates the column widths that the Termux terminal emulator assigns to Unicode code points, covering ASCII, wide characters, combining marks, and emoji, so that text lines up correctly on screen.

Test Coverage Overview

The suite spot-checks character widths across several Unicode ranges rather than testing every code point.

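Tests every printable ASCII character (0x20-0x7E) for width 1
Validates wide characters: fullwidth Latin letters and CJK ideographs, including supplementary-plane ideographs
Verifies that combining marks and the variation selector U+FE0F have width 0
Handles special cases: the word joiner (width 0) and the soft hyphen (width 1)
Tests emoji (koala, watch, upside-down face) for width 2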
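
The categories above map onto three possible results: 0 columns for combining or invisible code points, 2 for wide ones, and 1 for everything else. The sketch below illustrates that classification with a sorted range table searched by bisection, a common approach in wcwidth-style code. WidthSketch, its tables, and the chosen ranges are assumptions made for illustration; they are heavily reduced and are not the termux WcWidth implementation, which is not shown on this page.

```java
// Illustrative only: a tiny range-table width lookup. The class name and the
// deliberately incomplete tables are assumptions, not termux's WcWidth.
// Control characters are not handled specially here.
final class WidthSketch {

    // Sorted, non-overlapping [first, last] ranges that occupy zero columns.
    private static final int[][] ZERO_WIDTH = {
        {0x0300, 0x036F}, // Combining Diacritical Marks
        {0x2060, 0x2060}, // WORD JOINER
        {0xFE00, 0xFE0F}, // Variation Selectors
    };

    // Sorted, non-overlapping [first, last] ranges that occupy two columns.
    private static final int[][] WIDE = {
        {0x231A, 0x231B},   // WATCH, HOURGLASS
        {0x4E00, 0x9FFF},   // CJK Unified Ideographs
        {0xFF01, 0xFF60},   // Fullwidth forms
        {0x1F400, 0x1F43E}, // emoji range containing KOALA
        {0x1F600, 0x1F64F}, // Emoticons block
        {0x20000, 0x2FFFD}, // Supplementary Ideographic Plane
    };

    // Binary search for codePoint in a sorted table of inclusive ranges.
    private static boolean inTable(int codePoint, int[][] table) {
        int lo = 0;
        int hi = table.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (codePoint < table[mid][0]) {
                hi = mid - 1;
            } else if (codePoint > table[mid][1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

    // Returns 0, 2, or 1 (the default) for the given code point.
    static int width(int codePoint) {
        if (inTable(codePoint, ZERO_WIDTH)) return 0;
        if (inTable(codePoint, WIDE)) return 2;
        return 1;
    }
}
```

With these reduced tables, WidthSketch.width(0x4E2D) yields 2 and WidthSketch.width(0x0308) yields 0, matching the expectations the test places on the real class for those code points.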

Implementation Analysis

The tests extend JUnit's junit.framework.TestCase base class (the JUnit 3 style), so every public method whose name starts with "test" runs as a test case.

Each test method focuses on one character category and uses the helper assertWidthIs(expectedWidth, codePoint), which calls WcWidth.width(codePoint) and compares the result with the expected width. Apart from the ASCII loop, every case is an explicit code point paired with its expected width.
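
One possible variation on this helper pattern: a plain assertEquals on two integers reports only the expected and actual values, so a failure inside a loop such as the ASCII one does not say which code point was at fault. The sketch below adds the code point to the failure message. It is self-contained rather than JUnit-based, and WidthAssert, checkWidth, and the stand-in width function are hypothetical names that do not appear in the termux test.

```java
import java.util.function.IntUnaryOperator;

// Illustrative helper; not part of the termux test suite.
final class WidthAssert {

    // Throws AssertionError naming the offending code point in U+XXXX form.
    static void checkWidth(IntUnaryOperator widthFn, int expectedWidth, int codePoint) {
        int actual = widthFn.applyAsInt(codePoint);
        if (actual != expectedWidth) {
            throw new AssertionError(String.format(
                "width(U+%04X): expected %d but was %d", codePoint, expectedWidth, actual));
        }
    }

    public static void main(String[] args) {
        // Stand-in width function for the demo only: everything is one column.
        IntUnaryOperator alwaysOne = cp -> 1;

        for (int cp = 0x20; cp <= 0x7E; cp++) {
            checkWidth(alwaysOne, 1, cp); // passes for printable ASCII
        }
        try {
            checkWidth(alwaysOne, 2, 0x4E2D); // fails on purpose
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
            // prints "width(U+4E2D): expected 2 but was 1"
        }
    }
}
```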

Technical Details

JUnit 3 (junit.framework.TestCase) for test organization
Custom assertion helper method for width validation
Code points given as char literals or hexadecimal int values
Includes a code point introduced in Unicode 8 (U+1F643 UPSIDE-DOWN FACE)
Calls the static WcWidth.width(int) method directly
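
The test passes some characters as char literals and others as hexadecimal integers. A Java char is a single UTF-16 code unit, so code points above U+FFFF (such as 0x2070E or 0x1F428 in the test) cannot be written as char literals and are given as int values instead. The following sketch uses only standard java.lang.Character and String methods to show the difference; the class name CodePointDemo is made up for the example.

```java
// Illustrative only: shows why supplementary code points are passed as ints.
final class CodePointDemo {

    // Builds the String for a single code point (one or two UTF-16 units).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int bmp = 0x4E2D;    // inside the Basic Multilingual Plane
        int koala = 0x1F428; // supplementary code point

        System.out.println(Character.charCount(bmp));   // 1
        System.out.println(Character.charCount(koala)); // 2

        String s = fromCodePoint(koala);
        System.out.println(s.length());                      // 2 (UTF-16 units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)
        System.out.println(s.codePointAt(0) == koala);       // true
    }
}
```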

Best Practices Demonstrated

The suite groups its cases by character category and documents the non-obvious ones.

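Test method names state the category under test (testCombining, testEmojis)
Edge cases (word joiner, soft hyphen) explained in code comments
Exhaustive loop over the printable ASCII range
Comments cite their sources (a Wikipedia article and a mailing-list post)
A single helper method keeps each assertion to one line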

termux/termux-app

terminal-emulator/src/test/java/com/termux/terminal/WcWidthTest.java

            
package com.termux.terminal;

import junit.framework.TestCase;

public class WcWidthTest extends TestCase {

	private static void assertWidthIs(int expectedWidth, int codePoint) {
		int wcWidth = WcWidth.width(codePoint);
		assertEquals(expectedWidth, wcWidth);
	}

	public void testPrintableAscii() {
		for (int i = 0x20; i <= 0x7E; i++) {
			assertWidthIs(1, i);
		}
	}

	public void testSomeWidthOne() {
		assertWidthIs(1, 'å');
		assertWidthIs(1, 'ä');
		assertWidthIs(1, 'ö');
		assertWidthIs(1, 0x23F2); // TIMER CLOCK.
	}

	public void testSomeWide() {
		assertWidthIs(2, 'Ａ');
		assertWidthIs(2, 'Ｂ');
		assertWidthIs(2, 'Ｃ');
		assertWidthIs(2, '中');
		assertWidthIs(2, '文');

		assertWidthIs(2, 0x679C);
		assertWidthIs(2, 0x679D);

		assertWidthIs(2, 0x2070E);
		assertWidthIs(2, 0x20731);

		// Counter-example: a supplementary-plane code point that is not wide.
		assertWidthIs(1, 0x1F781);
	}

	public void testSomeNonWide() {
		assertWidthIs(1, 0x1D11E); // MUSICAL SYMBOL G CLEF.
		assertWidthIs(1, 0x1D11F); // MUSICAL SYMBOL G CLEF OTTAVA ALTA.
	}

	public void testCombining() {
		assertWidthIs(0, 0x0302); // COMBINING CIRCUMFLEX ACCENT.
		assertWidthIs(0, 0x0308); // COMBINING DIAERESIS.
		assertWidthIs(0, 0xFE0F); // VARIATION SELECTOR-16.
	}

	public void testWordJoiner() {
		// https://en.wikipedia.org/wiki/Word_joiner
		// The word joiner (WJ) is a code point in Unicode used to separate words when using scripts
		// that do not use explicit spacing. It is encoded since Unicode version 3.2
		// (released in 2002) as U+2060 WORD JOINER (HTML &#8288;).
		// The word joiner does not produce any space, and prohibits a line break at its position.
		assertWidthIs(0, 0x2060);
	}

	public void testSofthyphen() {
		// http://osdir.com/ml/internationalization.linux/2003-05/msg00006.html:
		// "Existing implementation practice in terminals is that the SOFT HYPHEN is
		// a spacing graphical character, and the purpose of my wcwidth() was to
		// predict the advancement of the cursor position after a string is sent to
		// a terminal. Hence, I have no choice but to keep wcwidth(SOFT HYPHEN) = 1.
		// VT100-style terminals do not hyphenate."
		assertWidthIs(1, 0x00AD);
	}

	public void testHangul() {
		assertWidthIs(1, 0x11A3); // In the Hangul Jamo block (U+1100-U+11FF).
	}

	public void testEmojis() {
		assertWidthIs(2, 0x1F428); // KOALA.
		assertWidthIs(2, 0x231A);  // WATCH.
		assertWidthIs(2, 0x1F643); // UPSIDE-DOWN FACE (Unicode 8).
	}

}
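
The soft hyphen comment in the test describes the purpose of a wcwidth-style function as predicting how far the cursor advances after a string is sent to a terminal. A minimal sketch of that use follows, assuming the caller supplies a per-code-point width function. ColumnCounter, columns, and the toy width function in main are hypothetical and merely stand in for WcWidth.width, whose implementation is not shown on this page.

```java
import java.util.function.IntUnaryOperator;

// Illustrative only: sums per-code-point widths to estimate cursor advance.
final class ColumnCounter {

    // Iterates by code point rather than by char, so a surrogate pair is
    // passed to widthFn once as a single code point.
    static int columns(String text, IntUnaryOperator widthFn) {
        return text.codePoints().map(widthFn).sum();
    }

    public static void main(String[] args) {
        // Toy width function for the demo: COMBINING DIAERESIS is 0 columns,
        // the CJK Unified Ideographs block is 2, everything else is 1.
        IntUnaryOperator toyWidth = cp ->
            cp == 0x0308 ? 0 : (cp >= 0x4E00 && cp <= 0x9FFF) ? 2 : 1;

        System.out.println(columns("abc", toyWidth));          // 3
        System.out.println(columns("a\u0308", toyWidth));      // 1
        System.out.println(columns("\u4E2D\u6587", toyWidth)); // 4
    }
}
```

A width function that can return a negative value for control characters would need that case handled before summing; the sketch assumes non-negative widths.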