
Validating Python Syntax Tokenization in monaco-editor

This test suite validates Python tokenization in the Monaco Editor, the lexical step that drives syntax highlighting. It checks that keywords, identifiers, decorators, comments, strings, numbers, and f-strings are split at the expected positions and given the expected token types.

Test Coverage Overview

The suite covers the following Python language elements:

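Keywords and function definitions (def, and str used inside a call)
String literals (single, double, and triple quotes, including strings that span several lines)
Comments, including quotes and extra # characters inside a comment
Numeric literals (hexadecimal, floats with exponents, the imaginary suffix j)
F-strings with interpolation, format specifiers (:.3f), conversions (!r), and the = specifier
Decorators, which are tokenized as tags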

Implementation Analysis

The tests pass a nested array to testTokenization. Each inner array is one test case made up of one or more consecutive lines, and each line pairs its source text with the tokens the tokenizer is expected to produce. A case with several lines checks that state, such as an open triple-quoted string, carries over from one line to the next.

Each expected token specifies a startIndex and a type. A token extends from its startIndex to the startIndex of the next token, or to the end of the line for the last token.
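To make the startIndex convention concrete, here is a minimal sketch that turns one expected-token list into the substrings it describes. The names ExpectedToken, LineCase, TokenSpan, and toSpans are hypothetical and used only for illustration; the actual types belong to the test runner and may differ.

```typescript
// Hypothetical shapes mirroring the test data in this file; the real
// definitions live in the test runner and may differ.
interface ExpectedToken {
	startIndex: number;
	type: string;
}

interface LineCase {
	line: string;
	tokens: ExpectedToken[];
}

interface TokenSpan {
	text: string;
	type: string;
}

// Each token runs from its startIndex up to the next token's startIndex,
// or to the end of the line when it is the last token.
function toSpans(testCase: LineCase): TokenSpan[] {
	return testCase.tokens.map((token, i) => {
		const next = testCase.tokens[i + 1];
		const end = next ? next.startIndex : testCase.line.length;
		return { text: testCase.line.slice(token.startIndex, end), type: token.type };
	});
}

// The first case from this file, expressed with the hypothetical types.
const spans = toSpans({
	line: 'def func():',
	tokens: [
		{ startIndex: 0, type: 'keyword.python' },
		{ startIndex: 3, type: 'white.python' },
		{ startIndex: 4, type: 'identifier.python' },
		{ startIndex: 8, type: 'delimiter.parenthesis.python' },
		{ startIndex: 10, type: 'delimiter.python' }
	]
});

console.log(spans.map((s) => s.text)); // [ 'def', ' ', 'func', '()', ':' ]
```

Reading a case this way shows that the two parentheses form a single delimiter.parenthesis.python token, because no new token starts between index 8 and index 10.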

Technical Details

Testing Tools and Configuration:

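testTokenization runner imported from ../test/testRunner
Dot-separated token type names with a .python suffix, such as keyword.python and delimiter.parenthesis.python
Zero-based startIndex values marking where each token begins
Token types exercised: keyword, identifier, white, tag, comment, string, string.escape, number, number.hex, delimiter, and delimiter.parenthesis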
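The token type strings in this file follow a visible pattern: every non-empty type ends in .python, and the segments before it go from general to specific. The sketch below splits a type string along those lines. It assumes the last dot-separated segment is a language suffix, which matches the data here but is not confirmed by the test file itself; parseTokenType and ParsedTokenType are hypothetical names.

```typescript
// Hypothetical helper: splits a token type such as
// 'delimiter.parenthesis.python' into its scopes and language suffix.
// Assumption: the final dot-separated segment names the language.
interface ParsedTokenType {
	scopes: string[]; // e.g. ['delimiter', 'parenthesis']
	language: string; // e.g. 'python'; '' when there is no suffix
}

function parseTokenType(type: string): ParsedTokenType {
	// The suite contains one token with an empty type (the '+' between
	// two triple-quoted strings), so handle that case explicitly.
	if (type === '') {
		return { scopes: [], language: '' };
	}
	const parts = type.split('.');
	// Only treat the last segment as a language when there is more than one.
	const language = parts.length > 1 ? (parts.pop() ?? '') : '';
	return { scopes: parts, language };
}

console.log(parseTokenType('delimiter.parenthesis.python'));
console.log(parseTokenType('number.hex.python'));
console.log(parseTokenType(''));
```

Splitting the type this way makes it easy to group expectations by their leading scope, for example to count how many string tokens a case expects regardless of whether they are string or string.escape.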

Best Practices Demonstrated

The test suite illustrates several testing practices:

Edge cases for string literals, such as quotes nested inside other quotes and a number directly after a closing triple quote
Token-by-token expectations, including whitespace tokens
Test cases grouped by language feature under comment headings
Input lines and expected tokens kept side by side in each case
A multi-line case annotated with the URL of a related GitHub issue
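Because expectations are written by hand as index lists, a quick sanity check on the data can catch typos before the tokenizer runs. The following is a sketch of such a check and is not part of the Monaco test runner; findTokenProblems and ExpectedToken are hypothetical names, and the three rules are assumptions drawn from how the cases in this file are laid out.

```typescript
// Hypothetical shape of one expected token, mirroring the test data.
interface ExpectedToken {
	startIndex: number;
	type: string;
}

// Returns a list of human-readable problems; an empty list means the
// expectations are at least internally consistent.
function findTokenProblems(line: string, tokens: ExpectedToken[]): string[] {
	const problems: string[] = [];
	let previous = -1;
	tokens.forEach((token, i) => {
		// Assumed rule 1: the first token starts at the beginning of the line.
		if (i === 0 && token.startIndex !== 0) {
			problems.push('first token does not start at index 0');
		}
		// Assumed rule 2: start indices strictly increase.
		if (i > 0 && token.startIndex <= previous) {
			problems.push(`token ${i} does not advance past the previous startIndex`);
		}
		// Assumed rule 3: every token starts inside the line.
		if (token.startIndex >= line.length) {
			problems.push(`token ${i} starts beyond the end of the line`);
		}
		previous = token.startIndex;
	});
	return problems;
}

// A case copied from this file passes the check.
const ok = findTokenProblems("'''Lots '''0.3e-5", [
	{ startIndex: 0, type: 'string.python' },
	{ startIndex: 11, type: 'number.python' }
]);
console.log(ok.length); // 0

// A deliberately broken expectation reports all three problems.
const broken = findTokenProblems('abc', [
	{ startIndex: 1, type: 'identifier.python' },
	{ startIndex: 1, type: 'white.python' },
	{ startIndex: 5, type: 'identifier.python' }
]);
console.log(broken.length); // 3
```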

microsoft/monaco-editor

src/basic-languages/python/python.test.ts

            
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/

import { testTokenization } from '../test/testRunner';

testTokenization('python', [
	// Keywords
	[
		{
			line: 'def func():',
			tokens: [
				{ startIndex: 0, type: 'keyword.python' },
				{ startIndex: 3, type: 'white.python' },
				{ startIndex: 4, type: 'identifier.python' },
				{ startIndex: 8, type: 'delimiter.parenthesis.python' },
				{ startIndex: 10, type: 'delimiter.python' }
			]
		}
	],

	[
		{
			line: 'func(str Y3)',
			tokens: [
				{ startIndex: 0, type: 'identifier.python' },
				{ startIndex: 4, type: 'delimiter.parenthesis.python' },
				{ startIndex: 5, type: 'keyword.python' },
				{ startIndex: 8, type: 'white.python' },
				{ startIndex: 9, type: 'identifier.python' },
				{ startIndex: 11, type: 'delimiter.parenthesis.python' }
			]
		}
	],

	// Decorators
	[
		{
			line: '@Dec0_rator:',
			tokens: [
				{ startIndex: 0, type: 'tag.python' },
				{ startIndex: 11, type: 'delimiter.python' }
			]
		}
	],

	// Comments
	[
		{
			line: ' # Comments! ## "jfkd" ',
			tokens: [
				{ startIndex: 0, type: 'white.python' },
				{ startIndex: 1, type: 'comment.python' }
			]
		}
	],

	// Strings
	[
		{
			line: "'s0'",
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 1, type: 'string.python' },
				{ startIndex: 3, type: 'string.escape.python' }
			]
		}
	],

	[
		{
			line: '"\' " "',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 1, type: 'string.python' },
				{ startIndex: 3, type: 'string.escape.python' },
				{ startIndex: 4, type: 'white.python' },
				{ startIndex: 5, type: 'string.escape.python' }
			]
		}
	],

	[
		{
			line: "'''Lots of string'''",
			tokens: [{ startIndex: 0, type: 'string.python' }]
		}
	],

	[
		{
			line: '"""Lots \'\'\'     \'\'\'"""',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		}
	],

	[
		{
			line: "'''Lots '''0.3e-5",
			tokens: [
				{ startIndex: 0, type: 'string.python' },
				{ startIndex: 11, type: 'number.python' }
			]
		}
	],

	// https://github.com/microsoft/monaco-editor/issues/1170
	[
		{
			line: 'def f():',
			tokens: [
				{ startIndex: 0, type: 'keyword.python' },
				{ startIndex: 3, type: 'white.python' },
				{ startIndex: 4, type: 'identifier.python' },
				{ startIndex: 5, type: 'delimiter.parenthesis.python' },
				{ startIndex: 7, type: 'delimiter.python' }
			]
		},
		{
			line: '   """multi',
			tokens: [
				{ startIndex: 0, type: 'white.python' },
				{ startIndex: 3, type: 'string.python' }
			]
		},
		{
			line: '   line',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   comment',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   """ + """',
			tokens: [
				{ startIndex: 0, type: 'string.python' },
				{ startIndex: 6, type: 'white.python' },
				{ startIndex: 7, type: '' },
				{ startIndex: 8, type: 'white.python' },
				{ startIndex: 9, type: 'string.python' }
			]
		},
		{
			line: '   another',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   multi',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   line',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   comment"""',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   code',
			tokens: [
				{ startIndex: 0, type: 'white.python' },
				{ startIndex: 3, type: 'identifier.python' }
			]
		}
	],

	// Numbers
	[
		{
			line: '0xAcBFd',
			tokens: [{ startIndex: 0, type: 'number.hex.python' }]
		}
	],

	[
		{
			line: '0x0cH',
			tokens: [
				{ startIndex: 0, type: 'number.hex.python' },
				{ startIndex: 4, type: 'identifier.python' }
			]
		}
	],

	[
		{
			line: '456.7e-7j',
			tokens: [{ startIndex: 0, type: 'number.python' }]
		}
	],

	// F-Strings
	[
		{
			line: 'f"str {var} str"',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 2, type: 'string.python' },
				{ startIndex: 6, type: 'identifier.python' },
				{ startIndex: 11, type: 'string.python' },
				{ startIndex: 15, type: 'string.escape.python' }
			]
		}
	],
	[
		{
			line: `f'''str {var} str'''`,
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 4, type: 'string.python' },
				{ startIndex: 8, type: 'identifier.python' },
				{ startIndex: 13, type: 'string.python' },
				{ startIndex: 17, type: 'string.escape.python' }
			]
		}
	],
	[
		{
			line: 'f"{var:.3f}{var!r}{var=}"',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 2, type: 'identifier.python' },
				{ startIndex: 6, type: 'string.python' },
				{ startIndex: 10, type: 'identifier.python' },
				{ startIndex: 15, type: 'string.python' },
				{ startIndex: 17, type: 'identifier.python' },
				{ startIndex: 22, type: 'string.python' },
				{ startIndex: 23, type: 'identifier.python' },
				{ startIndex: 24, type: 'string.escape.python' }
			]
		}
	],
	[
		{
			line: 'f"\' " "',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 2, type: 'string.python' },
				{ startIndex: 4, type: 'string.escape.python' },
				{ startIndex: 5, type: 'white.python' },
				{ startIndex: 6, type: 'string.escape.python' }
			]
		}
	],
	[
		{
			line: '"{var}"',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 1, type: 'string.python' },
				{ startIndex: 6, type: 'string.escape.python' }
			]
		}
	]
]);