Validating Python Syntax Tokenization in monaco-editor

This test suite validates Python tokenization in the Monaco Editor, which is what drives syntax highlighting. It checks that keywords, identifiers, decorators, comments, strings, numbers, and f-strings are split at the expected offsets and assigned the expected token types.

Test Coverage Overview

The test suite covers the following Python language elements:
  • Keywords and function definitions (def func():)
  • String literals in single, double, and triple quotes, including triple-quoted strings that span several lines
  • Comment recognition
  • Numeric literals: hexadecimal, floats with exponents, and the imaginary j suffix
  • F-string interpolation, including format specs, conversions (!r), and the = specifier
  • Decorators, which are expected as a tag token

Implementation Analysis

The suite passes a nested array to testTokenization. Each inner array is one test case made of one or more objects, and each object pairs an input line with the tokens expected for that line. Lines within a test case are listed in order, and the multi-line docstring case relies on this: a line such as '   line' is expected to be string.python only because an earlier line opened a triple-quoted string.

Each expected token gives only a startIndex and a type. A token's extent is implicit: it runs from its startIndex to the next token's startIndex, or to the end of the line.
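The implicit token extents can be made visible with a small helper. This is a minimal sketch, not part of the test runner: the ExpectedToken and TestLine interfaces and the tokenSpans function are hypothetical names that only mirror the object literals used in the test file.

```typescript
// Hypothetical shapes mirroring the object literals in python.test.ts.
interface ExpectedToken {
	startIndex: number;
	type: string;
}

interface TestLine {
	line: string;
	tokens: ExpectedToken[];
}

// Recover the text each token covers: a token runs from its startIndex
// up to the next token's startIndex, or to the end of the line.
function tokenSpans(testLine: TestLine): { text: string; type: string }[] {
	return testLine.tokens.map((token, i) => {
		const next = testLine.tokens[i + 1];
		const end = next ? next.startIndex : testLine.line.length;
		return { text: testLine.line.slice(token.startIndex, end), type: token.type };
	});
}

// The first test case from the suite.
const defFuncLine: TestLine = {
	line: 'def func():',
	tokens: [
		{ startIndex: 0, type: 'keyword.python' },
		{ startIndex: 3, type: 'white.python' },
		{ startIndex: 4, type: 'identifier.python' },
		{ startIndex: 8, type: 'delimiter.parenthesis.python' },
		{ startIndex: 10, type: 'delimiter.python' }
	]
};

const defFuncSpans = tokenSpans(defFuncLine);
console.log(defFuncSpans.map((span) => JSON.stringify(span.text)).join(' '));
```

Note that the two parentheses form a single delimiter.parenthesis.python token here, because no new token starts between offsets 8 and 10.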

Technical Details

Testing Tools and Configuration:
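  • The testTokenization runner, imported from '../test/testRunner'
  • Token types suffixed with the language id, for example keyword.python
  • Zero-based startIndex offsets that mark where each token begins
  • Token types exercised here: keyword, identifier, white, tag, comment, string, string.escape, number, number.hex, delimiter, and delimiter.parenthesis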
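To illustrate what comparing expected and actual tokens involves, here is a rough sketch of a comparison step. It is not the code behind testTokenization; diffTokens and the ExpectedToken interface are hypothetical, and the "actual" list is hand-written rather than produced by a tokenizer.

```typescript
// Hypothetical token shape, matching the fields used in the test file.
interface ExpectedToken {
	startIndex: number;
	type: string;
}

// Hypothetical helper: returns readable mismatches, or an empty array
// when both token lists agree on every startIndex and type.
function diffTokens(expected: ExpectedToken[], actual: ExpectedToken[]): string[] {
	const problems: string[] = [];
	if (expected.length !== actual.length) {
		problems.push(`expected ${expected.length} tokens, got ${actual.length}`);
	}
	expected.forEach((e, i) => {
		const a = actual[i];
		if (a === undefined) {
			return; // missing tokens are already reported by the length check
		}
		if (e.startIndex !== a.startIndex || e.type !== a.type) {
			problems.push(
				`token ${i}: expected ${e.type}@${e.startIndex}, got ${a.type}@${a.startIndex}`
			);
		}
	});
	return problems;
}

// Expected tokens for the line 's0' (with its quotes) from the suite.
const quotedExpected: ExpectedToken[] = [
	{ startIndex: 0, type: 'string.escape.python' },
	{ startIndex: 1, type: 'string.python' },
	{ startIndex: 3, type: 'string.escape.python' }
];

// A made-up result in which the quotes are not separated from the body.
const quotedWrong: ExpectedToken[] = [{ startIndex: 0, type: 'string.python' }];

console.log(diffTokens(quotedExpected, quotedExpected));
console.log(diffTokens(quotedExpected, quotedWrong));
```

Reporting every mismatch instead of stopping at the first one is a design choice of this sketch; it makes it easier to see whether a failure is a single shifted offset or a wholesale misclassification.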

Best Practices Demonstrated

The test suite demonstrates several testing practices:
  • Edge cases for string literals, such as quotes nested inside other quotes and a number directly following a closing triple quote
  • Token-by-token expectations, so a single shifted offset or wrong type fails the test
  • Test cases grouped by language feature (keywords, comments, strings, numbers, f-strings)
  • Each input line kept next to its expected tokens
  • A regression case that links to the issue it covers
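Hand-written offsets are easy to get wrong, so a sanity check on the test data itself can be useful. The following is a sketch under stated assumptions: isWellFormed is a hypothetical helper, and its rules (offsets strictly increasing, inside the line, first token at offset 0) are inferred from the cases in this file rather than taken from the runner.

```typescript
// Hypothetical shapes mirroring the object literals in python.test.ts.
interface ExpectedToken {
	startIndex: number;
	type: string;
}

interface TestLine {
	line: string;
	tokens: ExpectedToken[];
}

// Assumed rules: startIndex values strictly increase, each one falls
// inside the line, and a non-empty token list begins at offset 0.
function isWellFormed(testLine: TestLine): boolean {
	let previous = -1;
	for (const token of testLine.tokens) {
		if (token.startIndex <= previous || token.startIndex >= testLine.line.length) {
			return false;
		}
		previous = token.startIndex;
	}
	return testLine.tokens.length === 0 || testLine.tokens[0]?.startIndex === 0;
}

// The decorator case from the suite: two tokens, at offsets 0 and 11.
const decoratorLine: TestLine = {
	line: '@Dec0_rator:',
	tokens: [
		{ startIndex: 0, type: 'tag.python' },
		{ startIndex: 11, type: 'delimiter.python' }
	]
};

// A deliberately broken variant: the second offset is past the end of the line.
const brokenLine: TestLine = {
	line: '@Dec0_rator:',
	tokens: [
		{ startIndex: 0, type: 'tag.python' },
		{ startIndex: 12, type: 'delimiter.python' }
	]
};

console.log(isWellFormed(decoratorLine), isWellFormed(brokenLine));
```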

microsoft/monaco-editor

src/basic-languages/python/python.test.ts

            
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/

import { testTokenization } from '../test/testRunner';

testTokenization('python', [
	// Keywords
	[
		{
			line: 'def func():',
			tokens: [
				{ startIndex: 0, type: 'keyword.python' },
				{ startIndex: 3, type: 'white.python' },
				{ startIndex: 4, type: 'identifier.python' },
				{ startIndex: 8, type: 'delimiter.parenthesis.python' },
				{ startIndex: 10, type: 'delimiter.python' }
			]
		}
	],

	[
		{
			line: 'func(str Y3)',
			tokens: [
				{ startIndex: 0, type: 'identifier.python' },
				{ startIndex: 4, type: 'delimiter.parenthesis.python' },
				{ startIndex: 5, type: 'keyword.python' },
				{ startIndex: 8, type: 'white.python' },
				{ startIndex: 9, type: 'identifier.python' },
				{ startIndex: 11, type: 'delimiter.parenthesis.python' }
			]
		}
	],

	// Decorators
	[
		{
			line: '@Dec0_rator:',
			tokens: [
				{ startIndex: 0, type: 'tag.python' },
				{ startIndex: 11, type: 'delimiter.python' }
			]
		}
	],

	// Comments
	[
		{
			line: ' # Comments! ## "jfkd" ',
			tokens: [
				{ startIndex: 0, type: 'white.python' },
				{ startIndex: 1, type: 'comment.python' }
			]
		}
	],

	// Strings
	[
		{
			line: "'s0'",
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 1, type: 'string.python' },
				{ startIndex: 3, type: 'string.escape.python' }
			]
		}
	],

	[
		{
			line: '"\' " "',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 1, type: 'string.python' },
				{ startIndex: 3, type: 'string.escape.python' },
				{ startIndex: 4, type: 'white.python' },
				{ startIndex: 5, type: 'string.escape.python' }
			]
		}
	],

	[
		{
			line: "'''Lots of string'''",
			tokens: [{ startIndex: 0, type: 'string.python' }]
		}
	],

	[
		{
			line: '"""Lots \'\'\'     \'\'\'"""',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		}
	],

	[
		{
			line: "'''Lots '''0.3e-5",
			tokens: [
				{ startIndex: 0, type: 'string.python' },
				{ startIndex: 11, type: 'number.python' }
			]
		}
	],

	// https://github.com/microsoft/monaco-editor/issues/1170
	[
		{
			line: 'def f():',
			tokens: [
				{ startIndex: 0, type: 'keyword.python' },
				{ startIndex: 3, type: 'white.python' },
				{ startIndex: 4, type: 'identifier.python' },
				{ startIndex: 5, type: 'delimiter.parenthesis.python' },
				{ startIndex: 7, type: 'delimiter.python' }
			]
		},
		{
			line: '   """multi',
			tokens: [
				{ startIndex: 0, type: 'white.python' },
				{ startIndex: 3, type: 'string.python' }
			]
		},
		{
			line: '   line',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   comment',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   """ + """',
			tokens: [
				{ startIndex: 0, type: 'string.python' },
				{ startIndex: 6, type: 'white.python' },
				{ startIndex: 7, type: '' }, // the '+' between the two strings is expected with an empty type
				{ startIndex: 8, type: 'white.python' },
				{ startIndex: 9, type: 'string.python' }
			]
		},
		{
			line: '   another',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   multi',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   line',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   comment"""',
			tokens: [{ startIndex: 0, type: 'string.python' }]
		},
		{
			line: '   code',
			tokens: [
				{ startIndex: 0, type: 'white.python' },
				{ startIndex: 3, type: 'identifier.python' }
			]
		}
	],

	// Numbers
	[
		{
			line: '0xAcBFd',
			tokens: [{ startIndex: 0, type: 'number.hex.python' }]
		}
	],

	[
		{
			line: '0x0cH',
			tokens: [
				{ startIndex: 0, type: 'number.hex.python' },
				{ startIndex: 4, type: 'identifier.python' }
			]
		}
	],

	[
		{
			line: '456.7e-7j',
			tokens: [{ startIndex: 0, type: 'number.python' }]
		}
	],

	// F-Strings
	[
		{
			line: 'f"str {var} str"',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 2, type: 'string.python' },
				{ startIndex: 6, type: 'identifier.python' },
				{ startIndex: 11, type: 'string.python' },
				{ startIndex: 15, type: 'string.escape.python' }
			]
		}
	],
	[
		{
			line: `f'''str {var} str'''`,
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 4, type: 'string.python' },
				{ startIndex: 8, type: 'identifier.python' },
				{ startIndex: 13, type: 'string.python' },
				{ startIndex: 17, type: 'string.escape.python' }
			]
		}
	],
	[
		{
			line: 'f"{var:.3f}{var!r}{var=}"',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 2, type: 'identifier.python' },
				{ startIndex: 6, type: 'string.python' },
				{ startIndex: 10, type: 'identifier.python' },
				{ startIndex: 15, type: 'string.python' },
				{ startIndex: 17, type: 'identifier.python' },
				{ startIndex: 22, type: 'string.python' },
				{ startIndex: 23, type: 'identifier.python' },
				{ startIndex: 24, type: 'string.escape.python' }
			]
		}
	],
	[
		{
			line: 'f"\' " "',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 2, type: 'string.python' },
				{ startIndex: 4, type: 'string.escape.python' },
				{ startIndex: 5, type: 'white.python' },
				{ startIndex: 6, type: 'string.escape.python' }
			]
		}
	],
	// Braces in a plain (non-f) string are expected to stay part of the string
	[
		{
			line: '"{var}"',
			tokens: [
				{ startIndex: 0, type: 'string.escape.python' },
				{ startIndex: 1, type: 'string.python' },
				{ startIndex: 6, type: 'string.escape.python' }
			]
		}
	]
]);