
Testing Parallel OCR Scheduler Performance in Tesseract.js

This test suite validates the scheduler in Tesseract.js, focusing on worker management and parallel job execution. It checks that a scheduler backed by different numbers of workers completes every queued OCR job. Although the suite is titled as a speed-up test, it asserts job completion and does not measure timing.

Test Coverage Overview

The suite runs 10 OCR recognition jobs through a scheduler backed by 1, 3, and 5 workers in turn, and asserts that all 10 results come back in each case.

Key areas covered:

Worker initialization and management
Scheduler job distribution
Concurrent job processing
Job completion across different worker counts (timing is not asserted)
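
The job distribution idea behind these areas can be illustrated with a minimal sketch in plain JavaScript. This is not the Tesseract.js scheduler; `createToyScheduler`, `createFakeWorker`, and `runToyJobs` are hypothetical names, and the sketch assumes only that a scheduler queues jobs and hands each one to the next idle worker.

```javascript
// Toy scheduler: NOT the Tesseract.js implementation, only an illustration
// of queueing jobs and dispatching them to whichever worker is idle.
function createToyScheduler() {
  const idle = [];  // workers that are currently free
  const queue = []; // pending jobs: { action, payload, resolve, reject }

  function dispatch() {
    // Pair idle workers with queued jobs until one of the two runs out.
    while (idle.length > 0 && queue.length > 0) {
      const worker = idle.shift();
      const job = queue.shift();
      worker[job.action](job.payload)
        .then(job.resolve, job.reject)
        .finally(() => {
          idle.push(worker); // the worker is free again
          dispatch();
        });
    }
  }

  return {
    addWorker(worker) {
      idle.push(worker);
      dispatch();
    },
    addJob(action, payload) {
      return new Promise((resolve, reject) => {
        queue.push({ action, payload, resolve, reject });
        dispatch();
      });
    },
  };
}

// Hypothetical fake worker: "recognizes" an image after a short delay.
function createFakeWorker(id, delayMs = 10) {
  return {
    id,
    recognize: (image) => new Promise((resolve) => {
      setTimeout(() => resolve({ workerId: id, image }), delayMs);
    }),
  };
}

// Mirrors the shape of the suite: N workers, M jobs, wait for all results.
async function runToyJobs(numWorkers, numJobs) {
  const scheduler = createToyScheduler();
  for (let i = 0; i < numWorkers; i += 1) {
    scheduler.addWorker(createFakeWorker(i));
  }
  return Promise.all(
    Array.from({ length: numJobs }, (_, idx) => scheduler.addJob('recognize', `image-${idx}.png`)),
  );
}
```

Calling `runToyJobs(3, 10)` resolves to 10 results in job order, each tagged with the fake worker that handled it.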

Implementation Analysis

The tests use Mocha-style asynchronous patterns (a before hook, this.timeout, and a chained .timeout on each test case), with Promise.all managing the parallel operations. The before hook initializes a pool of five workers once, and forEach generates one test case per worker configuration.

Technical implementation features:

Dynamic worker pool creation
Async/await pattern usage
Parameterized test cases
Flexible timeout configuration
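
The parameterized test pattern can be sketched without a test framework. In the sketch below, `defineWorkerCountTests`, `createTinyRunner`, and `fakeRunCase` are hypothetical helpers introduced only for illustration; under Mocha, the global `it` would be passed in place of the tiny runner's `it`.

```javascript
// Generates one test case per worker count. `it` is passed in as a parameter
// so the sketch runs standalone; under Mocha you would pass the global `it`.
function defineWorkerCountTests(it, workerCounts, runCase) {
  workerCounts.forEach((num) => {
    it(`support using ${num} workers`, async () => {
      const results = await runCase(num);
      if (results.length !== 10) {
        throw new Error(`expected 10 results, got ${results.length}`);
      }
    });
  });
}

// Hypothetical stand-in for a test runner: records cases, then runs them in order.
function createTinyRunner() {
  const cases = [];
  return {
    it: (title, fn) => {
      cases.push({ title, fn });
    },
    async run() {
      const passed = [];
      for (const { title, fn } of cases) {
        await fn(); // a thrown error rejects run() and stops the loop
        passed.push(title);
      }
      return passed;
    },
  };
}

// Simulated OCR run: resolves 10 placeholder results for any worker count.
const fakeRunCase = async (num) => Promise.all(
  Array.from({ length: 10 }, (_, idx) => Promise.resolve({ job: idx, workers: num })),
);

const runner = createTinyRunner();
defineWorkerCountTests(runner.it, [1, 3, 5], fakeRunCase);
runner.run().then((passed) => console.log(passed));
```

Generating the cases from an array keeps the three configurations identical apart from the worker count, so adding another count is a one-element change.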

Technical Details

Testing infrastructure includes:

Mocha test framework (before hook, this.timeout) with expect.js-style assertions
Tesseract.js worker API
Promise-based async testing
Custom timeouts: disabled for worker setup (this.timeout(0)), 60 seconds per test case
Environment configuration via OPTIONS object
Image path constants for test data

Best Practices Demonstrated

The test suite showcases several testing best practices for parallel processing validation.

Notable practices include:

A fresh scheduler for each test case
Scalability testing with varying worker counts
Consistent job verification (every case expects 10 results)
One worker pool, created once and reused across cases
Clear test case organization
Appropriate timeout handling for async operations
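
Resource cleanup for scheduler-based code can be sketched with try/finally. The helper below accepts any object exposing `addJob(action, payload)` and `terminate()`, which matches the scheduler methods Tesseract.js provides; `runAndCleanUp` and `createStubScheduler` are hypothetical names, and the stub stands in for a real scheduler so the sketch runs without the library.

```javascript
// Runs one 'recognize' job per image path and always terminates the scheduler,
// even when a job fails. `scheduler` only needs addJob() and terminate().
async function runAndCleanUp(scheduler, imagePaths) {
  try {
    return await Promise.all(
      imagePaths.map((imagePath) => scheduler.addJob('recognize', imagePath)),
    );
  } finally {
    await scheduler.terminate();
  }
}

// Hypothetical stub scheduler that records what happened to it.
function createStubScheduler() {
  const state = { jobs: 0, terminated: false };
  return {
    state,
    addJob: async (action, payload) => {
      state.jobs += 1;
      return { action, payload };
    },
    terminate: async () => {
      state.terminated = true;
    },
  };
}

runAndCleanUp(createStubScheduler(), ['simple.png', 'cosmic.png'])
  .then((results) => console.log(`completed ${results.length} jobs`));
```

When workers are shared across test cases, as in this suite, terminating inside each case would break the later cases, so a single teardown hook after all cases is the more natural place for this kind of cleanup.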

naptha/tesseract.js

tests/scheduler.test.js

            
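// Tesseract, OPTIONS, and IMAGE_PATH are not defined in this file; they are
// assumed to be globals provided by the test setup.
const { createScheduler, createWorker } = Tesseract;

// Worker pool shared by every test case below.
let workers = [];

before(async function cb() {
  // Disable the hook timeout: starting workers can be slow.
  this.timeout(0);
  const NUM_WORKERS = 5;
  console.log(`Initializing ${NUM_WORKERS} workers`);
  // Start all workers in parallel and wait until every one is ready.
  workers = await Promise.all(Array(NUM_WORKERS).fill(0).map(async () => (createWorker('eng', 1, OPTIONS))));
  console.log(`Initialized ${NUM_WORKERS} workers`);
});

describe('scheduler', () => {
  // Note: the cases below assert only the result count, not elapsed time.
  describe('should speed up with more workers (running 10 jobs)', () => {
    // One generated test case per worker count.
    [1, 3, 5].forEach((num) => (
      it(`support using ${num} workers`, async () => {
        const NUM_JOBS = 10;
        // Fresh scheduler per case, backed by the first `num` shared workers.
        const scheduler = createScheduler();
        workers.slice(0, num).forEach((w) => {
          scheduler.addWorker(w);
        });
        // Queue all jobs at once, alternating between the two test images.
        const rets = await Promise.all(Array(NUM_JOBS).fill(0).map((_, idx) => (
          scheduler.addJob('recognize', `${IMAGE_PATH}/${idx % 2 === 0 ? 'simple' : 'cosmic'}.png`)
        )));
        expect(rets.length).to.be(NUM_JOBS);
      }).timeout(60000)
    ));
  });
});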
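
The suite title mentions speeding up with more workers, but the assertion above only checks the result count. A timing comparison could be sketched as follows. This is an illustration with simulated work rather than real OCR: `runWithLimit`, `sleep`, and `timeJobs` are hypothetical helpers, and the concurrency limit stands in for the number of workers.

```javascript
// Runs `tasks` (functions returning promises) with at most `limit` in flight.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  // Each lane pulls the next unclaimed task until none remain.
  async function lane() {
    while (next < tasks.length) {
      const idx = next;
      next += 1;
      results[idx] = await tasks[idx]();
    }
  }
  const lanes = Array.from({ length: Math.min(limit, tasks.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Times `numJobs` simulated jobs with `numWorkers` running concurrently.
async function timeJobs(numWorkers, numJobs = 10, jobMs = 20) {
  const tasks = Array.from({ length: numJobs }, (_, idx) => async () => {
    await sleep(jobMs); // stands in for one OCR recognition
    return idx;
  });
  const start = Date.now();
  const results = await runWithLimit(tasks, numWorkers);
  return { numWorkers, results, elapsedMs: Date.now() - start };
}

(async () => {
  for (const num of [1, 3, 5]) {
    const { elapsedMs } = await timeJobs(num);
    console.log(`${num} workers: ${elapsedMs} ms`);
  }
})();
```

A real timing assertion on OCR jobs would need generous tolerances, since wall-clock measurements vary with machine load and image content.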