
Testing Parallel OCR Scheduler Performance in Tesseract.js

This test suite exercises the scheduler in Tesseract.js, focusing on worker management and parallel job execution. It runs the same batch of OCR jobs with different numbers of workers. The assertion checks that every job returns a result; it does not measure elapsed time directly.

Test Coverage Overview

The test suite examines parallel processing capabilities with varying numbers of workers (1, 3, and 5) handling 10 OCR recognition jobs.

Key areas covered:
  • Worker initialization and management
  • Scheduler job distribution
  • Concurrent job processing
  • Performance scaling with different worker counts
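To make the job-distribution idea concrete, here is a minimal sketch of a queue-based scheduler. It is a toy model and not Tesseract.js's actual implementation: createFakeWorker, createToyScheduler and runJobs are hypothetical names, and the fake worker only waits for a fixed delay instead of running OCR.

```javascript
// Toy model only: NOT the Tesseract.js scheduler implementation.
// A fake worker that "recognizes" an image after a fixed delay.
function createFakeWorker(id, delayMs) {
  return {
    id,
    recognize: (image) => new Promise((resolve) => {
      setTimeout(() => resolve({ workerId: id, image }), delayMs);
    }),
  };
}

// Minimal queue-based scheduler: each idle worker pulls the next queued job.
function createToyScheduler() {
  const idle = []; // workers waiting for a job
  const queue = []; // jobs waiting for a worker

  function dispatch() {
    while (idle.length > 0 && queue.length > 0) {
      const worker = idle.shift();
      const job = queue.shift();
      worker[job.action](...job.args)
        .then(job.resolve, job.reject)
        .finally(() => {
          idle.push(worker); // the worker becomes available again
          dispatch();
        });
    }
  }

  return {
    addWorker(worker) {
      idle.push(worker);
      dispatch();
    },
    addJob(action, ...args) {
      return new Promise((resolve, reject) => {
        queue.push({ action, args, resolve, reject });
        dispatch();
      });
    },
  };
}

// Run numJobs fake jobs on numWorkers fake workers and collect the results.
async function runJobs(numWorkers, numJobs) {
  const scheduler = createToyScheduler();
  for (let i = 0; i < numWorkers; i += 1) {
    scheduler.addWorker(createFakeWorker(i, 5));
  }
  return Promise.all(
    Array(numJobs).fill(0).map((_, idx) => scheduler.addJob('recognize', `image-${idx}.png`)),
  );
}
```

Calling runJobs(3, 10) resolves to ten results in submission order, spread across the three fake workers. The real scheduler may differ in how it queues and assigns work.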

Implementation Analysis

The tests use Mocha-style asynchronous patterns (the before hook, this.timeout, and the chained .timeout call are Mocha APIs rather than Jest ones), with Promise.all managing the parallel operations. A before hook initializes five workers once, and forEach generates one test case per worker count (1, 3, and 5).

Technical implementation features:
  • Dynamic worker pool creation
  • Async/await pattern usage
  • Parameterized test cases
  • Flexible timeout configuration
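The pool-creation and parameterization patterns listed above can be sketched without a test runner. This is an illustration under stated assumptions: createFakeWorker stands in for the real createWorker call, and createPool, buildCases and main are hypothetical helpers that do not appear in the test file.

```javascript
// Hypothetical stand-in for createWorker: resolves to a labelled fake worker.
function createFakeWorker(id) {
  return Promise.resolve({ id });
}

// Create `size` workers concurrently and wait until all of them are ready.
function createPool(size) {
  return Promise.all(Array(size).fill(0).map((_, idx) => createFakeWorker(idx)));
}

// One case per worker count; each case gets the first `num` workers of the pool.
function buildCases(pool, counts) {
  return counts.map((num) => ({
    title: `support using ${num} workers`,
    workers: pool.slice(0, num),
  }));
}

// Build the pool once, then derive the three parameterized cases from it.
async function main() {
  const pool = await createPool(5);
  return buildCases(pool, [1, 3, 5]);
}
```

In the test file itself, the same slicing happens inside each generated test case, so a single pool of five workers serves all three configurations.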

Technical Details

Testing infrastructure includes:
  • Mocha-style test framework with expect(...).to.be(...) assertions
  • Tesseract.js worker API
  • Promise-based async testing
  • Custom timeouts: disabled for worker setup (this.timeout(0)), 60 seconds per test case
  • Environment configuration via OPTIONS object
  • Image path constants for test data

Best Practices Demonstrated

The test suite showcases several testing best practices for parallel processing validation.

Notable practices include:
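  • Test isolation through a fresh scheduler for each test case
  • Scalability testing with varying worker counts
  • Consistent job verification (each case asserts that all 10 jobs return)
  • Worker pool created once and reused across cases (the excerpt shows no explicit teardown)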
  • Clear test case organization
  • Appropriate timeout handling for async operations
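The excerpt of the test file does not show a teardown step. One way to release a shared worker pool is sketched here, assuming each worker exposes an asynchronous terminate() method as Tesseract.js workers do. terminateAll and createFakePool are hypothetical helpers, and the fake workers exist only so the sketch runs on its own.

```javascript
// Terminate every worker in the pool and report how many were stopped.
async function terminateAll(workers) {
  await Promise.all(workers.map((w) => w.terminate()));
  return workers.length;
}

// Fake workers for demonstration; each records that terminate() was called.
function createFakePool(size) {
  return Array(size).fill(0).map(() => {
    const worker = {
      terminated: false,
      terminate: async () => { worker.terminated = true; },
    };
    return worker;
  });
}

// Under Mocha, the helper could be registered as a teardown hook:
//   after(async () => { await terminateAll(workers); });
```

Terminating in a single after hook, rather than per test case, keeps the shared pool alive for all three worker-count configurations.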

naptha/tesseract.js

tests/scheduler.test.js

            
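// Tesseract, OPTIONS, IMAGE_PATH and expect are not defined in this file;
// they are expected to be provided as globals by the test setup.
const { createScheduler, createWorker } = Tesseract;

// Pool of workers shared by every test case below.
let workers = [];

before(async function cb() {
  // Disable the Mocha timeout so slow worker startup does not fail the hook.
  this.timeout(0);
  const NUM_WORKERS = 5;
  console.log(`Initializing ${NUM_WORKERS} workers`);
  // Start all workers concurrently with English language data.
  workers = await Promise.all(Array(NUM_WORKERS).fill(0).map(() => createWorker('eng', 1, OPTIONS)));
  console.log(`Initialized ${NUM_WORKERS} workers`);
});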

describe('scheduler', () => {
  describe('should speed up with more workers (running 10 jobs)', () => {
    [1, 3, 5].forEach((num) => (
      it(`support using ${num} workers`, async () => {
        const NUM_JOBS = 10;
        const scheduler = createScheduler();
        workers.slice(0, num).forEach((w) => {
          scheduler.addWorker(w);
        });
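        // Queue all jobs at once, alternating between the two sample images.
        const rets = await Promise.all(Array(NUM_JOBS).fill(0).map((_, idx) => (
          scheduler.addJob('recognize', `${IMAGE_PATH}/${idx % 2 === 0 ? 'simple' : 'cosmic'}.png`)
        )));
        // Every job must resolve; elapsed time is not asserted.
        expect(rets.length).to.be(NUM_JOBS);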
      }).timeout(60000)
    ));
  });
});
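The suite title mentions a speed-up, yet the assertion above only counts results. A timing comparison could look like the sketch below. It uses simulated jobs rather than real OCR: runFakeJobs, measure and compare are hypothetical helpers, and the 20 ms job duration is an arbitrary assumption.

```javascript
// Simulate numWorkers workers draining a shared list of numJobs jobs.
// Each fake job just waits jobMs milliseconds.
async function runFakeJobs(numWorkers, numJobs, jobMs) {
  let next = 0;
  const results = [];
  // Each "worker" loop pulls the next job index until none remain.
  async function workerLoop() {
    while (next < numJobs) {
      const idx = next;
      next += 1;
      await new Promise((resolve) => setTimeout(resolve, jobMs));
      results.push(idx);
    }
  }
  await Promise.all(Array(numWorkers).fill(0).map(() => workerLoop()));
  return results;
}

// Run an async function and report its result with the elapsed wall-clock time.
async function measure(fn) {
  const start = Date.now();
  const result = await fn();
  return { result, elapsedMs: Date.now() - start };
}

// Time the same 10 fake jobs with 1 worker and with 5 workers.
async function compare() {
  const one = await measure(() => runFakeJobs(1, 10, 20));
  const five = await measure(() => runFakeJobs(5, 10, 20));
  return { one, five };
}
```

With real OCR jobs the ratio depends on the machine and the images, so any threshold asserted on elapsed time would need generous margins to avoid flaky failures.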