
Speech Synthesis Fingerprinting: How Voice Lists Identify Your OS

How the SpeechSynthesis API voice list reveals your operating system and platform, and techniques to control voice-based fingerprint signals.

Introduction

The Web Speech API's SpeechSynthesis interface was designed to give web applications text-to-speech capabilities. It allows developers to convert text into spoken audio, enabling accessibility features, language learning tools, and interactive voice experiences. The speechSynthesis.getVoices() method returns a list of available voices, each with properties like name, language, and whether it runs locally or through a remote service.
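
To make the shape of that data concrete, here is a minimal sketch of how a voice record can be formatted. The sample object is illustrative; real SpeechSynthesisVoice instances only exist in a browser context, where you would run speechSynthesis.getVoices().map(describeVoice):

```javascript
// Sketch: format a voice record the way speechSynthesis.getVoices() exposes it.
// The sample object below is illustrative mock data, not a real browser voice.
function describeVoice(voice) {
  const kind = voice.localService ? 'local' : 'network';
  return `${voice.name} [${voice.lang}] (${kind})`;
}

const sample = {
  name: 'Microsoft David - English (United States)',
  lang: 'en-US',
  localService: true,
  voiceURI: 'Microsoft David - English (United States)',
  default: true,
};

console.log(describeVoice(sample));
// → "Microsoft David - English (United States) [en-US] (local)"
```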

While the API serves a clear accessibility purpose, the voice list it exposes varies significantly between operating systems, browser versions, and installed language packs. A Windows 11 system might report 40+ voices including Microsoft David, Zira, and various Cortana voices. A macOS system reports entirely different voices like Alex, Samantha, and a range of Siri-based voices. Linux systems using speech-dispatcher report yet another set. This platform-specific variation makes the voice list a reliable signal for identifying the underlying operating system and configuration.

Privacy Impact

The SpeechSynthesis voice list is a particularly effective fingerprinting vector because it combines high entropy with low awareness. Most users do not know that websites can enumerate their installed text-to-speech voices, and there is no permission prompt or notification when this happens.

The privacy concern goes beyond simple OS identification. Voice lists vary not only by operating system but by:

  • OS version: Windows 10 and Windows 11 ship different default voice sets. macOS Ventura and macOS Sonoma include different Siri voices.
  • Language packs: Users who install additional language packs gain new voices, creating a more distinctive fingerprint.
  • Third-party TTS software: Applications like Balabolka or NaturalReader, or the NVDA screen reader, can add voices to the system, further distinguishing the device.
  • Browser version: Chrome, Firefox, and Edge each expose different subsets of the system's available voices.

A 2021 study by researchers at the University of Iowa demonstrated that speech synthesis voice lists, when combined with other browser signals, could increase fingerprint uniqueness by 12-18% compared to fingerprinting without voice data. The voice list is especially valuable because it reveals information about the operating system that is otherwise difficult to obtain after User-Agent reduction efforts.

The voiceschanged event (exposed via the onvoiceschanged handler) adds another dimension: by observing when and how the voice list loads, trackers can infer information about the browser's internal initialization sequence, which varies between platforms.

Technical Background

How Voice Lists Are Generated

When a browser starts, it queries the operating system's text-to-speech subsystem for available voices. On Windows, this means the Speech API (SAPI) and the more modern OneCore speech platform. On macOS, the browser queries the NSSpeechSynthesizer framework. On Linux, it typically uses speech-dispatcher or directly queries installed engines like eSpeak, Festival, or Piper.

Each voice object returned by speechSynthesis.getVoices() has several properties:

  • name: The voice's display name (e.g., "Microsoft David - English (United States)")
  • lang: The BCP 47 language tag (e.g., "en-US")
  • localService: Whether the voice runs locally (true) or requires a network connection (false)
  • voiceURI: A URI identifying the voice
  • default: Whether this is the default voice for its language
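
A sketch of how these properties can be collapsed into a single fingerprint token. The djb2-style hash is used purely for illustration; real trackers may use anything from FNV to SHA-256, and the sample voices are mock data:

```javascript
// Sketch: serialize a voice list and hash it into a compact fingerprint token.
// djb2-variant hash, for illustration only.
function voiceFingerprint(voices) {
  const serialized = voices
    .map(v => `${v.name}|${v.lang}|${v.localService ? 1 : 0}`)
    .join(';');
  let hash = 5381;
  for (const ch of serialized) {
    hash = ((hash * 33) ^ ch.codePointAt(0)) >>> 0; // keep it a 32-bit uint
  }
  return hash.toString(16);
}

const winVoices = [
  { name: 'Microsoft David - English (United States)', lang: 'en-US', localService: true },
  { name: 'Microsoft Zira - English (United States)', lang: 'en-US', localService: true },
];
console.log(voiceFingerprint(winVoices));
```

Note that because the serialization preserves array order, even reordering the same voices yields a different token, which is why voice ordering is itself a signal.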

Platform-Specific Voice Signatures

The voice list acts as a platform signature. A standard Windows 11 installation reports voices with names starting with "Microsoft" and includes platform-specific variants. macOS reports voices with Apple-specific names and includes Siri voices on recent versions. Chrome on Android reports a different set entirely, often including Google-branded voices.

This creates a matrix of identifiers: the number of voices, their exact names, their language coverage, and the local/remote split all contribute to a platform fingerprint. Even the order in which voices appear in the array can differ between platforms.
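
The platform-signature idea can be sketched as a naive classifier over voice names. The prefix rules below are simplified illustrations of real vendor naming conventions, not an exhaustive ruleset:

```javascript
// Sketch: a heuristic platform classifier based on vendor naming conventions.
// The name patterns are illustrative simplifications.
function guessPlatform(voices) {
  const names = voices.map(v => v.name);
  if (names.some(n => n.startsWith('Microsoft'))) return 'Windows';
  if (names.some(n => n.startsWith('Google'))) return 'Android/ChromeOS';
  if (names.some(n => /^(Alex|Samantha|Siri)/.test(n))) return 'macOS';
  if (names.some(n => /espeak|festival/i.test(n))) return 'Linux';
  return 'unknown';
}

console.log(guessPlatform([{ name: 'Microsoft Zira - English (United States)' }])); // "Windows"
console.log(guessPlatform([{ name: 'eSpeak English' }]));                           // "Linux"
```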

Asynchronous Loading Behavior

Voice loading is asynchronous in most browsers. The initial call to getVoices() may return an empty array, with the full list becoming available after the voiceschanged event fires. The timing of this event, and whether getVoices() returns an empty list initially, varies between browsers and platforms. This loading behavior is itself a fingerprinting signal.

Network Voices

Some browsers include network-based voices that require an internet connection. The availability of these voices depends on the browser, the user's Google account status (for Chrome), and network connectivity. The presence or absence of network voices adds another layer to the fingerprint.
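
The local/remote split can be read directly off the localService flag. A minimal sketch, using illustrative sample entries:

```javascript
// Sketch: partition a voice list by the localService flag,
// one of the signals described above. Sample data is illustrative.
function splitByLocality(voices) {
  return {
    local: voices.filter(v => v.localService),
    network: voices.filter(v => !v.localService),
  };
}

const sample = [
  { name: 'Samantha', lang: 'en-US', localService: true },
  { name: 'Google US English', lang: 'en-US', localService: false },
];
const { local, network } = splitByLocality(sample);
console.log(`${local.length} local, ${network.length} network`);
// → "1 local, 1 network"
```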

Common Protection Approaches and Their Limitations

VPNs and Proxy Servers

VPNs change the IP address but have no effect on the speech synthesis voice list. Voice data comes from the local operating system, not from the network. Two devices behind the same VPN report entirely different voice lists based on their respective OS and language configurations.

Incognito and Private Browsing

Private browsing modes do not alter the voice list. The same voices are available in incognito as in a normal window, because the voice list is read from the operating system, not from browser storage.

Browser Extensions

Extensions that modify speechSynthesis.getVoices() face several challenges:

  • Spoofing the return value: An extension can override getVoices() to return a custom voice list, but the voices it reports must be usable. If a website attempts to use a reported voice and it fails, the inconsistency is apparent.
  • Event timing: The voiceschanged event behavior is difficult to control from an extension. The event's timing, how many times it fires, and the initial empty-array behavior are all platform-specific signals that extensions struggle to replicate accurately.
  • Property descriptors: Overriding getVoices() via extension-injected JavaScript changes the function's property descriptors and prototype chain, which can be detected.
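
One common form of that detection is stringifying the function: native browser functions serialize to a body containing "[native code]", while a page-world override written in plain JavaScript does not. A sketch, demonstrated with Node built-ins standing in for browser natives:

```javascript
// Sketch: the kind of check a detector might run against an overridden
// getVoices. Native functions stringify to "[native code]".
function looksNative(fn) {
  return Function.prototype.toString.call(fn).includes('[native code]');
}

console.log(looksNative(Array.prototype.map)); // true  (native built-in)
console.log(looksNative(() => []));            // false (plain JS override)
// In a browser, a detector would test: looksNative(speechSynthesis.getVoices)
```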

Blocking the API

Disabling speechSynthesis entirely is detectable: a website can check whether the API exists and whether it returns results. A browser that reports speechSynthesis as available but returns no voices is itself a distinctive signal.
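
A detector might bucket that state roughly as follows. The labels are illustrative, not taken from any specific anti-bot product:

```javascript
// Sketch: classify the speech API state a page observes.
// Label names are illustrative.
function speechApiSignal(hasApi, voiceCount) {
  if (!hasApi) return 'api-missing';        // API removed entirely: rare, distinctive
  if (voiceCount === 0) return 'api-empty'; // present but voiceless: blocked/headless signal
  return 'api-normal';
}

console.log(speechApiSignal(true, 0));  // "api-empty"
console.log(speechApiSignal(true, 22)); // "api-normal"
```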

BotBrowser's Engine-Level Approach

BotBrowser controls speech synthesis voice lists at the browser engine level. When a fingerprint profile is loaded, the voice list is configured to match the profile's target platform before any page code executes.

Profile-Controlled Voice Lists

chrome --bot-profile="/path/to/profile.enc" \
       --user-data-dir="$(mktemp -d)"

When a profile representing a Windows 11 system is loaded, speechSynthesis.getVoices() returns the exact voice list expected on that platform, including correct names, languages, localService flags, and ordering. This is true regardless of the actual host operating system.

Cross-Platform Consistency

This is where BotBrowser's engine-level approach provides the most value. Running a Windows profile on a Linux server would normally expose Linux-native voices (eSpeak, Festival), immediately revealing that the browser is not running on the reported platform. BotBrowser replaces the voice list with the profile's expected voices, maintaining platform consistency across all fingerprint surfaces.

The voice list aligns with other platform signals:

  • navigator.platform matches the profile's OS
  • The User-Agent string reports the correct platform
  • Font lists match the target OS
  • Other OS-dependent APIs report consistent values
  • The speech synthesis voices match all of the above

Realistic Voice Behavior

BotBrowser does not just return a static list. The voice loading behavior, including the asynchronous voiceschanged event timing and the initial getVoices() return pattern, matches the expected behavior for the profile's target browser and platform. This ensures that both the data and the loading behavior are consistent.

Voice Object Fidelity

Each voice object in the returned list has accurate properties: correct name format, appropriate lang tags, proper localService values, and realistic voiceURI strings. The profile data is captured from real devices, ensuring that every property matches what a genuine installation would report.
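
The property shapes described above can be checked mechanically. A sketch of such a validator; the BCP 47 check is a simplified pattern, not a full language-tag parser:

```javascript
// Sketch: validate that a voice record has plausible property shapes.
// The lang regex is a simplified approximation of a BCP 47 tag.
function validateVoice(v) {
  const errors = [];
  if (typeof v.name !== 'string' || v.name.length === 0) errors.push('name');
  if (!/^[a-z]{2,3}(-[A-Za-z0-9]{2,8})*$/.test(v.lang || '')) errors.push('lang');
  if (typeof v.localService !== 'boolean') errors.push('localService');
  if (typeof v.voiceURI !== 'string') errors.push('voiceURI');
  return errors; // empty array means the record looks plausible
}

console.log(validateVoice({
  name: 'Microsoft David - English (United States)',
  lang: 'en-US',
  localService: true,
  voiceURI: 'Microsoft David - English (United States)',
}));
// → []
```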

Configuration and Usage

Basic CLI Usage

Voice list protection is automatic when loading a profile:

chrome --bot-profile="/path/to/profile.enc" \
       --user-data-dir="$(mktemp -d)"

No additional flags are needed. The profile contains the complete voice list for the target platform.

Playwright Integration

const { chromium } = require('playwright-core');

(async () => {
  const browser = await chromium.launch({
    executablePath: '/path/to/botbrowser/chrome',
    args: [
      '--bot-profile=/path/to/profile.enc',
    ],
    headless: true,
  });

  const context = await browser.newContext({ viewport: null });
  const page = await context.newPage();

  const voices = await page.evaluate(() => {
    return new Promise(resolve => {
      const v = speechSynthesis.getVoices();
      if (v.length > 0) return resolve(v.map(voice => ({
        name: voice.name, lang: voice.lang, local: voice.localService,
      })));
      speechSynthesis.onvoiceschanged = () => {
        resolve(speechSynthesis.getVoices().map(voice => ({
          name: voice.name, lang: voice.lang, local: voice.localService,
        })));
      };
    });
  });

  console.log(`Voice count: ${voices.length}`);
  console.log('Voices:', JSON.stringify(voices, null, 2));
  await browser.close();
})();

Puppeteer Integration

const puppeteer = require('puppeteer-core');

(async () => {
  const browser = await puppeteer.launch({
    executablePath: '/path/to/botbrowser/chrome',
    args: [
      '--bot-profile=/path/to/profile.enc',
    ],
    headless: true,
    defaultViewport: null,
  });

  const page = await browser.newPage();
  await page.goto('about:blank');

  const voiceCount = await page.evaluate(() => {
    return new Promise(resolve => {
      const check = () => {
        const voices = speechSynthesis.getVoices();
        if (voices.length > 0) resolve(voices.length);
        else speechSynthesis.onvoiceschanged = () =>
          resolve(speechSynthesis.getVoices().length);
      };
      check();
    });
  });

  console.log('Voices available:', voiceCount);
  await browser.close();
})();

Verification

After launching BotBrowser with a profile, verify the voice list:

function getVoices() {
  return new Promise((resolve) => {
    const voices = speechSynthesis.getVoices();
    if (voices.length > 0) return resolve(voices);
    speechSynthesis.onvoiceschanged = () =>
      resolve(speechSynthesis.getVoices());
  });
}

const voices = await getVoices();
console.log(`Voice count: ${voices.length}`);
voices.forEach(v =>
  console.log(`${v.name} (${v.lang}) local: ${v.localService}`)
);

What to check:

  1. Voice count matches the expected count for the profile's target platform
  2. Voice names use the correct platform naming convention (e.g., "Microsoft" prefix for Windows)
  3. Language tags are appropriate for the target locale configuration
  4. The localService property is consistent with the platform's expected voice types
  5. The voice list does not contain voices from the host operating system
  6. Fingerprint testing tools show no cross-signal inconsistencies between voice data and other platform indicators
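
Checks 1, 2, and 5 can be automated by diffing the observed names against an expected list for the target platform. A sketch; the expected names below are illustrative, not a canonical Windows 11 voice set:

```javascript
// Sketch: compare observed voice names against the names expected for the
// profile's target platform. Sample names are illustrative.
function diffVoiceNames(expected, actual) {
  const actualSet = new Set(actual);
  const expectedSet = new Set(expected);
  return {
    missing: expected.filter(n => !actualSet.has(n)),    // expected but absent
    unexpected: actual.filter(n => !expectedSet.has(n)), // e.g. host-OS leaks
  };
}

const expected = [
  'Microsoft David - English (United States)',
  'Microsoft Zira - English (United States)',
];
const actual = [
  'Microsoft David - English (United States)',
  'eSpeak English', // a host-OS voice leaking through would show up here
];
console.log(diffVoiceNames(expected, actual));
```

An empty `missing` and `unexpected` pair means the observed list matches the target platform exactly.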

Best Practices

  1. Always use a complete profile. Voice list protection depends on the profile providing accurate voice data for the target platform. Partial or custom configurations may produce incomplete voice lists.

  2. Verify cross-platform consistency. When running profiles on a different OS than the target, check that the voice list matches the target platform, not the host. This is the most common source of voice fingerprint leaks.

  3. Consider locale alignment. A profile configured for a Japanese locale should include Japanese voices. BotBrowser profiles captured from real devices include the appropriate locale-specific voices.

  4. Do not install TTS extensions alongside BotBrowser. Third-party TTS browser extensions may register additional voices that conflict with the profile's controlled voice list.

Frequently Asked Questions

Do all browsers expose the same voice list?

No. Chrome, Firefox, Edge, and Safari each expose different subsets of the operating system's available voices. BotBrowser profiles are browser-specific, so a Chrome profile returns Chrome-appropriate voices and an Edge profile returns Edge-appropriate voices.

Can websites actually use the voices for synthesis?

BotBrowser's voice list is designed for fingerprint consistency. The actual text-to-speech functionality depends on the host system's capabilities. In most automated workflows, TTS playback is not needed, but the voice list must be present and accurate for fingerprint consistency.

Does the voice list change between browser versions?

Yes. Browser updates sometimes add or remove voice support. BotBrowser profiles are versioned and include the voice list expected for the specific browser version the profile represents.

How many voices does a typical platform report?

Windows 11 typically reports 30-50 voices depending on installed language packs. macOS reports 60-80 voices including Siri variants. Chrome on Android reports 5-15 voices. The exact count is one of the fingerprinting signals that BotBrowser controls.

Does voice list protection work in headless mode?

Yes. BotBrowser applies the profile's voice list regardless of whether the browser runs in headed or headless mode. This is important because headless environments typically have no TTS subsystem, and an empty voice list in headless mode is a strong detection signal.

What about the voiceschanged event timing?

BotBrowser controls the timing and behavior of the voiceschanged event to match the expected pattern for the profile's target platform. This includes whether getVoices() initially returns an empty array and how quickly the event fires after page load.

Summary

The SpeechSynthesis API's voice list is a high-entropy fingerprinting signal that reveals operating system, browser version, and language configuration. Standard privacy tools cannot address it because voice data comes from the OS, not the network or browser storage. BotBrowser controls voice lists at the engine level through its profile system, ensuring cross-platform consistency and alignment with all other fingerprint signals. For related protection, see navigator properties protection, font fingerprint control, and timezone and locale configuration.

#speech-synthesis #fingerprinting #privacy #tts #voice #platform