100% in-browser conversational speech synthesis — no server. 9 emotions. ONNX Runtime Web + WebGPU.
Set the emotion inline with tags: [happy] … [sad] …. To set an
intensity per tag, use [emotion:NN], e.g.
[confused:40] or [happy:80] (NN = 0–120, in percent). A bare
[emotion] uses that emotion's default intensity. Each tagged span is
generated separately and the audio is concatenated; the global slider scales all intensities.
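A minimal sketch of how the tag syntax above could be split into spans and how per-span audio could be joined. The names `EmotionSpan`, `parseEmotionTags`, and `concatAudio` are hypothetical, as is the "neutral" fallback for text before the first tag; the demo's real parser is not shown here.

```typescript
// Hypothetical types and names, for illustration only.
interface EmotionSpan {
  emotion: string;          // tag name, e.g. "happy"
  intensity: number | null; // percent from [emotion:NN]; null for a bare tag
  text: string;             // text governed by this tag
}

const MAX_INTENSITY = 120; // upper bound stated in the text (NN = 0–120%)

// Split "[happy] Hi [sad:30] Bye" into one span per tag.
// Assumption: text before the first tag gets `fallbackEmotion`.
function parseEmotionTags(input: string, fallbackEmotion = "neutral"): EmotionSpan[] {
  const tag = /\[([a-z]+)(?::(\d{1,3}))?\]/g;
  const spans: EmotionSpan[] = [];
  let emotion = fallbackEmotion;
  let intensity: number | null = null;
  let cursor = 0;

  // Emit the text between the previous tag and `end`, if any.
  const flush = (end: number): void => {
    const text = input.slice(cursor, end).trim();
    if (text.length > 0) spans.push({ emotion, intensity, text });
  };

  let m: RegExpExecArray | null;
  while ((m = tag.exec(input)) !== null) {
    flush(m.index);
    emotion = m[1];
    // Clamp to the documented maximum; a bare tag keeps intensity = null.
    intensity = m[2] === undefined ? null : Math.min(MAX_INTENSITY, Number(m[2]));
    cursor = m.index + m[0].length;
  }
  flush(input.length);
  return spans;
}

// Join separately generated audio chunks into one buffer.
function concatAudio(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```

Each span would then be synthesized on its own and the resulting sample arrays passed to `concatAudio` in order.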
Model: v28_rc1 (conversational, single-forward for clean speech).
Emotion is a style embedding (style_emb) added to the voice vector: style = voice + α·emotion,
where [emotion:NN] sets α per tag. confused and whisper
default to 60% so they don't mumble. Nonverbal (NV) tags such as <laugh> land in v29.
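The blend style = voice + α·emotion can be sketched as below. This is an illustration under assumptions, not the model's actual code: `resolveAlpha` and `blendStyle` are hypothetical names, α is assumed to be NN/100 multiplied by the global slider value, and the 100% default for emotions other than confused and whisper is a guess.

```typescript
// Only the 60% defaults for confused/whisper come from the text above.
const DEFAULT_PERCENT: Record<string, number> = { confused: 60, whisper: 60 };
const FALLBACK_PERCENT = 100; // assumed default for all other emotions

// Resolve alpha for one span. tagPercent is NN from [emotion:NN], or null
// for a bare tag. globalScale is the slider value (assumed multiplicative).
function resolveAlpha(emotion: string, tagPercent: number | null, globalScale = 1): number {
  const percent = tagPercent ?? DEFAULT_PERCENT[emotion] ?? FALLBACK_PERCENT;
  return (percent / 100) * globalScale;
}

// style = voice + alpha * emotion, element by element.
function blendStyle(voice: Float32Array, emotionEmb: Float32Array, alpha: number): Float32Array {
  if (voice.length !== emotionEmb.length) {
    throw new Error("voice and emotion vectors must have the same length");
  }
  const style = new Float32Array(voice.length);
  for (let i = 0; i < voice.length; i++) {
    style[i] = voice[i] + alpha * emotionEmb[i];
  }
  return style;
}
```

With α = 0 the style vector is the plain voice vector, so lowering the intensity moves the output back toward neutral speech.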