Long-form Narration - Supertone API Documentation


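This sample takes a multi-paragraph script and produces a single audio file. It relies on the SDK's automatic chunking: pass the whole script to create_speech, and the SDK splits it at sentence boundaries, generates each segment, and joins the results.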

Python

import os
from supertone import Supertone

VOICE_ID = "20160a4c5ba38967330c84"  # replace with your voice ID

SCRIPT = """
Chapter one. The clocktower struck midnight, and the wind through the old
square carried whispers from the workshops below. Hana adjusted her coat,
checked the address one more time, and stepped through the iron gate.

Inside, the air smelled of copper and lemon polish. Rows of half-finished
automatons stared back from the shelves, each one waiting for a name. She
set her satchel on the bench and opened her notebook to a fresh page.

By dawn, the room had changed. One of the figures by the window was no
longer half-finished, and Hana, very quietly, was no longer alone.
""".strip()

with Supertone(api_key=os.environ["SUPERTONE_API_KEY"]) as client:
    response = client.text_to_speech.create_speech(
        voice_id=VOICE_ID,
        text=SCRIPT,
        language="en",
        model="sona_speech_2",
        voice_settings={"pitch_variance": 0.9, "speed": 0.95},
    )

    with open("narration.wav", "wb") as f:
        f.write(response.result.read())

print("Saved narration.wav")

TypeScript

import { Supertone } from "@supertone/supertone";
import * as fs from "node:fs";

const VOICE_ID = "20160a4c5ba38967330c84"; // replace with your voice ID

const SCRIPT = `
Chapter one. The clocktower struck midnight, and the wind through the old
square carried whispers from the workshops below. Hana adjusted her coat,
checked the address one more time, and stepped through the iron gate.

Inside, the air smelled of copper and lemon polish. Rows of half-finished
automatons stared back from the shelves, each one waiting for a name. She
set her satchel on the bench and opened her notebook to a fresh page.

By dawn, the room had changed. One of the figures by the window was no
longer half-finished, and Hana, very quietly, was no longer alone.
`.trim();

const client = new Supertone({ apiKey: process.env.SUPERTONE_API_KEY });

const response = await client.textToSpeech.createSpeech({
  voiceId: VOICE_ID,
  apiConvertTextToSpeechUsingCharacterRequest: {
    text: SCRIPT,
    language: "en",
    model: "sona_speech_2",
    voiceSettings: { pitchVariance: 0.9, speed: 0.95 },
  },
});

if (response.result instanceof Uint8Array) {
  fs.writeFileSync("narration.wav", response.result);
} else if (response.result && "getReader" in response.result) {
  const reader = (response.result as ReadableStream<Uint8Array>).getReader();
  const chunks: Uint8Array[] = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) chunks.push(value);
  }
  fs.writeFileSync("narration.wav", Buffer.concat(chunks));
}

console.log("Saved narration.wav");

What happens internally

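The SDK detects that SCRIPT is longer than 300 characters.
It first splits at sentence-ending punctuation. If a single sentence is still too long, it splits that sentence further at word boundaries.
The Python SDK runs up to three create_speech requests in parallel, while the TypeScript SDK runs them sequentially.
Each segment comes back as a complete audio file.
The SDK strips the WAV header from every segment after the first and concatenates the bytes into one continuous clip.
The result is a single, ready-to-play file in the same format as a single-segment response.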
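The steps above can be sketched in plain Python. This is a minimal illustration, not the SDK's actual implementation. The names chunk_text, split_by_words, merge_wav, and narrate are hypothetical, and synthesize stands in for a caller-supplied function that sends one segment to create_speech and returns WAV bytes. Greedily packing several sentences into one segment is an assumption, and the merge uses the standard-library wave module rather than stripping header bytes by hand.

```python
import io
import re
import wave
from concurrent.futures import ThreadPoolExecutor

MAX_CHARS = 300  # per-request character limit described above


def split_by_words(sentence, limit):
    """Fallback: break an over-long sentence at word boundaries."""
    pieces, current = [], ""
    for word in sentence.split():
        candidate = f"{current} {word}".strip()
        # A single word longer than `limit` is kept whole in this sketch.
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text, limit=MAX_CHARS):
    """Split at sentence punctuation, then pack pieces up to `limit` chars."""
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if len(sentence) <= limit:
            parts = [sentence]
        else:
            parts = split_by_words(sentence, limit)
        for part in parts:
            candidate = f"{current} {part}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = part
    if current:
        chunks.append(current)
    return chunks


def merge_wav(segments):
    """Join WAV byte strings that share one audio format (non-empty list)."""
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        for index, data in enumerate(segments):
            with wave.open(io.BytesIO(data), "rb") as reader:
                if index == 0:
                    # Only the first segment's format ends up in the header.
                    writer.setparams(reader.getparams())
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()


def narrate(text, synthesize, limit=MAX_CHARS, workers=3):
    """Chunk, synthesize with up to `workers` parallel calls, then merge."""
    chunks = chunk_text(text, limit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        segments = list(pool.map(synthesize, chunks))  # map keeps input order
    return merge_wav(segments)
```

The SDK already performs this work when you call create_speech with a long script, so the sketch is only meant to make the behavior easier to reason about, for example when predicting where a script will be cut.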

Tips

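Punctuation matters. The better punctuated the source text, the cleaner the split points. If the script comes from machine translation or transcription, adding ./?/! improves the result.
Voice settings carry over. The same voice_settings apply to every segment, so the joined audio sounds consistent.
Estimate up front. predict_duration does not auto-chunk, but you can estimate cost by splitting the script into pieces of a few sentences each, calling predict_duration on every piece, and summing the durations.
Pick the right model. For long-form narration, sona_speech_2 produces the most natural delivery. Switch to sona_speech_2_flash when you need to generate many narrations quickly.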
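The estimation tip can be sketched as follows. The helper estimate_total_duration is a hypothetical name, and predict_one is an assumed caller-supplied function that takes one piece of text and returns its predicted duration in seconds, for instance by wrapping a predict_duration call. The exact request and response shape of predict_duration is not shown in this page, so it is deliberately left outside the sketch.

```python
import re


def estimate_total_duration(script, predict_one):
    """Sum predicted durations over sentence-sized pieces of `script`.

    `predict_one` is a hypothetical callable: text in, seconds out.
    """
    # Split after ., ?, or ! followed by whitespace; drop empty pieces.
    pieces = [p for p in re.split(r"(?<=[.?!])\s+", script.strip()) if p]
    return sum(predict_one(piece) for piece in pieces)
```

Splitting sentence by sentence keeps each piece short, but it also means one prediction call per sentence. Grouping a few sentences per call, as long as each group stays within the character limit, would reduce the number of requests.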

Related

Long Text

Complete reference for the 300-character limit and chunking behavior.

Voice Settings

Fine-tune the delivery of your narration.
