Input

Language spoken in the audio. Specify 'None' to perform language detection.
Number of parallel batches to compute. Reduce this value if you run out of memory (OOM).
Whisper supports both chunk-level and word-level timestamps.
curl --location 'https://urlog.io/api/ai-speech-to-text' \
--header "x-api-key: $URLOG_API_TOKEN" \
--header 'Content-Type: application/json' \
--data '{
    "file": "https://urlog.io/file/OSR_uk_000_0050_8k.wav",
    "num_speakers": 2,
    "language": "en"
}'
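
When the audio URL, speaker count, or language changes between calls, it can help to build the JSON body in a small helper rather than editing it by hand. The following is a minimal sketch, not part of the service's documentation: `build_payload` is a hypothetical helper name, and the three field names are taken from the request above. It does no escaping, so it assumes the values contain no quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON request body for the given
# file URL, number of speakers, and language code.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
build_payload() {
    file_url=$1
    num_speakers=$2
    language=$3
    printf '{"file": "%s", "num_speakers": %d, "language": "%s"}\n' \
        "$file_url" "$num_speakers" "$language"
}

# Usage (the network call is commented out so the sketch runs offline):
# curl --location 'https://urlog.io/api/ai-speech-to-text' \
#     --header "x-api-key: $URLOG_API_TOKEN" \
#     --header 'Content-Type: application/json' \
#     --data "$(build_payload 'https://urlog.io/file/OSR_uk_000_0050_8k.wav' 2 en)"
```

The API key header is written in double quotes in the usage comment so that the shell expands `$URLOG_API_TOKEN`; inside single quotes the variable name would be sent literally.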