API Reference

API reference for the Speechly API

Description

The Speechly SLU service provides spoken language understanding over a bidirectional stream. It takes a stream of SLURequest messages as input and returns a stream of SLUResponse messages in real time.

A new stream is started by initializing the SLU engine with a config value.

The Speechly API requires an access token from the Identity service. The token must be included in the call metadata under the Authorization key, with the value Bearer TOKEN_HERE.

When a new SLU.Stream is started, the client must first send the config message, which configures the SLU engine. If config is not the first message sent, the stream closes with an error.

Once the client sends an SLUEvent.START message and begins sending audio data, the server responds with SLUResponse messages until the client ends the stream by sending SLUEvent.STOP.
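To make these ordering rules concrete, here is a minimal client-side sketch. It models each SLURequest as a one-key dict shaped like the JavaScript examples ({"config": ...}, {"event": {"event": "START"}}, {"audio": ...}); the dict form and the name check_request_order are hypothetical, and it assumes a single START/STOP pair per stream, as in the examples. The server remains the authority on what it accepts.

```python
def check_request_order(requests):
    """Check SLURequest-like dicts against the documented ordering.

    Each request is a dict with exactly one key (mirroring the oneof):
    {"config": {...}}, {"event": {"event": "START" | "STOP"}} or
    {"audio": b"..."}. Returns the final state ("configured",
    "streaming" or "stopped"); raises ValueError on the first violation.
    Assumption: one START/STOP pair per stream, as in the examples.
    """
    state = "new"
    for i, request in enumerate(requests):
        if len(request) != 1:
            raise ValueError(f"request {i}: exactly one field must be set")
        kind, value = next(iter(request.items()))
        if state == "new":
            # The config must always be the first message on the stream.
            if kind != "config":
                raise ValueError("the first message must be config")
            state = "configured"
        elif kind == "event" and value.get("event") == "START" and state == "configured":
            state = "streaming"
        elif kind == "audio" and state == "streaming":
            pass  # audio is only accepted between START and STOP
        elif kind == "event" and value.get("event") == "STOP" and state == "streaming":
            state = "stopped"
        else:
            raise ValueError(f"request {i}: {kind} not allowed in state {state}")
    return state
```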

Stream requires config

If the first message to the SLU service is not config, the stream closes with an error. You’ll need a Speechly app ID for creating the configuration; you can get one by signing up to the Speechly Dashboard.

Identity service

service Identity {
    rpc Login(LoginRequest) returns (LoginResponse) {}
}

The Identity service provides client login. On success, it returns an access token for the SLU and WLU services.

LoginRequest

message LoginRequest {
    string device_id = 1;
    string app_id = 2;
}
Name Type Description
device_id string A unique identifier for the end user. Required.
app_id string Application ID registered with Speechly. Required.

Example


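// Speechly is the gRPC package loaded from the .proto definition.
const identity = new Speechly.v1.Identity(host, credentials);
identity.login({ appId, deviceId }, (err, response) => {
  if (err) {
    console.error(err);
    return;
  }
  // Access token for the SLU and WLU services
  const token = response.token;
});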

LoginResponse

message LoginResponse {
    string token = 1; 
}
Name Type Description
token string An access token to be used for the SLU and WLU services.

SLU service

service SLU {
    rpc Stream(stream SLURequest) returns (stream SLUResponse) {}
}
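
Example

// Pass the access token from Identity.Login as call metadata.
const metadata = new grpc.Metadata();
metadata.add("Authorization", `Bearer ${token}`);
const client = new Speechly.v1.SLU(host, credentials);
// Opens the bidirectional SLU.Stream call.
const slu = client.Stream(metadata);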

Messages

SLURequest

message SLURequest {
    oneof streaming_request {
        SLUConfig config = 1;
        SLUEvent event = 2;
        bytes audio = 3;
    }
}
Name Type Description
config SLUConfig Provides the configuration for initializing the SLU service. Must be the first message.
event SLUEvent Either START or STOP, to begin or end the audio stream.
audio bytes A chunk of audio data.

Example

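// Assumes the fs module, the wav package and the slu call opened above.
// The config must be the first message on the stream.
slu.write({ config: { channels: 1, sampleRateHertz: 16000 } });
// START opens the audio stream.
slu.write({ event: { event: "START" } });
const audio = new wav.Reader();
// Forward each chunk of PCM data while the call is writable.
audio.on("data", audioData => {
  if (slu.writable) {
    slu.write({ audio: audioData });
  }
});
// Print each SLUResponse as it arrives.
slu.on("data", data => {
  console.dir(data, { depth: null });
});
// Report stream errors instead of crashing on an unhandled "error" event.
slu.on("error", err => {
  console.error(err);
});
// STOP ends the audio stream; then half-close the call.
audio.on("end", () => {
  slu.write({ event: { event: "STOP" } });
  slu.end();
});
fs.createReadStream("../audio.wav").pipe(audio);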

See the complete example under Example code.

SLUConfig

message SLUConfig {
    enum Encoding {
        LINEAR16 = 0;
    }
    Encoding encoding = 1;
    int32 channels = 2;
    int32 sample_rate_hertz = 3;
    string language_code = 4;
}
Name Type Description
encoding Encoding The audio encoding. LINEAR16 (0) is raw, linear 16-bit PCM audio.
channels int32 The number of channels in the audio stream. Must be at least 1.
sample_rate_hertz int32 The sampling rate of the audio stream. 16000 Hz is preferred; must be at least 8000 Hz.
language_code string A valid language code, such as en_US. Required; must match one of the languages defined in the app ID configuration.
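The constraints in this table can be checked on the client before opening a stream. The sketch below is a hypothetical mirror of SLUConfig as a Python dataclass; app_languages is an assumption (the set of language codes configured for your app, which you supply yourself, since this reference describes no API for fetching it). The server performs the authoritative validation.

```python
from dataclasses import dataclass

LINEAR16 = 0  # the only Encoding value listed in SLUConfig


@dataclass
class SluConfig:
    """Client-side mirror of the SLUConfig message (hypothetical helper)."""
    channels: int
    sample_rate_hertz: int
    language_code: str
    encoding: int = LINEAR16

    def validate(self, app_languages):
        """Return a list of problems; an empty list means the config meets
        the documented constraints. app_languages is assumed to be the set
        of language codes configured for the app."""
        problems = []
        if self.encoding != LINEAR16:
            problems.append("encoding must be LINEAR16")
        if self.channels < 1:
            problems.append("channels must be at least 1")
        if self.sample_rate_hertz < 8000:
            problems.append("sample_rate_hertz must be at least 8000")
        if not self.language_code:
            problems.append("language_code is required")
        elif self.language_code not in app_languages:
            problems.append("language_code is not configured for this app")
        return problems
```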

SLUEvent

SLUEvent is a control event sent by the client in the SLU.Stream RPC.

message SLUEvent {
    enum Event {
        START = 0; 
        STOP = 1; 
    }
    Event event = 1;
}
Name Type Description
event Event Either START or STOP, to start or stop the audio stream.

SLUResponse

message SLUResponse {
    string audio_context = 1;  
    int32 segment_id = 2;
    oneof streaming_response {
        SLUTranscript transcript = 3;
        SLUEntity entity = 4;
        SLUIntent intent = 5;
        SLUSegmentEnd segment_end = 6;

        SLUTentativeTranscript tentative_transcript = 7;
        SLUTentativeEntities tentative_entities = 8;
        SLUIntent tentative_intent = 9;

        SLUStarted started = 10;
        SLUFinished finished = 11;
    }
}
Name Type Description
audio_context string Identifier to match server responses to an audio context.
segment_id int32 Identifier to match server responses to a segment context.
streaming_response oneof One of nine possible streaming responses from the server.
transcript SLUTranscript The final transcript of the utterance, once the segment is finished.
entity SLUEntity A final entity from the segment.
intent SLUIntent The final intent of the segment.
segment_end SLUSegmentEnd Sent at the end of the segment.
tentative_transcript SLUTentativeTranscript Tentative transcript of the utterance, sent continuously while the segment is being processed. Subject to change.
tentative_entities SLUTentativeEntities Tentative entities for the utterance, sent continuously while the segment is being processed. Subject to change.
tentative_intent SLUIntent Tentative intent of the utterance. Subject to change.
started SLUStarted Sent when the server is ready to receive the audio stream.
finished SLUFinished Sent when the server has closed the audio stream.

Responses are sent by the server in the SLU.Stream RPC while it is receiving the audio stream. The stream always starts with SLUStarted and ends with either an error or SLUFinished.

The remaining responses are either tentative or final. Tentative responses are subject to change and should be discarded once the server has sent the corresponding final response.
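One way to apply this rule on the client is to keep the latest tentative payload per segment and drop it when the matching final message or segment_end arrives. The sketch below assumes responses have already been converted to plain dicts with snake_case keys holding audio_context, segment_id and exactly one streaming_response field. SegmentState and apply_response are hypothetical names, and payloads are kept opaque because the fields of SLUTranscript are not listed in this reference.

```python
from collections import defaultdict

# Which tentative result a final result supersedes.
FINAL_TO_TENTATIVE = {
    "transcript": "tentative_transcript",
    "entity": "tentative_entities",
    "intent": "tentative_intent",
}


class SegmentState:
    """Results collected for one (audio_context, segment_id) pair."""

    def __init__(self):
        self.tentative = {}    # latest tentative payload per kind
        self.transcripts = []  # final transcript payloads, in arrival order
        self.entities = []     # final SLUEntity payloads
        self.intent = None     # final SLUIntent payload
        self.ended = False


def apply_response(segments, response):
    """Fold one SLUResponse-like dict into `segments`, a
    defaultdict(SegmentState) keyed by (audio_context, segment_id)."""
    kind = next(k for k in response if k not in ("audio_context", "segment_id"))
    if kind in ("started", "finished"):
        return  # context-level messages carry no segment results
    payload = response[kind]
    seg = segments[(response["audio_context"], response["segment_id"])]
    if kind in FINAL_TO_TENTATIVE.values():
        seg.tentative[kind] = payload  # newer tentative results replace older ones
    elif kind in FINAL_TO_TENTATIVE:
        seg.tentative.pop(FINAL_TO_TENTATIVE[kind], None)  # final wins
        if kind == "transcript":
            seg.transcripts.append(payload)
        elif kind == "entity":
            seg.entities.append(payload)
        else:
            seg.intent = payload
    elif kind == "segment_end":
        seg.ended = True
        seg.tentative.clear()  # nothing tentative survives the segment end
```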

SLUTentativeTranscript

message SLUTentativeTranscript {
    string tentative_transcript = 1;
    repeated SLUTranscript tentative_words = 2;
}
Name Type Description
tentative_transcript string Tentative transcript for the utterance.
tentative_words SLUTranscript An array of words in the transcript.

Message sent by the server with the tentative transcript of the audio, before the SLU stream is finished either by the client sending SLUEvent.STOP or by an error.

Tentative results can change

Tentative results are subject to change until the SLU stream is finished.

SLUTentativeEntities

message SLUTentativeEntities {
    repeated SLUEntity tentative_entities = 1;
}
Name Type Description
tentative_entities SLUEntity An array of entities in the transcript.

Message sent by the server with the tentative entities for the utterance.

Tentative results can change

Tentative results are subject to change until the SLU stream is finished.

SLUEntity

message SLUEntity {
    string entity = 1;
    string value = 2;
    int32 start_position = 3;
    int32 end_position = 4;
}
Name Type Description
entity string An entity from the utterance.
value string A value for the entity.
start_position int32 Starting position of the entity in the transcript. Inclusive.
end_position int32 Ending position of the entity in the transcript. Exclusive.

Message sent by the server for the final entity results.

SLUIntent

message SLUIntent {
    string intent = 1;
}
Name Type Description
intent string Intent of the segment.

Message sent by the server for the final intent results.

SLUSegmentEnd

message SLUSegmentEnd {
}

Message sent by the server at the end of the SLU segment.

SLUStarted

message SLUStarted {
}

Message sent by the server when the audio context is initialized. The SLUResponse that carries SLUStarted also contains an audio_context, which is used to match the rest of the response messages to that specific utterance.

As the audio is processed, the server continuously sends tentative messages (SLUTentativeTranscript, SLUTentativeEntities and the tentative SLUIntent).

SLUFinished

message SLUFinished {
    // If the audio context finished with an error, then this field
    // contains a value.
    SLUError error = 2;
}

Message sent by the server when the audio context is finished, either by the client sending SLUEvent.STOP or by an error. If the context finished due to an error, the error field contains an SLUError with the error message.

SLUError

message SLUError {
    string code = 1; 
    string message = 2; 
}
Name Type Description
code string Short code for the error.
message string Human-readable error message.
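Because SLUFinished is also sent on a normal STOP, a client has to check whether error is set before treating the context as failed. A minimal sketch, assuming the finished payload has been converted to a plain dict; SluStreamError and check_finished are hypothetical names.

```python
class SluStreamError(Exception):
    """Raised when an audio context finishes with an SLUError (hypothetical)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def check_finished(finished):
    """Raise SluStreamError if an SLUFinished-like dict carries an error.

    A missing, None or empty "error" value means the audio context
    finished normally (for example after the client sent STOP).
    """
    error = finished.get("error")
    if error:
        raise SluStreamError(error.get("code", ""), error.get("message", ""))
```

With the generated Python classes, the equivalent check would presumably be response.WhichOneof("streaming_response") == "finished" followed by response.finished.HasField("error"), since error is a singular message field.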

Example code

JavaScript

const fs = require("fs");
const protoLoader = require("@grpc/proto-loader");
const grpc = require("grpc");
const wav = require("wav");

const appId = process.env.APP_ID;
if (appId === undefined) {
  throw new Error("APP_ID environment variable needs to be set");
}

const host = "api.speechgrinder.com";
const credentials = grpc.credentials.createSsl();

const SgGrpc = grpc.loadPackageDefinition(
  protoLoader.loadSync("../sg.proto", {
    keepCase: false,
    longs: String,
    enums: String,
    defaults: true,
    oneofs: true
  })
);

const login = (deviceId, appId) => {
  return new Promise((resolve, reject) => {
    const identity = new SgGrpc.speechgrinder.sgapi.v1.Identity(host, credentials);
    identity.login({ appId, deviceId }, (err, response) => {
      if (err) {
        return reject(err);
      }
      return resolve(response.token);
    });
  });
};

const start = slu => {
  const audio = new wav.Reader();

  slu.write({ config: { channels: 1, sampleRateHertz: 16000 } });
  slu.write({ event: { event: "START" } });

  audio.on("data", audioData => {
    if (slu.writable) {
      slu.write({ audio: audioData });
    }
  });
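  slu.on("data", data => {
    console.dir(data, { depth: null });
  });
  // Report stream errors instead of crashing on an unhandled "error" event.
  slu.on("error", err => {
    console.error(err);
  });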
  audio.on("end", () => {
    slu.write({ event: { event: "STOP" } });
    slu.end();
  });
  fs.createReadStream("../audio.wav").pipe(audio);
};

Promise.resolve()
  .then(() => login("node-simple-test", appId))
  .catch(err => {
    console.error(err);
    process.exit(1);
  })
  .then(token => {
    const metadata = new grpc.Metadata();
    metadata.add("Authorization", `Bearer ${token}`);
    const client = new SgGrpc.speechgrinder.sgapi.v1.Slu(host, credentials);
    const slu = client.Stream(metadata);
    return start(slu);
  })
  .catch(err => {
    console.error(err);
  });

  

Python

import os
import wave
import uuid

import grpc

from speechly_pb2 import SLURequest, SLUConfig, SLUEvent, LoginRequest
from speechly_pb2_grpc import IdentityStub as IdentityService
from speechly_pb2_grpc import SLUStub as SLUService

chunk_size = 8000

def audio_iterator():
    yield SLURequest(config=SLUConfig(channels=1, sample_rate_hertz=16000))
    yield SLURequest(event=SLUEvent(event='START'))
    with wave.open('output.wav', mode='r') as audio_file:
        audio_bytes = audio_file.readframes(chunk_size)
        while audio_bytes:
            yield SLURequest(audio=audio_bytes)
            audio_bytes = audio_file.readframes(chunk_size)
    yield SLURequest(event=SLUEvent(event='STOP'))

with grpc.secure_channel('api.speechly.com', grpc.ssl_channel_credentials()) as channel:
    token = IdentityService(channel) \
        .Login(LoginRequest(device_id=str(uuid.uuid4()), app_id=os.environ['APP_ID'])) \
        .token

with grpc.secure_channel('api.speechly.com', grpc.ssl_channel_credentials()) as channel:
    slu = SLUService(channel)
    # gRPC metadata keys must be lowercase, hence 'authorization'.
    responses = slu.Stream(
        audio_iterator(),
        metadata=[('authorization', 'Bearer {}'.format(token))])
    for response in responses:
        print(response)
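The Python example sends chunk_size = 8000 frames per request. wave's readframes() counts frames (one sample per channel), and LINEAR16 uses 2 bytes per sample, so the size and duration of each chunk follow directly from the config. The sketch below makes that arithmetic explicit and, as an extra precaution not in the original example, checks that the WAV header matches the values sent in SLUConfig; chunk_stats and wav_matches_config are hypothetical helper names.

```python
import wave


def chunk_stats(chunk_frames, sample_rate_hertz, channels, sample_width_bytes=2):
    """Return (bytes_per_chunk, seconds_per_chunk) for PCM audio.

    A frame holds one sample per channel, which is what
    wave.Wave_read.readframes() counts; LINEAR16 uses 2 bytes per sample.
    """
    frame_size = channels * sample_width_bytes
    return chunk_frames * frame_size, chunk_frames / sample_rate_hertz


def wav_matches_config(path, channels=1, sample_rate_hertz=16000):
    """True if the WAV file is 16-bit PCM with the given channels and rate."""
    with wave.open(path, mode="rb") as f:
        return (f.getnchannels(), f.getframerate(), f.getsampwidth()) == (
            channels, sample_rate_hertz, 2)


print(chunk_stats(8000, 16000, 1))  # (16000, 0.5)
```

For 16 kHz mono audio, each request therefore carries 16000 bytes, or half a second of audio.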

WLU service

The Speechly written language understanding (WLU) service provides natural language understanding for written text. It uses the same model as the SLU service and returns the same results for the same transcripts.

When sent a string, the service responds with the intents, entities and transcripts for each segment in the string. The maximum message size is 16 KB.
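Longer input has to be split before it is sent. The sketch below splits text at whitespace into pieces that each fit the limit; it assumes "16 KB" means 16,384 bytes of UTF-8 text (the limit may instead apply to the whole serialized message, so leave some headroom). split_for_wlu is a hypothetical helper, it collapses runs of whitespace, and splitting mid-sentence may change how the service segments the text.

```python
WLU_MAX_BYTES = 16 * 1024  # assumption: "16KB" read as 16,384 bytes of UTF-8


def split_for_wlu(text, limit=WLU_MAX_BYTES):
    """Split text at whitespace into pieces of at most `limit` UTF-8 bytes.

    Words are rejoined with single spaces. Raises ValueError if a single
    word is longer than the limit.
    """
    pieces, current, size = [], [], 0
    for word in text.split():
        n = len(word.encode("utf-8"))
        if n > limit:
            raise ValueError("a single word exceeds the size limit")
        extra = n if not current else n + 1  # +1 for the joining space
        if size + extra > limit:
            pieces.append(" ".join(current))
            current, size = [word], n
        else:
            current.append(word)
            size += extra
    if current:
        pieces.append(" ".join(current))
    return pieces
```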

service WLU { 
    rpc Text(WLURequest) returns (WLUResponse) {}
}

Messages

WLURequest

message WLURequest {
    string language_code = 1;
    string text = 2;
}
Name Type Description
language_code string A valid language code for the model used in the application.
text string The text from which the intents, entities and transcripts are extracted.

WLUResponse

message WLUResponse {
    repeated WLUSegment segments = 1;
}
Name Type Description
segments WLUSegment Any number of segments extracted from the string sent in WLURequest.

WLUSegment

message WLUSegment {
    string text = 1;
    repeated WLUToken tokens = 2;
    repeated WLUEntity entities = 3;
    WLUIntent intent = 4;
}
Name Type Description
text string The text of a segment extracted from the WLURequest text.
tokens WLUToken Any number of words extracted from the segment.
entities WLUEntity Any number of entities extracted from the segment.
intent WLUIntent The intent of the segment.

WLUToken

message WLUToken {
    string word = 1;
    int32 index = 2;
}
Name Type Description
word string One word from the segment.
index int32 Position of the word in the segment.

WLUEntity

message WLUEntity {
    string entity = 1;
    string value = 2;
    int32 start_position = 3;
    int32 end_position = 4;
}
Name Type Description
entity string An entity extracted from the segment.
value string The value for the entity.
start_position int32 Starting position of the words containing the entity and its value. Inclusive.
end_position int32 Ending position for the word(s) containing the entity and its value. Exclusive.
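Since start_position is inclusive and end_position exclusive, an entity covers the half-open range [start_position, end_position) of word positions. The sketch below recovers the words of an entity from a WLUSegment converted to a plain dict. It assumes the positions refer to WLUToken.index values; looking words up by index avoids assuming whether indexing starts at 0 or 1. entity_text is a hypothetical helper.

```python
def entity_text(segment, entity):
    """Return the words an entity spans in a WLUSegment-like dict.

    Assumes start_position/end_position are WLUToken.index values, with
    start inclusive and end exclusive.
    """
    words = {token["index"]: token["word"] for token in segment["tokens"]}
    span = range(entity["start_position"], entity["end_position"])
    return " ".join(words[i] for i in span)
```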

WLUIntent

message WLUIntent {
    string intent = 1;
}
Name Type Description
intent string The intent of the segment.


Last updated by ottomatias on March 2, 2020 at 13:47 +0200
