Standard Variables and Data Types

Standard Variables make recognizing common expressions and Data Types easy by parsing them to a normalized format.

Introduction

Learn more

This part of the documentation expands on what we’ve previously written about models, intents, and entities. The best way to start learning Speechly is by completing the Quick Start. You might want to learn about the SLU basics too.

Speechly SLU applications are built by specifying a set of example utterances written in the Speechly Annotation Language (SAL). The example utterances should reflect, as accurately as possible, what your users might say to your application. Your examples are then fed as training data to a fairly complex machine learning system, which takes care of building everything required for a computer to understand human speech.

Standard Variables are building blocks that make it easier to support certain common but somewhat complex expressions in Speechly SLU applications. While you could construct these expressions yourself with SAL, the predefined Standard Variables let you focus on the unique aspects of your application. Standard Variables look and behave like normal variables, but you don’t have to define them in your configuration, because we’ve already done it for you. They can be identified by their names, which start with SPEECHLY. The example utterance below uses a Standard Variable, so the various ways of expressing a date are recognized without having to define them individually.

*book book a flight for $SPEECHLY.DATE(departure)

Data Types determine what is done to an entity after it has been recognized. While the default Data Type String leaves entities as they are recognized, the other Data Types, such as Date, normalize the entities to make their further use easier. The Data Types are defined in the Speechly Dashboard when listing entities.

While the Standard Variables and Data Types can be used separately, the two features are best when combined. With the entity departure defined as Date in the example above, an utterance like “Book a flight for August ninth two thousand twenty” would be recognized, and the SLU API would return the recognition as:

intent: book
entity:
    name: "departure"
    value: "2020-08-09"

Now you neither have to define what the dates look like nor determine how they map into a structured format.
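On the application side, a normalized Date value can be handed straight to a standard date parser. The following is a minimal sketch in Python, assuming the recognition result has already been decoded into a plain dictionary shaped like the output above. The dictionary layout and the function name departure_date are illustrative assumptions, not part of the Speechly API.

```python
from datetime import date

# Hypothetical, already-decoded recognition result. A real client
# library may expose intents and entities through a different structure.
result = {
    "intent": "book",
    "entities": [{"name": "departure", "value": "2020-08-09"}],
}


def departure_date(result):
    """Return the 'departure' entity as a datetime.date, or None if absent."""
    for entity in result["entities"]:
        if entity["name"] == "departure":
            # Date entities arrive as ISO-8601 strings such as "2020-08-09".
            return date.fromisoformat(entity["value"])
    return None


print(departure_date(result))  # 2020-08-09
```

Because the value is already in ISO-8601, no locale-specific parsing of the spoken date is needed in the application code.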

Supported Standard Variables and the corresponding Data Types

Standard Variable | Recognizes | Examples
$SPEECHLY.DATE | Arbitrary dates | tomorrow, next Friday, January fifth twenty twenty
$SPEECHLY.NUMBER | Arbitrary numbers | five million five hundred twenty-eight thousand eight point twelve, minus zero point zero five, eleven thousand
$SPEECHLY.CARDINAL_NUMBER | Arbitrary cardinals | five million five hundred twenty-eight thousand eight, minus three, eleven hundred eleven
$SPEECHLY.SMALL_NUMBER | Small numbers | seventeen point five, minus five
$SPEECHLY.SMALL_CARDINAL_NUMBER | Small cardinal numbers | seventeen, minus five, ninety-five
$SPEECHLY.FOUR_DIGIT_NUMBER | Four digit numbers | five six four nine, one nine eight four
$SPEECHLY.POSITIVE_NUMBER | Positive numbers | eleven hundred eleven, seventeen, five million five hundred
$SPEECHLY.NEGATIVE_NUMBER | Negative numbers | minus five, negative twenty-four
$SPEECHLY.SMALL_ORDINAL_NUMBER | Ordinal numbers 1-31 | first, second, thirty-first
$SPEECHLY.IDENTIFIER_SHORT | 1-4 character identifier | zero zero seven x, alpha dash one, two seven
$SPEECHLY.IDENTIFIER_MEDIUM | 5-8 character identifier | a b one two dash nine x, delta foxtrot five seven dash two
$SPEECHLY.IDENTIFIER_LONG | 9-12 character identifier | one two seven dot zero dot zero dot one slash x y
$SPEECHLY.IDENTIFIER | 1-12 character identifier | two seven, one two seven dot zero dot zero dot one slash x y

Data Types

  • Date converts expressions that define a date into an ISO-8601 string (e.g., January fifth twenty twenty → 2020-01-05). Relative expressions like tomorrow or next Friday are parsed relative to the current date. If the year is missing from the expression, the current year will be used.

  • Number normalizes all the number Standard Variables into digits (e.g., five six four nine → 5649, seventeen point five → 17.5, three hundred thousand → 300000, three quarters → 0.75).

  • Identifier should be used together with alphanumeric identifiers (sequences) that are spelled out one character at a time. Entities of this type are normalized into character sequences representing the identifier (e.g., zero zero seven x → 007x, one two seven dot zero dot zero dot one slash x y → 127.0.0.1/xy).

Dates

In many applications, you need to use dates and concepts such as tomorrow or today. While, theoretically, you could provide the model with thousands of examples of how the users may refer to certain times, there’s a simpler way.

If you use the Standard Variable $SPEECHLY.DATE, the model automatically understands dates and relative constructs that can be mapped into a certain date or month:

  • today
  • tomorrow
  • day after tomorrow
  • next Friday
  • next January

This allows end users to refer to dates in any sensible way, such as July the fifth twenty-twenty or fifth of July.

When you also define the Data Type on the Speechly Dashboard as Date, the resulting entity is normalized to an ISO-8601 date string (e.g., 2020-07-05).
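To make "parsed relative to the current date" concrete, here is a rough Python sketch of how a few relative expressions could map to ISO-8601 strings. It is not Speechly's parser. The function name resolve_relative is hypothetical, and the reading of "next Friday" as the first Friday strictly after the reference date is an assumption made for the example.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() numbers Monday as 0


def resolve_relative(expression, today):
    """Map a few relative expressions to ISO-8601 strings (illustrative only)."""
    if expression == "today":
        resolved = today
    elif expression == "tomorrow":
        resolved = today + timedelta(days=1)
    elif expression == "day after tomorrow":
        resolved = today + timedelta(days=2)
    elif expression == "next friday":
        # Assumption: the first Friday strictly after the reference date.
        days_ahead = (FRIDAY - today.weekday() - 1) % 7 + 1
        resolved = today + timedelta(days=days_ahead)
    else:
        raise ValueError(f"unsupported expression: {expression}")
    return resolved.isoformat()


reference = date(2020, 9, 14)  # a Monday
print(resolve_relative("tomorrow", reference))     # 2020-09-15
print(resolve_relative("next friday", reference))  # 2020-09-18
```

The same utterance therefore yields a different value depending on the day it is spoken, which is worth keeping in mind when testing your application.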

Of course, it’s not always sensible to have all dates available in all applications. If your application supports a limited range of date expressions, you might want to add a set of examples in your configuration instead, like:

weekdays = [monday|tuesday|wednesday|thursday|friday]
*scheduling next week only $weekdays(available) is okay
*scheduling next week only $weekdays(available) [and|or] $weekdays(available) are okay

Numbers

Often SLU applications need to understand numbers, and thus Speechly supports several Standard Variables for recognizing them. In addition, the Speechly Annotation Language has the number range syntax amount = [1..20] (explained here) for defining custom number ranges easily.

  • $SPEECHLY.NUMBER is the most general of the number variables. It aims to support the widest range of numbers, for example, five million five hundred twenty-eight thousand eight, minus zero point zero five, and eleven hundred point sixteen.

  • $SPEECHLY.CARDINAL_NUMBER is otherwise similar in scope, but it does not support decimals or fractions. $SPEECHLY.POSITIVE_NUMBER and $SPEECHLY.NEGATIVE_NUMBER are subsets of $SPEECHLY.CARDINAL_NUMBER but strictly either positive or negative.

  • $SPEECHLY.SMALL_NUMBER recognizes small numbers, for example, seventeen point five and minus five, and it is optimized for number expressions smaller than two hundred. $SPEECHLY.SMALL_CARDINAL_NUMBER is otherwise similar in scope, but it does not support decimals or fractions.

  • $SPEECHLY.FOUR_DIGIT_NUMBER recognizes numbers that consist of four digits, for example, five six four nine.

  • $SPEECHLY.SMALL_ORDINAL_NUMBER recognizes ordinal numbers from 1 to 31, for example, “fifth” in fifth floor.

For the best results, you should choose the most specific Standard Variable for your use case.

All of these Standard Variables can be used with the Data Type Number, which parses the recognized expression as a string consisting of digits. For example, zero zero three five would be parsed as 0035, nineteen as 19, seventeen point five as 17.5, and three hundred thousand as 300000.
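Because Number values are delivered as strings of digits, your application decides how to interpret them. The sketch below, in Python, converts a value to a Decimal for arithmetic. The helper name number_value is hypothetical. Note that the conversion drops leading zeros, so a value such as 0035 should stay a string if the zeros are meaningful (for example, in a code).

```python
from decimal import Decimal


def number_value(raw):
    """Convert a Number entity value (a digit string) to a Decimal."""
    return Decimal(raw)


print(number_value("17.5"))    # 17.5
print(number_value("300000"))  # 300000
print(number_value("0035"))    # 35 (the leading zeros are lost)
```

Decimal is used here instead of float so that values such as 17.5 or 0.05 are kept exactly as they were recognized.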

Alphanumeric identifiers

The value of some entities, such as license plates or product codes, can be a mixed sequence of letters, digits, and special characters. In addition to the general $SPEECHLY.IDENTIFIER (1-12 characters) listed in the table above, Speechly provides three length-specific Standard Variables for representing such identifiers. These are:

  • $SPEECHLY.IDENTIFIER_SHORT (for sequences of 1-4 characters)
  • $SPEECHLY.IDENTIFIER_MEDIUM (for sequences of 5-8 characters)
  • $SPEECHLY.IDENTIFIER_LONG (for sequences of 9-12 characters)

In addition to letters and digits, these sequences may contain the following symbols: # (hash), / (slash), - (dash), . (point), and , (comma).

These variables should be used with the Data Type Identifier.

*add_product add $SPEECHLY.SMALL_NUMBER(amount) units of $SPEECHLY.IDENTIFIER_MEDIUM(product_id)

With the entity amount defined as the Data Type Number and the entity product_id as the Data Type Identifier, the example above would recognize the utterance “Add five units of a b one two three slash x” and return amount with the value 5 and product_id with the value AB123/X.
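On the receiving end, it can be useful to check the normalized values before acting on them. The following Python sketch assumes the entities of the add_product example have been collected into a dictionary of name to value. That dictionary and the function name parse_add_product are hypothetical, and the pattern simply mirrors the documented character set and the 5-8 character range of $SPEECHLY.IDENTIFIER_MEDIUM.

```python
import re

# Letters, digits, and the symbols # / - . , with a length of 5 to 8,
# mirroring the documented range of $SPEECHLY.IDENTIFIER_MEDIUM.
MEDIUM_IDENTIFIER = re.compile(r"[A-Za-z0-9#/\-.,]{5,8}")


def parse_add_product(entities):
    """Validate the entity values of an add_product utterance.

    `entities` is a hypothetical dict of entity name -> normalized value.
    """
    # int() raises ValueError for decimals such as "17.5", which
    # $SPEECHLY.SMALL_NUMBER can also produce; whole units are expected here.
    amount = int(entities["amount"])
    product_id = entities["product_id"]
    if not MEDIUM_IDENTIFIER.fullmatch(product_id):
        raise ValueError(f"unexpected product id: {product_id!r}")
    return amount, product_id


print(parse_add_product({"amount": "5", "product_id": "AB123/X"}))  # (5, 'AB123/X')
```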

Should your configuration include identifiers that are more than 12 characters in length, you can combine any of the above-mentioned Standard Variables in the following fashion:

my_super_long_identifier = $SPEECHLY.IDENTIFIER_LONG $SPEECHLY.IDENTIFIER_LONG

Now, my_super_long_identifier supports identifiers that are up to 24 characters in length. However, do keep in mind that spelling out very long character sequences might not be that easy for your users.

Also, it is good to be aware that some letters and digits are phonetically very close to one another. For example, the letters A and H can be easily confused with the digit 8. To help avoid such problems, the Standard Variables that represent identifiers also support the International Radiotelephony Spelling Alphabet (also known as the NATO phonetic alphabet): Alfa, Bravo, Charlie, Delta, Echo, Foxtrot, Golf, Hotel, India, Juliett, Kilo, Lima, Mike, November, Oscar, Papa, Quebec, Romeo, Sierra, Tango, Uniform, Victor, Whiskey, X-ray, Yankee, and Zulu. An identifier uttered using the NATO phonetic alphabet, for instance, “charlie delta echo bravo one point five,” would thus be parsed as CDEB1.5.
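The mapping from spelled-out words to an identifier can be pictured with a small lookup table. The Python sketch below is only an illustration of the idea, not how Speechly implements the Identifier Data Type. The function name spell_to_identifier is hypothetical, and upper-casing the letters is an assumption that follows the CDEB1.5 and AB123/X examples.

```python
NATO = {
    "alfa": "A", "bravo": "B", "charlie": "C", "delta": "D", "echo": "E",
    "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I", "juliett": "J",
    "kilo": "K", "lima": "L", "mike": "M", "november": "N", "oscar": "O",
    "papa": "P", "quebec": "Q", "romeo": "R", "sierra": "S", "tango": "T",
    "uniform": "U", "victor": "V", "whiskey": "W", "x-ray": "X",
    "yankee": "Y", "zulu": "Z",
    "alpha": "A",  # common alternative spelling, used in the table example
}
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
SYMBOLS = {"hash": "#", "slash": "/", "dash": "-", "point": ".", "dot": ".", "comma": ","}


def spell_to_identifier(utterance):
    """Turn a spelled-out identifier into its character sequence."""
    chars = []
    for word in utterance.lower().split():
        if word in NATO:
            chars.append(NATO[word])
        elif word in DIGITS:
            chars.append(DIGITS[word])
        elif word in SYMBOLS:
            chars.append(SYMBOLS[word])
        elif len(word) == 1 and word.isalpha():
            chars.append(word.upper())  # a plain letter such as "x"
        else:
            raise ValueError(f"cannot interpret {word!r}")
    return "".join(chars)


print(spell_to_identifier("charlie delta echo bravo one point five"))  # CDEB1.5
print(spell_to_identifier("a b one two three slash x"))                # AB123/X
```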



Last updated by karoliina-louhema on September 17, 2020 at 14:41 +0300
