How Does Talkifier Studio SSML Work?

With SSML, you can customize the generated speech. For example, you can specify pauses and control how acronyms, dates, times, abbreviations, or text to be censored are spoken. By default, SSML is disabled, but you can enable it by clicking the SSML checkbox at the bottom of the editor. To try this out, open Talkifier Studio and go to the editor.



To add SSML to any text, highlight the text and click one of the SSML tag buttons.



The <break> element

In the editor, enter the following text, as shown in the screenshot:

This is a pause <break time="1s"/> and now I'll continue.

As you can see, the <break> element inserts a pause of 1 second. You can also specify the pause in milliseconds, e.g. 500ms. Normally, SSML output is wrapped in a <speak> element; this is not necessary in Talkifier Studio.
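
As a small illustration of the millisecond form, the pause below lasts half a second. The second snippet shows the same text wrapped in a <speak> root element, which standard SSML documents outside Talkifier Studio usually require. The sentence is a made-up sample, and the comments are for explanation only, not meant to be pasted into the editor.

```xml
<!-- Half-second pause, written in milliseconds -->
This is a short pause <break time="500ms"/> and now I'll continue.

<!-- Same text as a complete SSML document, for tools that expect a <speak> root -->
<speak>
  This is a short pause <break time="500ms"/> and now I'll continue.
</speak>
```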

The <say-as> element

Use this element to tell the voice what kind of text construct the element contains, and how much detail to use when reading it out. The <say-as> element has the required interpret-as attribute, which determines how the value is pronounced. Depending on the value of interpret-as, you can also use the optional attributes format and detail.

The following example is spoken as a cardinal number (for example, "Twelve thousand three hundred forty-five" in US English):

<say-as interpret-as="cardinal">12345</say-as>

The following example is spoken as "First":

<say-as interpret-as="ordinal">1</say-as>

The following example is spoken as "C A N" (English):

<say-as interpret-as="characters">can</say-as>

In the following example, a beep is played in place of the text, as if it had been censored:

<say-as interpret-as="expletive">censor this</say-as>

The unit value converts the unit to singular or plural depending on the number. The following example is spoken as "20 feet":

<say-as interpret-as="unit">20 foot</say-as>

The following example is spoken letter by letter (in English):

<say-as interpret-as="verbatim">abcdefg</say-as>

The following example is spoken as "The tenth of September, nineteen sixty":

<say-as interpret-as="date" format="yyyymmdd" detail="1">1960-09-10</say-as>

The following example is spoken as "The tenth of September":

<say-as interpret-as="date" format="dm">10-9</say-as>

The following example is spoken as "Two thirty P.M.":

<say-as interpret-as="time" format="hms12">2:30pm</say-as>

These examples show how numbers, dates, times, and other text can be pronounced in different ways. The following values are available for the interpret-as attribute:

  • cardinal
  • ordinal
  • characters
  • fraction
  • expletive / bleep
  • unit
  • verbatim / spell-out
  • date
  • time
  • telephone
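
The list includes two values that the examples above do not cover: fraction and telephone. The following sketch shows how they are commonly written in SSML. Whether Talkifier Studio accepts exactly this input notation, and how each voice reads the result, is an assumption here and may vary by voice and language.

```xml
<!-- A mixed fraction; typically read along the lines of "five and a half" -->
<say-as interpret-as="fraction">5+1/2</say-as>

<!-- A phone number, read as a sequence of digits rather than as one large number -->
<say-as interpret-as="telephone">(781) 771-7777</say-as>
```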

The <prosody> element

This adjusts the pitch, speaking rate and volume for the text in the element. The attributes rate, pitch and volume are currently supported.
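
A minimal sketch of how the three attributes might be used. The values follow the W3C SSML conventions (named levels such as slow or loud, and relative pitch changes in semitones). Which of these values Talkifier Studio and the selected voice actually honor is an assumption and should be checked by listening in the editor.

```xml
<!-- Slower and louder than the voice's default -->
<prosody rate="slow" volume="loud">Please listen carefully.</prosody>

<!-- Pitch raised by two semitones relative to the voice's default -->
<prosody pitch="+2st">This part sounds slightly higher.</prosody>
```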

The <emphasis> element

This is used to add emphasis to the text of the element or to remove it. The <emphasis> element changes the speech in a similar way to <prosody>, but without your having to specify individual speech attributes.

The level attribute can have the following values:

  • strong
  • moderate
  • none
  • reduced
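
To illustrate the level values, the following sketch applies strong and reduced emphasis to parts of a made-up sentence. How pronounced the effect is depends on the voice, so treat the result as something to verify by listening.

```xml
<!-- "really" is stressed; the trailing remark is toned down -->
I <emphasis level="strong">really</emphasis> like this voice,
<emphasis level="reduced">at least most of the time</emphasis>.
```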

This was a selection of the most common SSML elements.
