Whether you like having articles read to you while you do something else, are trying to learn a new language, or are a student with special needs, TTS (short for text-to-speech) has proved to be very useful.
In our previous articles we saw some of the best text to speech apps for Windows and Android. And today, we take a look at some of the best TTS options available for Macs.
Text To Speech For Mac
1. macOS TTS
Before we get ahead of ourselves and start downloading third-party apps, it is worth knowing that macOS itself comes with a built-in TTS that you can use anywhere on your computer, from the Notes app to any browser.
To get started, select the text you want read aloud, right-click, go to Speech and then choose Start Speaking; your Mac should start reading the text to you. It supports many languages besides English, with a choice of voices in each. To change the language or voice, go to Accessibility > Speech. Although some voices are very robotic, a few sound more like a human.
But the TTS is far from perfect; it is very basic and barebones and lacks options like pause/play, picking up from a selected word instantly and a lot more.
Quick Tip: It blew my mind and might even blow yours to know that the native TTS on Mac also supports converting your text into audio files. Just select the required text, right click and go to Services > Add to iTunes as a spoken track. The text will be converted to an audio track and added to your iTunes library.
Pros:
– Built-in system wide
– Lots of voice options
– Converting text to iTunes track
Cons:
– No Pause/Play
– Have to select manually all the words to be read
– No instant pickup
Verdict:
Overall, the TTS that comes with macOS is very barebones without all the bells and whistles and should be perfect for somebody looking for a basic TTS experience without even buying or installing any third party software.
2. Invicta TTS
Invicta TTS is a very simple free Text To Speech app available on the Mac App Store.
Once you open up the app, it presents you with a text box where you can enter or paste any text which will be then converted to speech. The app is very lightweight and minimal in nature with everything being to the point.
Although the app is very basic, unlike the built-in TTS of macOS it does add the option of playing or pausing the audio, which becomes crucial when listening to long texts or articles. The voice settings cannot be changed, but the built-in voice does the job well enough.
Pros:
– Minimal and Light
– Play/Pause Option
Cons:
– Cannot read documents automatically
– Supports only English
Verdict:
If you need a simple and light TTS app and might be listening to long articles, Invicta TTS does the job pretty well but do remember that it can only read English.
Link: Get Invicta TTS on the App Store
Price: Free
3. Natural Reader
The next app on our list is Natural Reader which is an extremely powerful TTS software available not only on Mac OS but also on Windows, iOS, Android and even has an online reader.
The app comes in many flavours, each with its fair share of features for the price. The free version comes with basic TTS features along with the ability to read directly from file formats such as Docx, PDF, ePub and Txt. It also has a floating bar which can be used to read text while you are in other applications. The next option, the Personal version, at a steep $100, lets you read web pages directly, convert text to audio files and sync everything with the mobile apps. There are also Professional and Ultimate versions which add OCR support and a bunch of natural voices.
Pros:
– Support for file formats
– Convert to audio files
– Cross Platform
– OCR Support
Cons:
– Pricey
– No instant pickup
Verdict:
All the features of Natural Reader definitely come at a price, and you should decide whether it suits you based on how much you want to invest in TTS; even for a casual user, though, the free version works really well. Overall, Natural Reader is not just one of the best text-to-speech apps with natural voices; since it also supports PDF, it is also a good option for those looking for a PDF voice reader for macOS.
Pricing Options: Pricing for Natural Reader
Link: Download Natural Reader from here
4. Read Aloud
Read Aloud is not exactly a standalone Mac app but a Chrome extension, which might appeal to some people. Considering how many posts and articles are read on the internet every day, we had to include Read Aloud.
It is completely free, and once you install it, its icon appears in the extension bar, where a single click reads any webpage or online article. While it is reading, you get a play/pause button along with forward and rewind buttons that advance or backtrack by paragraph. Considering it is free, the voice options are really good and feel very natural and premium.
Pros:
– Great natural voice
– Forward or rewind by paragraphs
– Listen to webpages
Cons:
– Works only on Chrome
Verdict:
Recommending Read Aloud is very straightforward: if you read a lot on the internet and are looking for free TTS software for that, nothing beats Read Aloud.
Price: Free
Link: Download Read Aloud from the Chrome Store
5. Capti Voice
Capti Voice is probably the most polished and well-rounded TTS software available for the Mac, and its awards back that up. To start with, Capti Voice runs in your browser instead of as a standalone Mac application. Don’t worry, you can still use it while you are offline, as it stores all its data locally, and personally I have had no issues.
Capti Voice has a subscription based model and even the free version has a lot to offer from various file format supports to text search while the premium versions add features like creating playlists, OCR Support and intelligent dictionary lookup. The voices offered across all the platforms are very high quality and commendable.
Quick Tip: Don’t forget to use the Chrome extension which allows you to save articles or webpages to be read later by Capti Voice.
Pros:
– Cross platform with mobile apps
– Create Playlists
– Dictionary lookup
– Shortcuts to get around
Cons:
– No standalone app
– Syncs only when you add to cloud storage
Verdict:
Overall, Capti Voice is a really compelling app with features packed to the brim and is very similar to Natural Reader but with a subscription-based model. It is really the best TTS experience you can get on Mac OS.
Pricing Info: Pricing Options for Capti Voice
Link: Download Capti Voice from here
6. Honorable Mentions
CereProc has some of the most natural-sounding computer voices on the market, which you can use to replace the default voice on your Mac (also available for other platforms). There are a lot of high-quality voice packs to choose from, and each costs around $35.
Zamzar is a free online service which you can use to convert your text to audio files or mp3s. Unlike the iTunes spoken track which you can use only on Apple devices, you can use it on any platform without any hassle.
Wrapping up: Best Text to Speech for Mac
So these were some of the TTS software available on the Mac, and we hope we made your decision a little bit easier. If you read mainly on the internet, Read Aloud is by far the best free option. Although a little limited, the built-in TTS feature works just fine, but it can be a pain for long texts or articles, for which there is Invicta TTS, which is also free.
Natural Reader and Capti Voice are both spectacular TTS apps with a lot of plans to choose from, but what it comes down to is the payment model. Natural Reader is a one-time purchase and should be better if you feel you will be invested in TTS for a long time, whereas Capti Voice follows a subscription-based model with a one-week free trial. Thanks for reading, and do comment below which one is your favorite TTS app on Mac OS.
Read: Make your Devices Read Out Text, With Text to Speech
Text-To-Speech on the Web
First things first: Where can I download this? — See the download-link below.
About
meSpeak.js (modulary enhanced speak.js) is a 100% client-side JavaScript text-to-speech library based on the speak.js project, a port of the eSpeak speech synthesizer from C++ to JavaScript using Emscripten.
meSpeak.js adds support for Webkit and Safari and introduces loadable voice modules. Also there is no more need for an embedding HTML-element. Separating the code of the library from voice definitions should help future optimizations of the core part of speak.js. All separated data has been compressed to base64-encoded strings from the original binary files to save some bandwidth (compared to JS-arrays of raw 8-bit data).
Browser requirements: Firefox, Chrome/Opera, Webkit, and Safari (MSIE11 is expected to be compliant).
meSpeak.js 2011-2020 by Norbert Landsteiner, mass:werk – media environments; https://www.masswerk.at/mespeak/
GNU General Public License
The eSpeak text-to-speech project is licensed under version 3 of the GNU General Public License.
Since meSpeak.js incorporates eSpeak, the same license (GPL v.3) applies.
Important Changes:
v 2.0 Major update: introducing a web worker for rendering the audio concurrently (outside the UI thread), reduced file size, basic audio filtering and stereo panning, and a new, simplified loading scheme for loading voice/language definitions.
v 2.0.1 Added meSpeak.getAudioAnalyser(), because, why not?
v 2.0.2 Disabled workers on mobile devices.
v 2.0.3 Changed implementation of meSpeak.getAudioAnalyser().
v 2.0.4 Added a simple mobile unlocker (initial touchstart event handler).
(v 2.0.5 Added the original eSpeak license statement.)
v 2.0.6 Added a workaround for an issue with some browsers after the 80th call.
v 2.0.7 Added audio unlocking for Safari desktop browsers.
Some real world examples (at masswerk.at):
• Explore client-side speech I/O with E.L.I.Z.A. Talking
• Celebrating meSpeak.js v.1.5: JavaScript Doing The JavaScript Rap (featuring MC meSpeak) (a heavy performance test)
• Celebrating meSpeak.js v.2.0: MeSpeak.js Stereo Panning Demo (reading a dialog by distributed roles)
• Audio Analyser Demo, a simple oscilloscope display for meSpeak.js.
New in MeSpeak 2.0
- MeSpeak now runs a worker in order to render any utterances, if available. (Otherwise, the core application is started in a single-threaded instance to maintain compatibility with older clients.) This means meSpeak.js will generally not block the UI thread and will also process faster. Moreover, the file size has been reduced (< 500K gzipped).
  Please mind that workers are disabled for mobile devices. (Since there is no user interaction as the sound arrives from the worker on a postMessage event, the playback would be muted.)
- As a result, meSpeak.js now consists of two files, the front-end “mespeak.js” and the core application “mespeak-core.js”, which will be loaded automatically by the front-end. (You still only have to include “mespeak.js”, just as before.)
- There are three major changes, two of which may concern compatibility:
- A standard configuration is now included. This means there is no need to call “meSpeak.loadConfig()” (which now does nothing) or to check meSpeak.isConfigLoaded() (which now always returns true). However, there is now “meSpeak.loadCustomConfig()” to override the standard configuration.
- Voice files are now loaded relative to the script (instead of relative to the embedding page)! Also, you may now just specify a voice-ID and the respective JSON-file will be loaded from the directory “voices” in the same path as the application.
” in the same path as the application. - In order to export a) and special characters. Default text-encoding is UTF-8 (see the option 'utf16' for other).options (eSpeak command-options):* amplitude: How loud the voice will be (default: 100)* pitch: The voice pitch (default: 50)* speed: The speed at which to talk (words per minute) (default: 175)* voice: Which voice to use (default: last voice loaded or defaultVoice, see below)* wordgap: Additional gap between words in 10 ms units (default: 0)* variant: One of the variants to be found in the eSpeak-directory '~/espeak-data/voices/!v' Variants add some effects to the normally plain voice, e.g. notably a female tone. Valid values are: 'f1', 'f2', 'f3', 'f4', 'f5' for female voices 'm1', 'm2', 'm3', 'm4', 'm5', 'm6, 'm7' for male voices 'croak', 'klatt', 'klatt2', 'klatt3', 'whisper', 'whisperf' for other effects. (Using eSpeak, these would be appended to the '-v' option by '+' and the value.) Note: Try 'f2' or 'f5' for a female voice.* linebreak: (Number) Line-break length, default value: 0.* capitals: (Number) Indicate words which begin with capital letters. 1: Use a click sound to indicate when a word starts with a capital letter, or double click if word is all capitals. 2: Speak the word 'capital' before a word which begins with a capital letter. Other values: Increases the pitch for words which begin with a capital letter. The greater the value, the greater the increase in pitch. (eg.: 20)* punct: (Boolean or String) Speaks the names of punctuation characters when they are encountered in the text. If a string of characters is supplied, then only those listed punctuation characters are spoken, eg. { 'punct': '.,;?' }.* nostop: (Boolean) Removes the end-of-sentence pause which normally occurs at the end of the text.* utf16: (Boolean) Indicates that the input is UTF-16, default: UTF-8.* ssml: (Boolean) Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or other XML tags. 
(A small set of HTML is supported too.)further options (meSpeak.js specific):* volume: Volume relative to the global volume (number, 0..1, default: 1) Note: the relative volume has no effect on the export using option 'rawdata'.* log: (Boolean) Logs the compiled eSpeak-command to the JS-console.* pan: (Number) Stereo panning, -1 >= pan <= 1 -1 represents the extreme left 1 represents the extreme right 0 center (no effect) This option is available only with clients supporting the Web Audio API.* rawdata: Do not play, return audio data (wav) in callback. (A callback, see below, has to be specified in order to retrieve the data stream.) The type of the returned data is derived from the value (case-insensitive) of 'rawdata': - 'base64': returns a base64-encoded string. - 'mime': returns a base64-encoded option. Defaults to ArrayBuffer (uint8). If the resulting sound is stopped by meSpeak.stop(), the success-flag will be set to false. (A callbak may be also specified as a property of the options object. If both are present, the callback argument takes precedence.)Returns:* a 32bit integer ID greater than 0 (or 0 on failure). The ID may be used to stop this sound by calling meSpeak.stop(<id>).meSpeak.loadVoice('voices/fr.json', userCallback);meSpeak.loadVoice('en/en-us', userCallback);// userCallback is an optional callback-handler. 
The callback will receive two arguments:// * a boolean flag for success// * either the id of the voice, or a reason for errors ('network error', 'data error', 'file error')Note: Starting with meSpeak.js 2.0, voices are loaded relative to meSpeak.js.Also, if you just specify a voice-id, meSpeak.js will now try to load a respective voice from adirectory 'voices' in the same directory as the script.e.g., loadVoice('fr') will load '/path/to/mespeak/voices/fr.json', loadVoice('en/en-us') will load 'path/to/mespeak/voices/en/en-us.json'.A newly loaded voice will always become the new default voice: meSpeak.loadVoice('fr'); alert(meSpeak.getDefaultVoice()); // 'fr'meSpeak.setDefaultVoice('de');Sets the default voice to the voice with the voice with the id specified.(Note: If not explicitly set the default voice is always the the last voice loaded.)if (meSpeak.isVoiceLoaded('de')) meSpeak.setDefaultVoice('de');Check, if a voice has been successfully loaded.meSpeak.loadConfig()meSpeak.isConfigLoaded()Legacy methods. 
A standard configuration is now included in meSpeak.js.meSpeak.loadConfig() does nothingmeSpeak.isConfigLoaded() returns always trueHowever, you can still load a custom configuration usingmeSpeak.loadCustomConfig(url, callback)As with vocies, config-files will be loaded relative to the mespeak.js script.An optional callback will have two arguments, a boolean success flag and a message stringreporting any reasons for failing the operation.A custom congiguration may include just some of the eSpeak config-files.Any files found, will overwrite the standard configurations.meSpeak.setVolume(0.5);meSpeak.setVolume( volume [, id-list] );Sets a volume level (0 <= v <= 1)* if called with a single argument, the method sets the global playback-volume, any sounds currently playing will be updated immediately with respect to their relative volume (if specified).* if called with more than a single argument, the method will set and adjust the relative volume of the sound(s) with corresponding ID(s).Returns: the volume provided.alert(meSpeak.getVolume()); // 0.5meSpeak.getVolume( [id] );Returns a volume level (0 <= v <= 1)* if called without an argument, the method returns the global playback-volume.* if called with an argument, the method will return the relative volume of the sound with the ID corresponding to the first argument. if no sound with a corresponding ID is found, the method will return 'undefined'.var browserCanPlayWavFiles = meSpeak.canPlay(); // test for compatibilitymeSpeak.play( stream [, relativeVolume [, callback[, id[, pan]]]] );Play (cached) audio streams (using any of the export formats, ArrayBuffer, array, base64, dta-URL)Arguments:stream: A stream in any of the formats returned by meSpeak.play() with the 'rawdata'-option.volume: (optional) Volume relative to the global volume (number, 0..1, default: 1)callback: (optional) A callback function to be called after the sound output ended. 
The callback will be called with a single boolean argument indicating success. If the sound is stopped by meSpeak.stop(), the success-flag will be set to false. (See also: meSpeak.speak().)id: (optional, Number) An id to be used (default 0 => ignored.) meSpeak.play(myAudio, 1, null, mySoundId); meSpeak.stop(mySoundId);pan: (optional, Number) Stereo panning. (left) -1 >= pan <= 1 (right) Mind that this works only with clients supporting the Web Audio API.Returns: A 32bit integer ID greater than 0 (or 0 on failure). The ID may be used to stop this sound by calling meSpeak.stop(<id>).// exaple for caching and playing back audio streamsvar audiostreams = [];meSpeak.speak('hello world', { 'rawdata': true }, function(success, id, stream) { // data is ArrayBuffer of 8-bit uint audiostreams.push(stream);});meSpeak.speak('hello again', { 'rawdata': 'array' }, function(success, id, stream) { // data is Array of 8-bit uint Numbers audiostreams.push(stream);});meSpeak.speak('hello again', { 'rawdata': 'base64' }, function(success, id, stream) { // data is a string containing the base64-encoded wav-file audiostreams.push(stream);});meSpeak.speak('hello yet again', { 'rawdata': 'data-url' }, function(success, id, stream) { // data is a audiostreams.push(stream);});meSpeak.play(audiostreams[0]); // using global volumemeSpeak.play(audiostreams[1], 0.75); // 75% of global volumemeSpeak.play(audiostreams[2], 0, null, 0, -1); // play if from the leftmeSpeak.play(audiostreams[3], 0, 0, 0, 0.25); // play it from a querter to the rightmeSpeak.stop( [<id-list>] );Stops the sound(s) specified by the id-list.If called without an argument, all sounds currently playing, processed, or queued are stopped.Any callback(s) associated to the sound(s) will return false as the success-flag.Arguments:id-list: Any number of IDs returned by a call to meSpeak.speak() or meSpeak.play().Returns:The number (integer) of sounds actually stopped.meSpeak.setFilter(<options>[,<options>]);New in meSpeak 2.0: Set 
filters for audio playback (post processing).Supported are any of the BiquadFilters and DynamicsCompressors.You may add any number of filters, which will be chained together before feeding into the gloabel gain node.Options:type: (String) Filter type, case-insenstitive BiquadFilters: 'lowpass', 'highpass', 'bandpass', 'lowshelf', 'highshelf', 'peaking', 'notch', 'allpass' DynamicsCompressor: 'dynamicscompressor' or 'compressor'For BiquadFilters: frequency (Number) Q (Number) gain (Number) detune (Number)For DynamicsCompressors: threshold (Number) knee (Number) ratio (Number) reduction (Number) attack (Number) release (Number)// Example:meSpeak.setFilter( { type: 'highpass', frequency: 85 }, { type: 'compressor', threshold: -10, knee: 40, ratio: 5, attack: 0, release: 0.25 }, { type: 'bandpass', frequency: 500, Q: 0.125, detune: 10 });myAnalyserNode = meSpeak.getAudioAnalyser();returns an Web Audio AnalyserNode for further processing (e.g., a wave display) of the signal played by meSpeak.js.The AnalyserNode mirrors the signal present in the first global audio processingstage (after individual volume/gain), but before filters.Compare the Audio Anaylser Demo.meSpeak.getRunMode();Determine, if the client is running a concurrent worker or a single-threaded instance.Returns either the string 'worker' or 'instance'meSpeak.restartWithInstance();For testing purposes only: Restart MeSpeak forcing it to use an instance instead of a worker.Returns: nothing / void.Note on export formats, ArrayBuffer (typed array, defaul) vs. simple array:
The ArrayBuffer (8-bit unsigned) provides a stream ready to be played by the Web Audio API (as a value for a BufferSourceNode), while the plain array (JavaScript Array object) may be best for export (e.g. sending the data to Flash via Flash's ExternalInterface). The default raw format (ArrayBuffer) is the preferred format for caching streams to be played later by meSpeak by calling meSpeak.play(), since it provides the least overhead in processing.

Recommended File Layout
In order to ensure the functionality of meSpeak.js, the following layout is strongly encouraged:

    mespeak/
      mespeak.js        # required
      mespeak-core.js   # required
      voices/           # default location
        ca.json
        cs.json
        de.json
        ...

Mind that you only require those voice definitions which you are actually using.

meSpeak.speakMultipart(): concatenating multiple voices
Using meSpeak.speakMultipart() you may mix multiple parts into a single utterance. See the Multipart-Example for a demo.
The general form of meSpeak.speakMultipart() is analogous to meSpeak.speak(), but with an array of objects (the parts to be spoken) as the first argument (rather than a single text):

    meSpeak.speakMultipart( <parts-array> [, <options-object> [, <callback-function> ]] );

    meSpeak.speakMultipart(
      [
        { text: 'text-1' [, <other options> ] },
        { text: 'text-2' [, <other options> ] },
        ...
        { text: 'text-n' [, <other options> ] }
      ],
      { option1: value1, option2: value2 .. },
      callback
    );

Only the first argument is mandatory, any further arguments are optional.
The parts-array must contain a single element (of type object) at least.
For any other options refer to meSpeak.speak(). Any options supplied as the second argument will be used as defaults for the individual parts. (Same options provided with the individual parts will override these defaults.)
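As a rough sketch of the call form described above, the following builds a parts-array for a short dialog and passes shared defaults as the second argument. The helper `buildParts` and the `ROLE_OPTIONS` table are hypothetical names introduced only for this illustration; they are not part of meSpeak.js. The `variant`, `pitch` and `speed` options are the ones listed for meSpeak.speak(), and the call itself is guarded so the snippet does nothing harmful where the library is not loaded.

```javascript
// Hypothetical per-role settings; 'variant' and 'pitch' are options
// documented for meSpeak.speak(), the role names are made up.
var ROLE_OPTIONS = {
  narrator: {},
  alice: { variant: 'f2', pitch: 60 },
  bob: { variant: 'm3', pitch: 40 }
};

// Hypothetical helper (not part of meSpeak.js): converts {role, text}
// lines into the parts-array expected as the first argument.
function buildParts(lines, roleOptions) {
  return lines.map(function (line) {
    var part = { text: line.text };
    var extra = roleOptions[line.role] || {};
    Object.keys(extra).forEach(function (key) {
      part[key] = extra[key]; // options set on a part override the defaults
    });
    return part;
  });
}

var parts = buildParts([
  { role: 'narrator', text: 'Two voices, one utterance.' },
  { role: 'alice', text: 'Hello, Bob.' },
  { role: 'bob', text: 'Hello, Alice.' }
], ROLE_OPTIONS);

// Defaults used for every part that does not set the option itself.
var defaults = { speed: 160, pitch: 50 };

// Only call the library where it is present, e.g. in a page that has
// included mespeak.js and already loaded a voice.
if (typeof meSpeak !== 'undefined') {
  meSpeak.speakMultipart(parts, defaults, function (success) {
    console.log('multipart utterance finished, success = ' + success);
  });
}
```

No `voice` is set here, so the default voice (the last one loaded) would be used for all parts.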
The method returns, like meSpeak.speak(), either an ID or, if called with the 'rawdata' option (in the general options / second argument), a stream-buffer representing the generated wav-file.

Note on iOS and Mobile Limitations
iOS (currently supported only using Safari) provides a single audio-slot, playing only one sound at a time.
Thus, any concurrent calls to meSpeak.speak() or meSpeak.play() will stop any other sound playing.
Further, iOS reserves volume control to the user exclusively. Any attempt to change the volume by a script will remain without effect.
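One possible way to live with the single audio slot is to queue utterances and start the next one only after the previous callback has fired. The sketch below is an assumption-laden illustration, not something meSpeak.js provides: `createSpeechQueue` is a hypothetical helper, and it assumes the speak function it wraps takes `(text, options, callback)` and calls the callback with a success flag as its first argument once the sound has ended or been stopped.

```javascript
// Hypothetical helper (not part of meSpeak.js): serializes utterances so
// that a new one starts only after the previous callback has fired.
// speakFn is assumed to take (text, options, callback) and to call the
// callback with a boolean success flag as its first argument.
function createSpeechQueue(speakFn) {
  var pending = [];
  var busy = false;

  function next() {
    if (busy || pending.length === 0) return;
    busy = true;
    var job = pending.shift();
    speakFn(job.text, job.options, function (success) {
      busy = false;
      if (job.done) job.done(success);
      next(); // start the following utterance, if any
    });
  }

  return {
    enqueue: function (text, options, done) {
      pending.push({ text: text, options: options || {}, done: done });
      next();
    },
    size: function () {
      return pending.length; // utterances still waiting to start
    }
  };
}

// Usage, guarded so it only runs where meSpeak.js is loaded.
if (typeof meSpeak !== 'undefined') {
  var speechQueue = createSpeechQueue(function (text, options, callback) {
    return meSpeak.speak(text, options, callback);
  });
  speechQueue.enqueue('First sentence.');
  speechQueue.enqueue('Second sentence, spoken after the first has ended.');
}
```

Calling meSpeak.stop() still ends the current sound; under the assumption above its callback then reports failure and the queue simply moves on.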
Please note that you still need a user-interaction at the very beginning of the chain of events in order to have a sound played by iOS.

Note on Options
The first set of options listed above corresponds directly to options of the espeak command. For details see the eSpeak command documentation.
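To make that correspondence concrete, here is a minimal sketch that turns a meSpeak.js options object into an eSpeak-style argument list, using the flag names from the table in this section. The function `toEspeakArgs` and the `ESPEAK_FLAGS` table are hypothetical and written only for illustration; this is not how meSpeak.js compiles its command internally, and always emitting `-b` is a choice made for this sketch.

```javascript
// Options that map one-to-one onto an eSpeak flag followed by a value.
var ESPEAK_FLAGS = {
  amplitude: '-a',
  wordgap: '-g',
  pitch: '-p',
  speed: '-s',
  linebreak: '-l',
  capitals: '-k'
};

// Hypothetical helper (not part of meSpeak.js): builds an argument list
// from the long option names.
function toEspeakArgs(options) {
  var args = [];
  Object.keys(ESPEAK_FLAGS).forEach(function (name) {
    if (options[name] !== undefined) {
      args.push(ESPEAK_FLAGS[name], String(options[name]));
    }
  });
  if (options.voice !== undefined) {
    // A variant is appended to the voice with '+'. In this sketch a
    // variant without an explicit voice is simply ignored.
    args.push('-v', options.variant ? options.voice + '+' + options.variant : options.voice);
  }
  args.push('-b', options.utf16 ? '4' : '1'); // text encoding: UTF-16 or UTF-8
  if (options.nostop) args.push('-z');
  if (options.ssml) args.push('-m');
  if (options.punct === true) {
    args.push('--punct');
  } else if (typeof options.punct === 'string') {
    // No shell quoting: this is an argument list, not a command string.
    args.push('--punct=' + options.punct);
  }
  return args;
}

console.log(toEspeakArgs({ speed: 150, pitch: 60 }).join(' '));
// prints "-p 60 -s 150 -b 1"
```

The output order follows the order of `ESPEAK_FLAGS`, not the order of the keys in the options object.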
The meSpeak.js-options and their espeak-counterparts are (meSpeak.speak() accepts both sets, but prefers the long form):

    meSpeak.js    eSpeak
    amplitude     -a
    wordgap       -g
    pitch         -p
    speed         -s
    voice         -v
    variant       -v<voice>+<variant>
    utf16         -b 4 (default: -b 1)
    linebreak     -l
    capitals      -k
    nostop        -z
    ssml          -m
    punct         --punct[='<characters>']

Voices Currently Available
- ca (Catalan)
- cs (Czech)
- de (German)
- el (Greek)
- en/en (English)
- en/en-n (English, regional)
- en/en-rp (English, regional)
- en/en-sc (English, Scottish)
- en/en-us (English, US)
- en/en-wm (English, regional)
- eo (Esperanto)
- es (Spanish)
- es-la (Spanish, Latin America)
- fi (Finnish)
- fr (French)
- hu (Hungarian)
- it (Italian)
- kn (Kannada)
- la (Latin)
- lv (Latvian)
- nl (Dutch)
- pl (Polish)
- pt (Portuguese, Brazil)
- pt-pt (Portuguese, European)
- ro (Romanian)
- sk (Slovak)
- sv (Swedish)
- tr (Turkish)
- zh (Mandarin Chinese, Pinyin)*
- zh-yue (Cantonese Chinese, Provisional)**
JSON File Formats
1) Config-data: 'mespeak_config.json':
The config-file includes all data to configure the tone (e.g.: male or female) of the electronic voice.

    {
      "config":      "<base64-encoded octet stream>",
      "phontab":     "<base64-encoded octet stream>",
      "phonindex":   "<base64-encoded octet stream>",
      "phondata":    "<base64-encoded octet stream>",
      "intonations": "<base64-encoded octet stream>"
    }

Finally the JSON object may include an optional voice-object (see below), that will be set up together with the config-data:

    {
      ...
      "voice": { <voice-data> }
    }

2) Voice-data: 'voice.json':
A voice-file includes the ids of the voice and the dictionary used by this voice, and the binary data of theses two files.{ 'voice_id': '<voice-identifier>', 'dict_id': '<dict-identifier>', 'dict': '<base64-encoded octet stream>', 'voice': '<base64-encoded octet stream>'}Alternatively the value of 'voice' may be a text-string, if an additional property 'voice_encoding': 'text' is provided.
This should allow for quick changes and testing:
{
  "voice_id": "<voice-identifier>",
  "dict_id": "<dict-identifier>",
  "dict": "<base64-encoded octet stream>",
  "voice": "<text-string>",
  "voice_encoding": "text"
}
Both config data and voice data may be loaded and switched on the fly to (re-)configure meSpeak.js.
For a guide to customizing languages and voices, see meSpeak – Voices & Languages.
Extended Voice Format, Mbrola Voices
In order to support Mbrola voices and other voices requiring a more flexible layout and/or additional data, there is also an extended voice format:
{
  "voice_id": "<voice-identifier>",
  "voice": "<base64-encoded octet stream>",
  "files": [
    { "path": "<rel-pathname>", "data": "<base64-encoded octet stream>" },
    { "path": "<rel-pathname>", "data": "<text-string>", "encoding": "text" },
    ...
  ]
}
or (using a text-encoded voice definition):
{
  "voice_id": "<voice-identifier>",
  "voice": "<text-string>",
  "voice_encoding": "text",
  "files": [
    { "path": "<rel-pathname>", "data": "<base64-encoded octet stream>" },
    { "path": "<rel-pathname>", "data": "<text-string>", "encoding": "text" },
    ...
  ]
}
Only a valid voice definition is required. The array "files" is optional; it may be empty or contain any number of objects. Each object has a property "path" (the relative file path from the espeak-data directory) and a property "data" containing the file, either as base64-encoded data or as plain text if the optional property "encoding": "text" is also present.
In order to facilitate the use of Mbrola voices, for any "voice_id" beginning with "mb/mb-" only the part following the initial "mb/" will be used as the internal identifier for the meSpeak.speak() method. (So a given voice_id "mb/mb-en1" will be translated to a voice "mb-en1" automatically. This applies to the speak command only.)
Please don't ask for support on Mbrola voices (I don't have the faintest idea). Please refer to the Mbrola section of the eSpeak documentation for a guide to setting up the required files locally. It should be possible to load these into meSpeak.js using the extended voice format, since you may put any additional payload into the files array. Please mind that you will still require a text-to-phoneme translator as stated in the eSpeak documentation (this is out of the scope of meSpeak.js).
Deferred Calls
If speak() is called before any voice data has been loaded, the call will be deferred and executed after setup.
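The general pattern behind such deferred calls can be pictured with a small sketch. This is an illustration only, not meSpeak.js's actual internals: createDeferredSpeaker(), markReady() and reset() are hypothetical names, and the speakNow argument stands in for whatever does the real work once voice data is available.

```javascript
// Illustrative deferral queue: calls made before the voice data is
// ready are stored and replayed in order once it arrives.
function createDeferredSpeaker(speakNow) {
  var ready = false;
  var queue = [];
  return {
    speak: function (text, options) {
      if (ready) {
        speakNow(text, options);
      } else {
        queue.push([text, options]); // defer until set up
      }
    },
    // call once voice data has been loaded; flushes deferred calls
    markReady: function () {
      ready = true;
      var pending = queue;
      queue = [];
      pending.forEach(function (call) {
        speakNow(call[0], call[1]);
      });
    },
    // drop anything still waiting in the queue
    reset: function () {
      queue = [];
    }
  };
}

var spoken = [];
var speaker = createDeferredSpeaker(function (text) {
  spoken.push(text);
});
speaker.speak('first');  // deferred: no voice data yet
speaker.speak('second'); // deferred
speaker.markReady();     // both deferred calls run now, in call order
speaker.speak('third');  // runs immediately
```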
See this page for an example. You may reset the queue manually by calling meSpeak.resetQueue();
Amplitude and Volume
There are now two separate parameters or options to control the volume of the spoken text: amplitude and volume.
While amplitude affects the generation of the sound stream by the TTS algorithm, volume controls the playback volume of the browser. By using volume you can cache a generated stream and still provide an individual volume level at playback time. Please note that there is a global volume (controlled by setVolume()) and an individual volume level relative to the global one. Both default to 1 (max volume).
Notes on Chinese Languages and Voices
Please note that the Chinese voices support only Pinyin input (a phonetic transcript like 'zhong1guo2' for 中 + 国, China) for 'zh', and simple one-to-one translation from single Simplified Chinese characters or Jyutping romanised text for 'zh-yue'.
The eSpeak documentation provides the following notes:
*) zh (Mandarin Chinese):
This speaks Pinyin text and Chinese characters. There is only a simple one-to-one translation of Chinese characters to a single Pinyin pronunciation. There is no attempt yet at recognising different pronunciations of Chinese characters in context, or of recognising sequences of characters as 'words'. The eSpeak installation includes a basic set of Chinese characters. More are available in an additional data file for Mandarin Chinese at: http://espeak.sourceforge.net/data/.
**) zh-yue (Cantonese Chinese, Provisional):
Just a naive simple one-to-one translation from single Simplified Chinese characters to phonetic equivalents in Cantonese. There is limited attempt at disambiguation, grouping characters into words, or adjusting tones according to their surrounding syllables. This voice needs Chinese character to phonetic translation data, which is available as a separate download for Cantonese at: http://espeak.sourceforge.net/data/.
The voice can also read Jyutping romanised text.
For a simple zh-to-Pinyin translation in JavaScript see: https://www.masswerk.at/mespeak/zh-pinyin-translator.zip
Flash-Fallback for Wave Files
(m)eSpeak internally produces wav files, which are then played. Internet Explorer 10 supports typed arrays (which are required for the binary logic), but does not provide native playback of wav files. To provide compatibility for this browser, you could try the experimental meSpeak Flash Fallback.
Source
Download (all code under GPL): mespeak.zip
(v.2.0.7, last update: 2020-04-23)
The last version of the old API, v.1.9.7.1, may be downloaded here: mespeak_1-9-7-1.zip
Version History
- v.2.0.7
- Added audio unlocking for Safari desktop browsers.
- v.2.0.6
- Added a call counter to restart the core internally after the 80th count to work around a memory leak with some browsers. (It's unclear, if this is caused by the JS runtime or by the eSpeak code or the translation by Emscripten. At least, it's a browser specific issue.)
- v.2.0.5
- Added the original eSpeak license statement.
- v.2.0.4
- Added a simple mobile unlocker (plays a short, inaudible sound on the first touchstart event).
- v.2.0.3
- Changed implementation of meSpeak.getAudioAnalyser().
- v.2.0.2
- Oops, we can't play sounds on a postMessage event (no user interaction) on iOS and probably other mobile systems as well. So workers are disabled on mobile devices.
- v.2.0.1
- Added meSpeak.getAudioAnalyser().
- v.2.0
- Major update. Now running a worker for the core application (or a separate instance for compatibility with older clients). Reduced file size, some new methods, minor API changes. (See note on top.)
- v.1.9.7
- Fix for Web Audio API changes in Apple Safari 9.x (Mac OS X and iOS, compare v.1.9.2).
- v.1.9.6
- Minor internal changes.
- v.1.9.5
- Added meSpeak.speakMultipart().
Also, meSpeak.speak() and meSpeak.speakMultipart() won't fail on a missing voice any more: as soon as there is a default voice loaded and set, the default voice will be used instead.
- v.1.9.4.1
- Fixed a bug in the error handling on missing voices.
- v.1.9.4
- Finally found a work-around for the Emscripten FS breaking on the 80th call to run() (internally called by meSpeak.speak()): We now reboot gracefully, preserving any loaded files; no external effects or differences in behavior are caused by this. In order to accomplish this, the eSpeak-core is now run as an instance of a constructor.
- v.1.9.3
- Added support for the Unicode Basic Latin and Latin-1 Supplement character range (U+0000 to U+00FF).
(Emscripten originally supports only the C-locale, 7-bit ASCII.)
- v.1.9.2
- Fix for Chrome 32: Worked around a behavioral change (bug?) in Chrome 32.
It might be worth noting that it is no longer possible to play back sound with the Web Audio API using the same code on WebKit iOS and Chrome when using the decodeAudioData method. (Welcome back to user-agent sniffing. Really, Google?)
Since this might be of general interest, here is a short tutorial:
/* Cross-Browser Web Audio API Playback With Chrome And Callbacks */
// alias the Web Audio API AudioContext-object
var aliasedAudioContext = window.AudioContext || window.webkitAudioContext;
// ugly user-agent-string sniffing
var isChrome = ((typeof navigator !== 'undefined') && navigator.userAgent &&
    navigator.userAgent.indexOf('Chrome') !== -1);
var chromeVersion = (isChrome)?
    parseInt( navigator.userAgent.replace(/^.*?\bChrome\/([0-9]+).*$/, '$1'), 10 ) : 0;
function playSound(streamBuffer, callback) {
  // set up a BufferSource-node
  var audioContext = new aliasedAudioContext();
  var source = audioContext.createBufferSource();
  source.connect(audioContext.destination);
  // since the ended-event isn't generally implemented,
  // we need to use the decodeAudioData()-method in order
  // to extract the duration to be used as a timeout-delay
  audioContext.decodeAudioData(streamBuffer, function(audioData) {
    // detect any implementation of the ended-event
    // Chrome added support for the ended-event lately,
    // but it's unreliable (doesn't fire every time)
    // so let's exclude it.
    if (!isChrome && source.onended !== undefined) {
      // we could also use 'source.addEventListener('ended', callback, false)' here
      source.onended = callback;
    }
    else {
      var duration = audioData.duration;
      // convert to msecs
      // use a default of 1 sec, if we lack a valid duration
      var delay = (duration)? Math.ceil(duration * 1000) : 1000;
      setTimeout(callback, delay);
    }
    // finally assign the buffer
    source.buffer = audioData;
    // start playback for Chrome >= 32
    // please note that this would be without effect on iOS, since we're
    // inside an async callback and iOS requires direct user interaction
    if (chromeVersion >= 32) source.start(0);
  },
  function(error) { /* decoding-error-callback */ });
  // normal start of playback, this would be essentially autoplay
  // but is without any effect in Chrome 32
  // let's exclude Chrome 32 and higher to avoid any double calls anyway
  if (!isChrome || chromeVersion < 32) {
    if (source.start) {
      source.start(0);
    }
    else {
      source.noteOn(0);
    }
  }
}
- v.1.9.1
- Added support for IDs to meSpeak.setVolume() and meSpeak.getVolume() in order to optionally address relative playback volumes of individual sounds.
(If IDs are supplied as optional arguments, the volume will be the relative volume of the sound(s) with corresponding ID(s), else the global playback volume.)
- v.1.9
- Added meSpeak.stop(). For this a new return value is introduced:
meSpeak.speak() and meSpeak.play() now return a 32-bit numeric ID (quite like setTimeout()).
IDs may be provided to meSpeak.stop() as argument(s) in order to stop specific sounds.
If meSpeak.stop() is called without any arguments, all sounds currently processed, playing, or queued will be stopped.
meSpeak.speak() still returns an audio stream in the requested format if called with the 'rawdata' option.
On failure, 0 is returned as an ID (or null with a 'rawdata' request), while a successful call will always return an ID greater than 0.
- v.1.8.7
- Returned to improved handling of durations reported by Web Audio streams, used to handle callbacks. This is as in 1.8.5.
- v.1.8.6
- Fixed a bug (introduced in a previous version) preventing tablet-based WebKit browsers from actually playing. (So you can't start a sound from inside the decodeAudioData()-callback?)
- v.1.8.5
- Disabled the Web Audio source-node's onended event-handler for Chrome to work around a bug in Chrome, where the event is not firing reliably. (We are falling back to a timeout on the stream's duration like before Chrome implemented the onended event.)
- v.1.8.4
- speak() now also accepts the eSpeak flags as option keys (e.g. 'k' for 'capitals' or 'v' for 'voice', cf. the note on options).
Added documentation for the 'punct'-option.
- v.1.8.3
- speak() now removes the internal wav file from the filesystem after use and returns a unique array of the resulting sound data (rather than just a pointer to the array produced by Emscripten's filesystem).
- v.1.8.2
- Added a bit of delay before finally unlinking any Web Audio API resources (working around a Chrome duration issue). meSpeak.play() now reports in the log the object type of any unsuitable input.
- v.1.8.1
- Tweaked the handling of Mbrola voices.
- v.1.8
- Added support for extended voice-formats (like Mbrola voices).
- v.1.7
- Added support for various minor eSpeak-options (now the full set of usable options is supported).
Also, we indicate explicitly that the text to be spoken is UTF-8 encoded (if not specified otherwise) rather than relying on defaults.
- v.1.6
- Added support for voice-variants.
- v.1.5.1
- Fixed deferred call option to include and execute any callbacks.
- v.1.5
- Added an optional callback to meSpeak.speak() and meSpeak.play().
Added some clean-up code to prevent any memory leaks with some implementations of the Web Audio API.
Removed any references to 'window' in favor of 'self'.
- v.1.4.4
- Cleaned up a bit of the Emscripten-generated code, changed wording in this page.
- v.1.4.3
- Better handling for base64-imports when using the HTMLAudioElement for playback with meSpeak.play(). (Less overhead.)
- v.1.4.2
- Added base64 or set to 'true' or '1' is provided.
(This additional parameter should inhibit any repeated attempts to play in case the script would fail and the demo-form would be sent via GET-parameters.)
- v.1.03
- Added an instant link for auto-speak to this demo-page.
- v.1.02
- Added Chinese voice-data (zh, zh-yue) by popular request.
- v.1.01
- Added an onload-callback to the assignment of the generated audio-data-URL. This should add compatibility to newer versions of WebKit and Chrome.
- v.1.0
- Initial upload.
About speak.js
speak.js is 100% clientside JavaScript. 'speak.js' is a port of eSpeak, an open source speech synthesizer, which was compiled from C++ to JavaScript using Emscripten.
The project page and source code for this demo can be found here.
Note: Initially there were plans to merge this project with speak.js, but they somehow became stuck.
Browser requirements:
- Typed arrays. The eSpeak code is not portable to the extent that would be necessary to avoid using typed arrays. (It should however be possible to rewrite small bits of eSpeak to fix that.) Typed arrays are present in Firefox, Chrome, Webkit, and Safari, but not IE or Opera.
- Update: Opposed to the state of the original documentation, newer versions of Opera and IE both provide support for typed arrays.
- A standard configuration is now included. Meaning, there is no need to call “