I use Lua to generate markup for the "say" command in macOS, which produces an AIFF file that I then compress with ffmpeg. A shell script could mix in additional audio tracks, or generate several tracks and mix them together. I like this approach because I can iterate on the script, sometimes changing just one word or a few letters, without doing any repetitive work to regenerate the audio file.
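A minimal sketch of what such a generator might look like, assuming a simple `${name}` placeholder syntax. The helper names (`pause`, `rate`, `fill`, `render`) are hypothetical and only illustrate the idea; they are not the actual system described here. The `[[slnc ...]]` and `[[rate ...]]` strings are embedded speech commands of the macOS synthesizer, though not every voice honors all of them.

```lua
-- Hypothetical mini template system; all names here are illustrative.

-- Embedded speech commands for the macOS synthesizer.
local function pause(ms) return string.format("[[slnc %d]]", ms) end  -- silence in ms
local function rate(wpm) return string.format("[[rate %d]]", wpm) end -- words per minute

-- Replace ${name} placeholders with values from a table.
local function fill(template, vars)
  return (template:gsub("%${(%w+)}", function(key)
    return tostring(vars[key] or "")
  end))
end

-- Render a list of template lines into one block of "say" input.
local function render(lines, vars)
  local out = {}
  for _, line in ipairs(lines) do
    out[#out + 1] = fill(line, vars)
  end
  return table.concat(out, "\n")
end

local script = render({
  rate(150) .. " Welcome, ${name}.",
  pause(800) .. " Take a slow breath.",
}, { name = "listener" })

print(script)
```

Saving that output to a text file (say, script.txt) and running `say -f script.txt -o out.aiff` followed by `ffmpeg -i out.aiff out.mp3` would then produce the compressed audio; the file names are placeholders.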
The markup / template system I wrote in Lua can do some neat tricks, but the TTS controls are currently very simple. I recently found a web site that lets you do TTS in your web browser, and its voice markup commands are quite similar:
http://webhypnotist.pw
One advantage of generating the whole script from a markup & template system in a real programming language is that I can change the underlying code and generate TTS markup for some other engine. For instance, Google WaveNet voices sound almost perfect, and I could convert my file to use that engine if I wanted to. That said, although those voices sound very real, they had an edge that wasn't as soothing as the slightly more unnatural voices in MacinTalk.
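To illustrate the engine swap, here is a rough sketch in which the script is stored as engine-neutral data and a small backend table turns it into markup. The `backends` table, its field names, and the item format are assumptions made up for this example. The SSML `<break>` element is standard SSML, which Google's cloud voices accept, but XML escaping of the text is left out for brevity.

```lua
-- Hypothetical backends: each maps the same abstract commands to engine markup.
local backends = {
  macintalk = {
    pause = function(ms) return string.format("[[slnc %d]]", ms) end,
    wrap  = function(body) return body end,
  },
  ssml = {
    pause = function(ms) return string.format('<break time="%dms"/>', ms) end,
    -- Note: real SSML output would also need XML escaping of the text.
    wrap  = function(body) return "<speak>" .. body .. "</speak>" end,
  },
}

-- A script section is a list of items: plain strings or { pause = ms } tables.
local function render(items, backend)
  local out = {}
  for _, item in ipairs(items) do
    if type(item) == "table" then
      out[#out + 1] = backend.pause(item.pause)
    else
      out[#out + 1] = item
    end
  end
  return backend.wrap(table.concat(out, " "))
end

-- One shared section, rendered for two different engines.
local deepener = { "Sink a little deeper.", { pause = 1000 }, "And deeper still." }

print(render(deepener, backends.macintalk))
print(render(deepener, backends.ssml))
```

Because the section itself contains no engine-specific markup, switching engines means picking a different backend rather than rewriting the script.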
Other advantages of the system are that the scripts are modular and configurable. I already have four versions of the script I uploaded here recently (two of them private custom versions just for myself). Because the scripts share their building blocks, if I work on a new script and improve an induction or deepener that I used in another script, that other script also improves with no extra effort.
Please comment and review the file I uploaded - I'm interested in feedback and further development. Sometimes I worry that I'm relatively alone in the world with my transformation kink, but when I found this site a few days ago, I immediately felt that this could be the right place for the file I made. (I hope you agree.)