R.R. Riesz 1937

Wheatstone's version of Von Kempelen's Machine

The Art of Voice Synthesis

symposium, expert meeting, workshop & concert

Amsterdam

Voice Synthesis

Remko Scha and his artificial voices

Remko Scha, "Virtual Voices", Mediamatic Magazine Vol. 7 no. 1 (1992), pp. 27-42.

Huge Harry and the Institute of Artificial Art (1999), a film by Luuk Bouwman

Institute of Artificial Art Amsterdam

Research by Fabian Brackhane

"Kann was natürlicher, als Vox humana, klingen?" - Ein Beitrag zur Geschichte der mechanischen Sprachsynthese. (doctoral dissertation, Phonus 18, 2015)

In memoriam Wolfgang von Kempelen (Phonus 16, 2011)

As background to the conference and the concert The Art of Voice Synthesis, Orgelpark published an article (in Dutch) in issue 19 of its magazine Timbres:

Timbres_nr19_Stemimitatie.pdf

Nicolas d'Alessandro's HandSketch

The Art of Voice Synthesis: Models and Replications

Creating an artificial voice has been a preoccupation for several centuries. The first mechanical models imitated the parts of the human body most clearly involved in vocal production. Later models were based on theoretical principles of the speaking (and sometimes singing) voice. Electroacoustic synthetic or re-synthesized (quasi-)vocal sounds have been used in contemporary music and art for decades. Nowadays, artificial voices are abundant: in telephone voice response systems, in navigation systems, as aids for people with vocal or visual disabilities, and in commercial singing-synthesis software (Yamaha's Vocaloid). The current success of artificial voices comes with a change of focus: from synthesis by rule, based on a model of the voice, to the use of recorded voices that are analysed, cut into tiny fragments, manipulated and reassembled into new utterances. This shift has been facilitated by the enormous increase in data storage capacity and computing power.

We wish to look at the full range of techniques that are used and that have been used: from mechanical replication of the synthesis process (von Kempelen, 1791), through theory-based modelling (electro-mechanical: Helmholtz, 1863; or digital: Klatt, 1980s; synthesis by rule; physical modelling), to methods that are based on audio recordings (synthesis by analysis, analysis/re-synthesis; and concatenation, such as Vocaloid).
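The recording-based end of this range can be illustrated with a minimal sketch of concatenation: short fragments are joined, with a brief crossfade at each joint to avoid audible clicks. This is a toy under stated assumptions, not how Vocaloid or any particular system works. The function names, the 8000 Hz sample rate and the sine tones that stand in for recorded voice fragments are all hypothetical.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz, chosen only for illustration


def sine_unit(freq_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Stand-in for a recorded voice fragment: a short sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def concatenate(units, overlap):
    """Join fragments, linearly crossfading `overlap` samples at each joint."""
    out = list(units[0])
    for unit in units[1:]:
        if overlap > len(out) or overlap > len(unit):
            raise ValueError("overlap is longer than a fragment")
        start = len(out) - overlap
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # weight of the incoming fragment
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[overlap:])  # append the part that was not crossfaded
    return out


# Three hypothetical fragments of 400 samples each, joined with 40-sample fades.
units = [sine_unit(f, 400) for f in (220, 330, 440)]
signal = concatenate(units, overlap=40)
```

Real systems select fragments from large annotated databases of recorded speech or singing and adjust pitch and duration before joining; the sketch shows only the joining step.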

Our point of departure is contemporary music. Therefore, a system's capacity to produce acceptable-sounding speech is not our ultimate evaluation criterion. Nor is the replication of classical opera technique, though definitely interesting, enough. Our frame of reference includes the extended vocal techniques of the 20th-century avant-garde (as developed by Cathy Berberian, Trevor Wishart and others), as well as the singing styles of various popular and ethnic traditions. A new research question thus emerges: the artificial generation of the complete repertoire of human vocal possibilities.

What are the limitations of the existing voice synthesis models and techniques? And what do these limitations reveal about the complexity and diversity of real, embodied human voices? Is it possible to synthesize "the grain of the voice" (R. Barthes)?

Voice synthesis is, to varying degrees, based on a model of the voice informed by phonetics and voice acoustics. Time-to-frequency transformation (Fourier analysis), sound spectrum analysis, formants, and the source-resonance principle (larynx - vocal tract) are among its basic concepts. Musical instruments functioned as models, objects and inspiration for the science of acoustics; and with respect to the voice, the organ seems to be a privileged metaphor. What attitude(s) towards voice and sound do the existing voice synthesis models and the underlying concepts imply? How is this related to conceptions of sound, music, voice, body, gender and nature?
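As a hedged illustration of the source-resonance principle and of formants, the sketch below passes a pulse train (a crude stand-in for the larynx) through a cascade of two-pole digital resonators (a crude stand-in for the vocal tract), then uses a single-bin Fourier analysis to check that energy concentrates near the first formant. The resonator coefficients follow the commonly used two-pole form found in formant synthesizers. The formant frequencies and bandwidths are rough illustrative values for an "a"-like vowel, and all function names are hypothetical.

```python
import math


def impulse_train(f0_hz, n_samples, sample_rate):
    """Source: one unit pulse per pitch period (a very crude glottal model)."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth_hz * t)
    b = (2 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2 * math.pi * freq_hz * t))
    a = 1 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz, formants, n_samples, sample_rate=16000):
    """Filter the source through one resonator per (frequency, bandwidth)."""
    signal = impulse_train(f0_hz, n_samples, sample_rate)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


def magnitude_at(signal, freq_hz, sample_rate=16000):
    """Magnitude of a single Fourier component of the signal."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(signal))
    return math.hypot(re, im)


# Assumed, illustrative formant values (Hz) for an "a"-like vowel.
A_LIKE = [(700, 80), (1220, 90), (2600, 120)]
vowel = synthesize_vowel(f0_hz=100, formants=A_LIKE, n_samples=1600)
```

With these assumed values, `magnitude_at(vowel, 700)` comes out clearly larger than `magnitude_at(vowel, 1800)`: the harmonic at the first formant is emphasized, while one lying between the second and third formants is attenuated. A model of this kind is exactly what the questions in this section put under scrutiny, since it reduces the voice to a spectrum shaped by a few resonances.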

What alternative models of the voice and its artificial synthesis have been conceived? What alternative models could we think of? If temporal acuity is central in auditory processing (Oppenheim & Magnasco 2013), and the ear does not (only) perform spectral analysis, what are the consequences for the prevalent models of the voice, in which vocal spectra (with formants) are of primary importance?

How are artificial voices used in music and other arts? Does this offer a different perspective on the models and methods of voice synthesis? And does the artistic use of voice synthesis offer different perspectives on the voice in general or on specific voices in particular?

To address these questions, we are organizing a conference where technical and historical experts meet to discuss the potential, limitations, implications and contexts of the different voice synthesis techniques.

In memoriam Remko Scha

The Art of Voice Synthesis is an initiative of Remko Scha, artist and professor emeritus in computational linguistics at the University of Amsterdam.

Remko Scha passed away on 9 November 2015.

We organize this conference in sad and thankful remembrance of his enthusiasm, generosity, keen interest and inspiring ideas.

Obituary Remko Scha (1945-2015)