The following is an extract from the:

Guidelines for ToBI Labelling

(version 3, March 1997)

Mary E. Beckman & Gayle Ayers Elam
Ohio State University

0. Preface

0.1. What are the "Guidelines for ToBI Labelling"?

ToBI (for Tones and Break Indices) is a system for transcribing the intonation patterns and other aspects of the prosody of English utterances. It was devised by a group of speech scientists from various disciplines (electrical engineering, psychology, linguistics, etc.) who wanted a common standard for transcribing an agreed-upon set of prosodic elements, so that prosodically transcribed databases could be shared across research sites pursuing diverse research purposes and varied technological goals. Silverman et al. (1992) and Pitrelli et al. (1994) describe the motivation for and development of the ToBI system. If you ask for this handbook in hard copy, those papers will be appended as Appendix B. Appendix A (which is included both in the hard copy and in the ASCII file version of this labelling guide) is "The ToBI Annotation Conventions", the definitive summary statement of the symbols and marks used in ToBI transcriptions, and of the conventions that we have agreed upon for their use.

The rest of this labelling guide is a more detailed description of the system, with reference to accompanying utterances of two types: example utterances to illustrate points made in the text and exercise utterances to give labellers practice on the points made in the text. These utterances are set off in the text of the labelling guide using the following typographic conventions.

EXAMPLE <<basename>>: orthographic transcription

tonal transcription and/or break index values

EXERCISE <<basename>>: orthographic transcription

Each example utterance is also referred to in the text by its basename within pairs of angle brackets -- e.g., the first example utterance is <<jam1>>. We have chosen the examples and arranged the exercises with the aim of leading new users through the system in a self-taught training course, trying to choose utterances in each of the six practice sets that show only phenomena that have been introduced up to that point.
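For readers who process the ASCII version of this guide by script, the header lines shown in the typographic conventions above can be recognized with a simple pattern. The following Python sketch is only an illustration: the name parse_utterance_header and the layout of the returned tuple are assumptions made for this sketch and are not part of the ToBI distribution.

```python
import re

# Matches header lines such as "EXAMPLE <<jam1>>: orthographic transcription".
# The pattern is an assumption based on the conventions shown in the text.
HEADER = re.compile(r"^(EXAMPLE|EXERCISE)\s+<<([^<>]+)>>:\s*(.*)$")


def parse_utterance_header(line):
    """Return (kind, basename, transcription), or None if the line is not a header."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    kind, basename, transcription = match.groups()
    return kind, basename, transcription.strip()
```

Lines that carry only a tonal transcription or break index values do not match the pattern, so the function returns None for them and the caller can treat them as belonging to the preceding header.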

The utterances that accompany this labelling guide can be obtained in two formats: as digitized computer files with electronic record of the f0 contour from the Ohio State University web and ftp distribution site (see section 0.2) or as an audio tape with paper record of the f0 contour (see section 0.3).

0.1.1. Notice of copyright and restrictions on use

The "Guidelines for ToBI Labelling" document and associated material are copyrighted. The text cannot be copied or distributed in any format unless this paragraph is included. The utterances accompanying the guide are available to any interested user, but only for non-commercial use. The National Science Foundation and the Ohio State University make no warranty and accept no liability associated with the use of these materials. These materials may be obtained only as described in Sections 0.2 and 0.3, and are not to be redistributed by other user sites. Users may not redistribute these materials from their own sites, but should instead tell interested people how to obtain their own copy from the distribution site.

0.1.2. Acknowledgements

The "Guidelines for ToBI Labelling" and the accompanying utterances were developed in the Ohio State University Linguistics Laboratory with partial support from the National Science Foundation, and the Ohio State University continues to support the labelling guide by providing a distribution site for the electronic records (described in Section 0.2). Colin Wightman generously provided the distribution site for the electronic records for version 2.0 of the labelling guide in his lab at the New Mexico Institute of Mining and Technology. Jennifer Venditti provided LaTeXing and various other editing expertise for this earlier version, which we have relied on in producing this new one. Kim Silverman and John Pitrelli developed the original transcriber script, on which we based the primary shell scripts for viewing the examples and doing the exercises. David Talkin helped in innumerable ways, such as by developing the scripts for the cardinal examples. Harald Singer developed an alternative electronic format for version 2.0, and Stefanie Jannedy set up the web page for it and for the ftp site.

0.2. Getting and using the digitized utterances and f0 tracks

If you have waves(tm) (an Entropic Research Laboratory product) or a similar computer display system, obtain the speech files, electronic record of the f0 contour, and label files by ftp from the Ohio State University distribution site. Section 0.2.1 describes that version.

There is also an Emu version that Steve Cassidy helped us to create. You can get that version from the Emu home page at Macquarie University; see the "Emu and ToBI" page.

The labels have also been converted to Praat TextGrid files, which can be downloaded from the English ToBI home page.

0.2.1. Getting the digitized utterances and f0 tracks

There are two options for obtaining the ToBI materials, depending upon how much disk space users have available. For those with sufficient disk space, there is a single large tarfile for convenience. This option requires having about 40 MB available during the installation process; the full materials occupy about 20 MB once the installation is completed and the tarfile is removed. If you do not have enough space to have both the complete tarfile and all the installed files at the same time, use the second option. This option consists of three smaller tarfiles, which together contain all the materials in the single large tarfile: the speech files, f0 records, and label files, divided into three parts by order of occurrence in the Guidelines. In addition to the single large or three smaller tarfiles, you will need to get the "essentials" tarfile, which is about 2.5 MB and contains an ASCII version of "The Guidelines for ToBI Labelling", and the scripts and tools for displaying the f0 tracks and labels.
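The choice between the two options can be made by checking free disk space before downloading. The following Python sketch is one hedged way to do so: the 40 MB threshold is the approximate figure given above, the function names and the labels "single" and "three-part" are hypothetical, and the sketch ignores the additional space (about 2.5 MB) taken by the "essentials" tarfile.

```python
import shutil

MB = 1024 * 1024
# Approximate figure from the text: about 40 MB is needed while the single
# large tarfile and the installed files are on disk at the same time.
SINGLE_TARFILE_PEAK_MB = 40


def choose_download_option(free_bytes):
    """Pick 'single' when the peak space for the large tarfile fits, else 'three-part'."""
    if free_bytes >= SINGLE_TARFILE_PEAK_MB * MB:
        return "single"
    return "three-part"


def choose_for_directory(install_dir="."):
    """Apply the same rule to the free space reported for install_dir."""
    return choose_download_option(shutil.disk_usage(install_dir).free)
```

Calling choose_for_directory() with the intended installation directory gives a rough recommendation; because the sizes quoted in the text are approximate, leave some margin before relying on it.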

The tarfiles are linked from the WWW version of this page. Download the README file first for descriptions of the tarfiles and of the directory structure that they will set up on your home system.

...

0.2.2. A less interactive electronic version

Version 2.0 of the labelling guide has been converted to a series of html files that can be fetched to your computer for perusal and playback. The f0 contours are embedded as gif images and the audio files are embedded in au format. The conversion to this format was done by Harald Singer, and it is available on the Ohio State University Linguistics Laboratory web site.

If you have a PC running Windows, you may find it better to download the audio files in wav format for playback. Get these from our ftp site. The compressed tar file is wav_files_tobi_v3.tar.gz. This file is huge. If you want only a few files, retrieve them individually by name instead.

...

0.4. Future editions and a disclaimer

If you have comments on this Labelling Guide -- particularly, if you have suggestions for improvements or better example utterances you would like to give to us -- we would be very grateful if you would direct the comments to us at:

e-mail: tobi@ling.ohio-state.edu

other mail:

ToBI Labelling Guide, c/o Mary Beckman
Ohio State University, Linguistics Dept.
222 Oxley Hall, 1712 Neil Ave.
Columbus, OH 43210-1298 USA

This e-mail address is also the place to send us your e-mail address if you want to be added to our list of "subscribers" to be notified of any future editions of the Labelling Guide.

The ToBI labelling system was originally developed to cover the three most widely used varieties of spoken English -- namely, general American, standard Australian, and southern British English. We do not claim to cover other varieties. Indeed, we have already determined that ToBI proper does not adequately cover many other British varieties such as the Glasgow dialect, and modified variants need to be developed by users who want to use it in transcribing utterances in these other dialects. By the same token, we must stress that ToBI was not intended to cover any language other than English, although we endorse the adoption of the basic principles in developing transcription systems for other languages, particularly languages that are typologically similar to English. More general comments about using the ToBI system for other dialects of English or about adapting ToBI labelling principles to develop comparable systems for the transcription of other languages may also be addressed to the tobi e-mail address listed above for forwarding to appropriate interested members of the larger ToBI group.