Bioacoustics
The International Journal of Animal Sound and its Recording, 2005, Vol. 15, pp. 143–161
0952-4622/05 $10 © 2005 AB Academic Publishers
CLASSIFICATION OF AFRICAN ELEPHANT
LOXODONTA AFRICANA RUMBLES USING
ACOUSTIC PARAMETERS AND CLUSTER ANALYSIS
JASON D. WOOD1*, BRENDA MCCOWAN2, WILLIAM R. LANGBAUER JR.3,
JOZUA J. VILJOEN4 AND LYNETTE A. HART2
1Department of Geophysics, Stanford University, USA.
2Department of Population Health & Reproduction, School of Veterinary Medicine, University of California, Davis, USA.
3The Pittsburgh Zoo, Pittsburgh, Pennsylvania, USA.
4Department of Nature Conservation, Tshwane University of Technology, South Africa.
ABSTRACT
It has been suggested that African savanna elephants Loxodonta africana produce 31 different call types (Langbauer 2000). Various researchers have described these calls by associating them with specific behavioural contexts. More recently, Leong et al. (2003) attempted to classify elephant call types based on their physical properties. They classified 8 acoustically distinct call types from a population of captive elephants. This study focuses on one of these call types, the rumble, in a wild population of elephants in Kruger National Park, South Africa. A single family group of elephants was followed to record group behaviours and vocalizations from January through August 2001. By measuring the physical properties of 663 rumbles and subjecting these to cluster analysis, we present evidence that rumbles can be categorized by their physical properties and that the resulting rumble types are associated with specific group behaviours. We characterize three types of rumbles that differ significantly in ten acoustic parameters. Two rumble types were associated with the elephant group feeding and resting, while the third was associated with socializing and agitation.
Keywords: African elephant, Loxodonta africana, acoustic communication, call categorization, cluster analysis.
INTRODUCTION
An essential component of understanding the acoustic communication of any species is the ability to distinguish between different call types.
For a signal to be interpreted correctly by a conspecific, the receiver must be able to make this distinction as well. This distinction is sometimes made easier by sending a complementary signal using another sensory modality. For instance, an acoustic signal might be accompanied by a visual signal that would decrease ambiguity about the meaning of the acoustic signal. The behavioural context in which the signal is given can also give clues as to the meaning of the signal.
However, in situations where signals are received in only one modality (such as when one modality operates at greater distances than other modalities), the coding of the signal must be structured such that it can be interpreted (i.e., correctly categorized) by the receiver of that signal.
In these single modality signals there must be enough physical structure in the signal to decrease ambiguity in the interpretation of the signal by the receiver. In addition, that coding structure must be maintained over the distance between the sender and the intended receiver.
African elephants produce low-frequency vocalizations (rumbles) to which they respond at distances of around 2 kilometres (Langbauer et al. 1991, McComb et al. 2003). It has been suggested that these rumbles are used for communication between family herds over large distances, as well as for communication within family herds (Payne et al. 1986, Poole et al. 1988). Even during communication within family herds, it is likely that complementary signals produced in modalities other than the acoustic modality would not reach all group members, because family groups spread out over considerable distances (sometimes up to 400 metres, pers. obs.). Other family group members would, however, be able to place these acoustic signals within broad behavioural contexts (e.g. group feeding). Other than the acoustic modality, the communication modalities that elephants use (for review see Langbauer 2000) are not reliable over the distances over which elephants are reported to communicate. Therefore, there should be sufficient coding in their vocalizations for receivers to interpret the meaning of the signal, if indeed there are distinct categories of calls with specific meanings in elephant communication.
Early attempts at categorizing elephant call types associated calls with specific behaviours and gave a brief description of the physical properties of the calls (Berg 1983, Poole et al. 1988). In this way, 31 call types were described (Langbauer 2000).
Most recently Leong et al. (2003), in an attempt to standardize the classification of African elephant vocalizations, used the measures of bandwidth, sound quality (i.e. whether the sound is a tonal harmonic, pulsatile, or noisy), fundamental frequency, presence of infrasonic components, and duration to classify calls. Based on these physical properties they defined 8 mutually exclusive call types, 3 of which were rumble variants (Noisy Rumble, Loud Rumble, and Rumble). These 3 rumble types were differentiated by bandwidth (i.e., the number of higher harmonics present in the call). A cross-correlation analysis was then conducted on the fundamental frequency contour of the Rumble call type, as this was the most common call. This indicated the presence of 5 rumble types, but when subjected to multi-dimensional scaling, there was little clustering between the call types. This suggests either that rumble types grade into each other, or that the fundamental frequency contour is not the physical property of a rumble that is used for coding the meaning of these signals. The aim of this study was to categorize elephant rumbles using not just frequency contours but also other physical parameters of this call type.
METHODS
Forty-two hours of recordings were made between January and August 2001 in the southern part of Kruger National Park (KNP), South Africa, using a TASCAM DA-P1 DAT recorder (sampling rate 48 kHz) and a Neumann KM 131 microphone. Recording sessions were conducted on foot or occasionally in a vehicle, and mostly followed a focal family unit, although there were many times when other family groups or adult males were in the vicinity as well. Audio notes were made of group behaviour whenever it changed (see Results for list of behaviours). In the lab, recordings were transferred from the DAT tape into Windows PCM wave files by using the digital out line on the DAT recorder and the digital in line on a VX222 Digigram sound card. Cool Edit Pro V1.2 was used to create these wave files, to downsample the files to 16 kHz (for faster generation of spectrograms, since we were only interested in low-frequency sound) and for subsequent cueing and filtering. Each rumble was located by listening to the recording and by observing its spectrogram. A start and end cue were marked in the wave file for each rumble, which allowed the rumbles to be extracted as separate files. Each rumble was then filtered using a Butterworth band pass filter so that only the second harmonic remained. In total, 975 rumbles were identified, 663 of which could be adequately filtered and used in the analysis. The other 312 rumbles were not included in the data set because accurate measurements could not be made due to overlap with other sounds or rumbles.
The second harmonic was extracted because it was consistently the clearest part of the signal in the recordings. In the 10 cases where the second harmonic could not be filtered, either the fundamental or another harmonic was filtered, and any subsequent measures were converted to the equivalent of the second harmonic. The contours of the second harmonic and acoustic parameters were extracted using macros developed by McCowan (1995) for Cool Edit Pro. The analysis then followed the steps developed by McCowan (1995) and McCowan & Reiss (2001). Frequency was measured at 60 evenly spaced points to characterize the frequency contour, and 19 other parameters were also measured (see Table 1). In a number of rumbles, the first 2 frequency measurements returned erroneously high frequencies, and so it was decided not to include the first 2 frequency measures in the analysis.

TABLE 1

Parameter (abbreviation): Description

Coefficient of Frequency Modulation (COFM) (McCowan and Reiss 1995): Calculated variable that represents the amount and magnitude of frequency modulation across a rumble, computed by summing the absolute values of the difference between sequential frequencies divided by 10000.
Frequency Variability Index (CV) (Mitani and Brandt 1994): Calculated variable that represents the magnitude of frequency modulation across a rumble, computed by dividing the variance in frequency by the square of the average frequency of a rumble and then multiplying the value by 10.
Inflection Factor (IF): Percentage of points showing a reversal in slope.
Minimum Frequency (MIN): Lowest frequency attained by rumble, measured in Hz.
Peak Frequency (MAX): Highest frequency attained by rumble, measured in Hz.
Mean Frequency (MEAN): Calculated as average frequency across rumble.
Peak Amplitude Frequency (PAF): Frequency at maximum amplitude.
Frequency Range (FR): Calculated as peak frequency minus minimum frequency.
Peak Frequency/Mean Frequency (MAX/MEAN): Calculated as peak frequency divided by mean frequency.
Mean Frequency/Minimum Frequency (MEAN/MIN): Calculated as mean frequency divided by minimum frequency.
Peak Amplitude Location (PAL): Location of maximum amplitude, given as percentage of duration.
Minimum Frequency Location (MINL): Location of minimum frequency, given as percentage of duration.
Peak Frequency Location (MAXL): Location of peak frequency, given as percentage of duration.
Start Slope (SSL): Calculated as (Frequency 20 - Frequency 1)/(Time 20 - Time 1).
Middle Slope (MSL): Calculated as (Frequency 40 - Frequency 20)/(Time 40 - Time 20).
Final Slope (FSL): Calculated as (Frequency 60 - Frequency 40)/(Time 60 - Time 40).
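
As an illustration of how the contour-based parameters in Table 1 might be computed, the following Python sketch derives several of them from a 60-point frequency contour. It is not the Cool Edit Pro macro code used in the study. The function name, the use of population variance, the choice of the number of points as the denominator of the Inflection Factor, and the example contour are all assumptions made for illustration. Parameters that require amplitude data (PAF, PAL) are omitted.

```python
def contour_parameters(freqs, times):
    """Compute several Table 1 parameters from a 60-point frequency contour.

    freqs: frequencies (Hz) at evenly spaced points along the rumble.
    times: the matching time stamps (s).
    """
    if len(freqs) != 60 or len(times) != 60:
        raise ValueError("this sketch expects exactly 60 contour points")
    n = len(freqs)
    duration = times[-1] - times[0]
    mean_f = sum(freqs) / n
    # Population variance (assumption: the source does not say n or n - 1).
    variance = sum((f - mean_f) ** 2 for f in freqs) / n
    # Differences between sequential frequency measurements.
    steps = [b - a for a, b in zip(freqs, freqs[1:])]
    # A reversal is a sign change between consecutive non-zero steps.
    reversals = sum(1 for a, b in zip(steps, steps[1:]) if a * b < 0)
    i_min = freqs.index(min(freqs))
    i_max = freqs.index(max(freqs))

    def slope(first, last):
        # first and last are 1-based point numbers, as written in Table 1.
        df = freqs[last - 1] - freqs[first - 1]
        dt = times[last - 1] - times[first - 1]
        return df / dt

    return {
        "COFM": sum(abs(s) for s in steps) / 10000,
        "CV": variance / mean_f ** 2 * 10,
        "IF": 100 * reversals / n,  # denominator n is an assumption
        "MIN": min(freqs),
        "MAX": max(freqs),
        "MEAN": mean_f,
        "FR": max(freqs) - min(freqs),
        "MAX/MEAN": max(freqs) / mean_f,
        "MEAN/MIN": mean_f / min(freqs),
        "MINL": 100 * (times[i_min] - times[0]) / duration,
        "MAXL": 100 * (times[i_max] - times[0]) / duration,
        "SSL": slope(1, 20),
        "MSL": slope(20, 40),
        "FSL": slope(40, 60),
    }


# Hypothetical contour: rises 1 Hz per point to a peak at point 30, then falls.
times = [0.1 * i for i in range(60)]
freqs = [59.0 - abs(i - 29) for i in range(60)]
params = contour_parameters(freqs, times)
print(params["COFM"], params["IF"], params["SSL"], params["FSL"])
```

The slopes use the point numbers given in Table 1, so dropping the first 2 points (as was done in the analysis) would require a decision about re-indexing that the text does not specify.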
The frequency contour of each rumble was then correlated to every other rumble to obtain a measure of similarity in shape. It is important to point out that this measure is a measure of similarity in shape, not in actual overlap of frequency range. That is to say, if two rumbles have similar frequency modulation (shape), but one starts at 20 Hz and the other at 30 Hz, they will still be highly correlated when using this technique. In essence it is a measure of relative change in frequency, not absolute frequency (McCowan 1995).
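
The shape-only nature of this similarity measure can be seen in a small Python sketch. It assumes the correlation in question is the ordinary Pearson coefficient between two contours sampled at the same number of points, which the text does not state explicitly. The function names and the three example contours are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long contours."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # A perfectly flat contour has zero variance and would divide by zero here.
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(contours):
    """Correlate every contour with every other contour."""
    return [[pearson(a, b) for b in contours] for a in contours]


# Hypothetical contours: the first two share a shape but differ by 10 Hz.
rising_from_20 = [20 + 0.1 * i for i in range(60)]
rising_from_30 = [f + 10 for f in rising_from_20]
falling_from_30 = [30 - 0.1 * i for i in range(60)]

sim = similarity_matrix([rising_from_20, rising_from_30, falling_from_30])
for row in sim:
    print([round(r, 3) for r in row])
```

Because the coefficient subtracts each contour's mean and divides by its spread, a constant offset in Hz leaves the similarity unchanged, while an opposite direction of modulation yields a strongly negative value.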
In order to cluster the rumbles we then subjected the correlation coefficients (from the frequency contours) to a principal components analysis (PCA) to reduce the number of factors. The scores of the resulting factors with an eigenvalue greater than 1 were then subjected to cluster analysis. As a separate analysis we also subjected the 19 acoustic parameters to a PCA and then a cluster analysis. In this way we could have a better idea of whether the shape of the call or some other parameter (e.g. duration, max frequency) led to better clustering. The clustering technique used was the MCLUST extension to S-PLUS statistical software (Fraley & Raftery 2002). The advantage of using MCLUST is that one can cluster the data using ten different models and determine which model and number of clusters is most appropriate. As an additional validation of the clustering we randomly selected 75% of the data set and subjected it to MCLUST again. This was repeated 3 times, each time with a new random selection. In addition, a linear mixed effects analysis was conducted on each of the 19 acoustic parameters to test which parameters led to the best clustering. A multinomial logistic regression was used to test for associations between the resulting rumble types (clusters) and group behaviour. Finally, a discriminant analysis was conducted to test how well behaviour could be predicted by acoustic parameters. SAS (version 8.0, SAS Institute Inc.), S-PLUS (version 6, Insightful Corp.), and Stata (version 7.0, Stata Corp.) were used for the various statistical tests conducted.
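
The rule of retaining factors with an eigenvalue greater than 1 can be illustrated with a minimal Python sketch. It assumes the PCA is carried out on a correlation matrix, so that the eigenvalues sum to the number of variables. The text does not say which PCA routine or settings were used. The Jacobi rotation method below is chosen only to keep the example free of dependencies, and the function names and the 3-variable matrix are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-24, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def retained_eigenvalues(corr):
    """Keep only eigenvalues greater than 1."""
    return [ev for ev in jacobi_eigenvalues(corr) if ev > 1.0]


# Hypothetical correlation matrix for three variables.
corr = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]
eigenvalues = jacobi_eigenvalues(corr)
kept = retained_eigenvalues(corr)
print([round(ev, 4) for ev in eigenvalues], len(kept))
```

In this toy case the two strongly correlated variables load on a single factor whose eigenvalue exceeds 1, so only that factor would be passed on to the cluster analysis.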
RESULTS
Using the default settings for the EMclust command in MCLUST we tested from 1 to 20 clusters and all 10 models. The Bayesian information criterion (BIC) was then used to determine which was the best model and number of clusters. Using the rumble contour data the best model was VEI with 4 clusters (see Figure 1). VEI is a model with a diagonal distribution, variable volume, equal shape, and an orientation that follows the coordinate axes. Fraley and Raftery (2002) label models in MCLUST using the terms Equal, Variable and Identity. The first letter in the model name represents the cluster volume, the second shape, and the third orientation. The rumble contour data had a mean clustering uncertainty of 0.12 and a median clustering uncertainty of 0.06. Figure 2 shows the distribution of the 4-cluster classification in two-dimensional space. For the acoustic parameter data the best model was VEV with 3 clusters (see Figure 3). VEV is a model with an ellipsoidal distribution, variable volume, equal shape, and variable orientation. The acoustic parameter data had a mean clustering uncertainty of 0.10 and a median clustering uncertainty of 0.04. Figure 4 shows the distribution of the 3-cluster classification in two-dimensional space.

Figure 1. Bayesian information criterion values for 1 to 20 clusters for contour data. The 3 best models are included in this figure. VEI is a model with a diagonal distribution, variable volume, equal shape, and an orientation that follows the coordinate axes. VVI is a model with diagonal distribution, variable volume, variable shape, and an orientation that follows the coordinate axes. VEV is a model with an ellipsoidal distribution, variable volume, equal shape, and variable orientation.
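
The two quantities reported above, BIC and clustering uncertainty, can be sketched in Python as follows. This is an illustration rather than the MCLUST code itself. It assumes the convention of Fraley and Raftery (2002), in which BIC is twice the log-likelihood minus a penalty for the number of parameters (so larger values are better), and it takes an observation's uncertainty to be one minus its largest posterior cluster membership probability. The function names and the small posterior matrix are hypothetical, and parameter counts are given only for the three models named in Figure 1.

```python
import math
from statistics import mean, median


def n_parameters(model, g, d):
    """Free parameters of a g-cluster Gaussian mixture in d dimensions."""
    covariance = {
        "VEI": g + (d - 1),                         # volume per cluster, shared shape
        "VVI": g * d,                               # volume and shape per cluster
        "VEV": g + (d - 1) + g * d * (d - 1) // 2,  # plus orientation per cluster
    }[model]
    mixing = g - 1
    means = g * d
    return mixing + means + covariance


def bic(loglik, n_params, n_obs):
    """BIC in the larger-is-better convention: 2 logL - k log(n)."""
    return 2 * loglik - n_params * math.log(n_obs)


def uncertainties(posteriors):
    """One minus the largest membership probability for each observation."""
    return [1 - max(row) for row in posteriors]


# Hypothetical posterior probabilities for three rumbles and three clusters.
posteriors = [
    [0.90, 0.10, 0.00],
    [0.50, 0.30, 0.20],
    [0.98, 0.01, 0.01],
]
u = uncertainties(posteriors)
print(round(mean(u), 3), round(median(u), 3))
print(n_parameters("VEI", 4, 2), n_parameters("VEV", 3, 2))
```

A mean uncertainty well above the median, as reported for both data sets, indicates that most rumbles were assigned to a cluster with high probability while a minority were ambiguous.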