
Journal News | SSCI Journal: Journal of Phonetics, Volume 88 (2021)

语言学心得, 2022-06-09

Journal of Phonetics

Volume 88, 2021

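Journal of Phonetics (SSCI Q1; 2020 IF: 2.670) published 16 articles in Volume 88 (2021). The research articles cover glottal stop implementation, high vowels and glides, acoustic units, lexical representations, spontaneous imitation, coarticulation across morpheme boundaries, non-native phonetic contrasts, allophones, and stress in Greek, among other topics.

Contents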


Effects of word position and flanking vowel on the implementation of glottal stop: Evidence from Hawaiian by Davidson Lisa

Temporal differences between high vowels and glides are more robust than spatial differences by Burgdorf Dan Cameron; Tilsen Sam

It’s alignment all the way down, but not all the way up: Speakers align on some features but not others within a dialogue by Ostrand Rachel; Chodroff Eleanor

Acoustic unit discovery using transient and steady-state regions in speech and its applications by Pandia Karthik; Murthy Hema A.

Lexical representations can rapidly be updated in the early stages of second-language word learning by Llompart Miquel; Reinisch Eva

Sociophonetic variation in English /l/ in the child-directed speech of English-Malay bilinguals by Sim Jasper Hong

A non-contrastive cue in spontaneous imitation: Comparing mono- and bilingual imitators by Kwon Harim

Phonetic convergence to non-native speech: Acoustic and perceptual evidence by Wagner Mónica A.; Broersma Mirjam; McQueen James M.; Dhaene Sara; Lemhöfer Kristin

Leveraging the temporal dynamics of anticipatory vowel-to-vowel coarticulation in linguistic prediction: A statistical modeling approach by Flego Stefon; Forrest Jon

“Hama”? Reduced pronunciations in non-native natural speech obstruct high-school students’ comprehension at lower processing levels by Wanrooij Karin; Raijmakers Maartje E.J.

Systematic co-variation of monophthongs across speakers of New Zealand English by Brand James; Hay Jen; Clark Lynn; Watson Kevin; Sóskuthy Márton

The role of L2 experience in L1 and L2 perception and production of voiceless stops by English learners of Spanish by Gorba Celia; Cebrian Juli

Coarticulation across morpheme boundaries: An ultrasound study of past-tense inflection in Scottish English by Mousikou Petroula; Strycharczuk Patrycja; Turk Alice; Scobbie James M.

Contextually-relevant enhancement of non-native phonetic contrasts by Kato Misaki; Baese-Berk Melissa M.

Simultaneous bilingualism and speech style as predictors of variation in allophone production: Evidence from Finland-Swedish by Strandberg Janine A.E.; Gooskens Charlotte; Schüppert Anja

The supralaryngeal articulation of stress and accent in Greek by Katsika Argyro; Tsai Karen

Abstracts

Effects of word position and flanking vowel on the implementation of glottal stop: Evidence from Hawaiian

Davidson Lisa

Abstract  Much of the literature on the phonetic realization of phonemic or allophonic glottal stop has shown that it is often not produced with full glottal closure. Some studies of languages like English or German suggest that full glottal closure might be more likely in stressed syllables or positions of prosodic prominence (Garellek, 2014; Kohler, 1994), but the conditioning factors in the realization of contrastive glottal stop are less well understood. This study focuses on Hawaiian, which has phonemic glottal stop that is contrastive in both word-initial (/ʔaka/ ‘laugh’ vs. /aka/ ‘shadow’) and word-medial position (/pua/ ‘flower’ vs. /puʔa/ ‘to excrete’) (Parker Jones, 2018). Glottal stop realization is examined with respect to three factors: word position, different vs. identical flanking vowel (/puʔa/ ‘to excrete’ vs. /puʔu/ ‘hill’), and duration of the target /V(#)ʔV/ sequence. Recordings from the Ka Leo Hawaiʻi Hawaiian language radio program that aired from 1972 to 1988 were examined. Results show that the majority of phonemic glottal stops are produced as a period of creaky voice, most often in a modal voice-creaky voice-modal voice configuration, but also as modal-creaky or creaky-modal. Full glottal stops were more likely in word-initial position, and identical flanking vowels led to longer periods of creaky voice. Shorter target intervals led to longer proportions of creaky voice. These findings for phonemic glottal stop are consistent with research on the timing of contrastive voice quality in vowels, which has shown that modal-nonmodal-modal patterns are preferred to ensure that vowel quality, voice quality, and tone (for languages with all three) are recoverable (Silverman, 1995/1997). The effects of word position and flanking vowel are also related to recoverability and segmentation. The potential articulatory configurations common to glottal stops and creaky voice which may explain why they are on a continuum are also discussed.


Temporal differences between high vowels and glides are more robust than spatial differences

Burgdorf Dan Cameron; Tilsen Sam

Abstract It is often taken for granted that the glides [j] and [w] differ from their high-vowel counterparts [i] and [u] by having a greater degree of constriction and shorter duration. Yet there is little phonetic research to support these assumptions. Furthermore, although phonological patterns indicate that vowels and glides can be distinct phonemic categories, the extent to which their phonetic properties are categorically distinguished by speakers is unknown. We conducted an experiment to investigate articulatory and acoustic differences between productions of English vowels and glides at palatal and labio-velar places of articulation, in which participants imitated ambiguous intervocalic stimuli which varied in duration and intensity. The results showed that all speakers manipulated temporal aspects of their imitations of both palatal and labio-velar targets, while only some speakers manipulated constriction degree, and only for the labio-velars. This finding suggests that temporal organization may be more important than constriction degree for the glide-vowel distinction. Furthermore, temporal variables were often non-linearly related to stimulus duration, supporting the hypothesis that there exists a categorical difference in the temporal organization of articulatory gestures for vowels vs glides.


It’s alignment all the way down, but not all the way up: Speakers align on some features but not others within a dialogue

Ostrand Rachel; Chodroff Eleanor

Abstract During conversation, speakers modulate characteristics of their production to match their interlocutors’ characteristics. This behavior is known as alignment. Speakers align at many linguistic levels, including the syntactic, lexical, and phonetic levels. As a result, alignment is often treated as a unitary phenomenon, in which evidence of alignment on one feature is cast as alignment of the entire linguistic level. This experiment investigates whether alignment can occur at some levels but not others, and on some features but not others, within a given dialogue. Participants interacted with two experimenters with highly contrasting acoustic-phonetic and syntactic profiles. The experimenters each described sets of pictures using a consistent acoustic-phonetic and syntactic profile; the participants then described new pictures to each experimenter individually. Alignment was measured as the degree to which subjects matched their current listener’s speech (vs. their non-listener’s) on each of several individual acoustic-phonetic and syntactic features. Additionally, a holistic measure of phonetic alignment was assessed using 323 acoustic-phonetic features analyzed jointly in a machine learning classifier. Although participants did not align on several individual spectral-phonetic or syntactic features, they did align on individual temporal-phonetic features and as measured by the holistic acoustic-phonetic profile. Thus, alignment can simultaneously occur at some levels but not others within a given dialogue, and is not a single phenomenon but rather a constellation of loosely-related effects. These findings suggest that the mechanism underlying alignment is not a primitive, automatic priming mechanism but rather guided by communicative or social factors.


Acoustic unit discovery using transient and steady-state regions in speech and its applications

Pandia Karthik; Murthy Hema A.

Abstract Acoustic modelling in the absence of labelled audio is difficult in speech processing, especially in under-resourced languages. Ideas from theories of speech production and perception can aid acoustic modelling in such a setting. Several production and perception related studies have shown the importance of the dynamic nature of speech. In the present work, an attempt is made to discover and model the dynamic nature of the speech signal. Specifically, speech is modelled as a sequence of transient and steady-state units. Model initialisation, which is crucial for unsupervised acoustic modelling, is performed using the syllabic structure present in the speech signal. The proposed method has similarities with the distinctive region model (DRM) for speech production, where the dynamic regions are assumed to be contained within syllable-like segments. An analysis of the discovered units reveals that the units are of transient and steady-state forms. The steady-state units predominantly correspond to vowels. The transient units correspond to nasal, approximant, fricative, and stop transients. Finally, the effectiveness of the proposed method is explored by applying the acoustic units to zero-resource text-to-speech synthesis and unsupervised keyword spotting tasks.


Lexical representations can rapidly be updated in the early stages of second-language word learning

Llompart Miquel; Reinisch Eva

Abstract Encoding second-language (L2) phonological contrasts into lexical representations is known to be challenging above and beyond the perceptual difficulties that these contrasts may entail. In two experiments, this study assessed the effect of form-focused training during word learning on the lexical encoding of the English /ɛ/-/æ/ contrast into novel L2 minimal pairs (e.g., tendek - tandek) by German learners of English. More specifically, we investigated whether the point in time in which form-focused training is administered (i.e., very first presentation vs. after one training session) determines learners’ success at distinguishing the two vowels in the novel words. In Experiment 1, only native English tokens were presented whereas, in Experiment 2, productions by a fellow German-accented learner of English were also included. Results revealed an early benefit of phonologically-focused training on lexical encoding and novel word recognition (Experiment 1) that nonetheless appeared to be constrained by the demands of the task and the properties of the input presented (Experiment 2). Most importantly, however, learners’ ultimate word recognition performance provided evidence that early lexical representations can rapidly be updated to reflect improvements in phonological knowledge.


Sociophonetic variation in English /l/ in the child-directed speech of English-Malay bilinguals

Sim Jasper Hong

Abstract Three realisations of syllable-final /l/ have been described in previous work on Singapore English: vocalised-l (or deleted-l in some phonetic contexts; the local norms), dark-l (a form associated with the exonormative standards), and clear-l (a Malay-derived phonetic trait observed in the speech of some English-Malay bilinguals). This study examined whether, how and why Singaporean English-Malay bilinguals vary their English /l/ in their child-directed speech, and whether the phonetic variation, if any, could be socially-conditioned. The laterals in the English child-directed speech of ten father-mother dyads with their preschoolers were analysed using auditory and acoustic methods. All participants were simultaneous or early English-Malay bilinguals. The findings revealed that in informal contexts, both mothers and fathers used a relatively clearer /l/ in all syllable positions. Contrastingly, in formal contexts that involved teaching and learning, the coda laterals of mothers were significantly darker, thereby exhibiting positional contrast between onset and coda laterals. They also produced significantly more vocalised-l in these contexts. Fathers, however, did not show differentiation in the darkness of the laterals, nor did their laterals show significant positional differences in formal contexts, although some fathers of younger children did produce more vocalised-l than they did in informal contexts. The variation observed was discussed by exploring the potential socio-indexical meanings of these variants of /l/ within the context of variationist accounts of Singapore English and by drawing parallels with socially-conditioned variation in bilectal monolinguals and ethnolect speakers. Differences between maternal and paternal CDS patterns could be attributed to gender roles and cultural expectations of mothers’ dominant role in child-rearing, and may also be a result of and enabled by Malay women’s potentially more complex repertoire range.


A non-contrastive cue in spontaneous imitation: Comparing mono- and bilingual imitators

Kwon Harim

Abstract This study tests the hypothesis that imitators of different native languages imitate the same targets in distinct ways predicted by their native phonology, by investigating the role of a non-contrastive phonetic property in spontaneous imitation of English voiceless stops by English monolingual and Seoul Korean-English bilingual imitators. The primarily contrastive phonetic property for English voiceless stops is voice onset time (VOT), with the fundamental frequency (f0) of the post-stop vowel being non-contrastive but still informative for the voicing contrast. On the other hand, in Seoul Korean, stop VOT is a non-primary cue, but it is necessary to maintain the full three-way laryngeal contrast in the language. Post-stop f0 is the primary cue for the Seoul Korean aspirated stops. Seoul Korean speakers have been reported to imitate aspirated stops with longer VOT by raising their post-stop f0 (Kwon, 2019). In this study, English monolingual speakers and Seoul Korean-English bilingual speakers heard and shadowed model speech containing English voiceless stops manipulated by either raising post-stop f0 or lengthening VOT. Their imitation was assessed with two acoustic measurements, stop VOT and post-onset f0, of the voiceless stops, before and after the imitators heard the model speech with the two manipulations. A separate discrimination test confirmed that both manipulations were reliably perceived by both the monolingual and the bilingual imitators. English monolingual speakers' imitation data suggest that their shadowing productions reflect the phonological significance of the two phonetic properties, and only the imitative changes induced by a contrastive cue last beyond the immediate shadowing targets. In addition, Seoul Korean-English bilingual speakers, when performing the spontaneous imitation tasks in English, do not draw on their native (Seoul Korean) phonology. Implications of these findings on the role of phonology in the spontaneous imitation of bilingual and monolingual speakers are discussed.


Phonetic convergence to non-native speech: Acoustic and perceptual evidence

Wagner Mónica A.; Broersma Mirjam; McQueen James M.; Dhaene Sara; Lemhöfer Kristin

Abstract  While the tendency of speakers to align their speech to that of others acoustic-phonetically has been widely studied among native speakers, very few studies have examined whether natives phonetically converge to non-native speakers. Here we measured native Dutch speakers’ convergence to a non-native speaker with an unfamiliar accent in a novel non-interactive task. Furthermore, we assessed the role of participants’ perceptions of the non-native accent in their tendency to converge. In addition to a perceptual measure (AXB ratings), we examined convergence on different acoustic dimensions (e.g., vowel spectra, fricative CoG, speech rate, overall f0) to determine what dimensions, if any, speakers converge to. We further combined these two types of measures to discover what dimensions weighed in raters’ judgments of convergence. The results reveal overall convergence to our non-native speaker, as indexed by both perceptual and acoustic measures. However, the ratings suggest the stronger participants rated the non-native accent to be, the less likely they were to converge. Our findings add to the growing body of evidence that natives can phonetically converge to non-native speech, even without any apparent socio-communicative motivation to do so. We argue that our results are hard to integrate with a purely social view of convergence.


Leveraging the temporal dynamics of anticipatory vowel-to-vowel coarticulation in linguistic prediction: A statistical modeling approach

Flego Stefon; Forrest Jon

Abstract Previous research has shown that coarticulatory information in the signal orients listeners in spoken word recognition, and that articulatory and perceptual dynamics closely parallel one another. The current study uses statistical classification to test the power of time-varying anticipatory coarticulatory information present in the acoustic signal for predicting upcoming sounds in the speech stream. Bayesian mixed-effects multinomial logistic regression models were trained on several different representations of spectral variation present in V1 in order to predict the identity of V2 in naturally coarticulated transconsonantal V1…V2 sequences. Models trained on simple measures of spectral variation (e.g. formant measures taken at V1 midpoint) were compared with models trained on more sophisticated time-varying representations (e.g. the estimated coefficients of polynomial curves fit to whole formant trajectories of V1). Accuracy in predicting V2 was greater when models were trained on dynamic representations of spectral variation in V1, and those trained on quadratic and cubic polynomial representations achieved the greatest accuracy, with more than 15 percentage points in correct classification over using midpoint formant frequencies alone. The results demonstrate that spectral representations with high temporal resolution capture more disambiguating anticipatory information available in the signal than representations with lower temporal resolution.
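The contrast this abstract draws between a single midpoint formant measure and polynomial coefficients fit to a whole formant trajectory can be illustrated with a rough sketch. The F2 values, the nine sampling points, and the helper names `solve_linear_system` and `polyfit` below are all hypothetical and are not taken from the paper; the sketch only shows how a quadratic fit keeps the slope and curvature of a trajectory that a midpoint value discards.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def polyfit(times: Sequence[float], values: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients, lowest order first."""
    size = degree + 1
    # Normal equations (X^T X) beta = X^T y, where X[i][j] = t_i ** j.
    xtx = [[sum(t ** (i + j) for t in times) for j in range(size)] for i in range(size)]
    xty = [sum((t ** i) * v for t, v in zip(times, values)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Hypothetical F2 trajectory (Hz) at 9 normalized time points across V1.
times = [i / 8 for i in range(9)]
f2 = [1500 + 100 * t + 200 * t * t for t in times]

# Static representation: one number taken at the vowel midpoint.
midpoint_f2 = f2[len(f2) // 2]

# Dynamic representation: intercept, slope, and curvature of the trajectory.
quadratic = polyfit(times, f2, 2)

print(midpoint_f2)
print([round(c, 3) for c in quadratic])
```

In a setup like the one the abstract describes, coefficient vectors of this kind (one per formant) would serve as predictors in place of the midpoint values. The classification models themselves are not sketched here.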


“Hama”? Reduced pronunciations in non-native natural speech obstruct high-school students’ comprehension at lower processing levels

Wanrooij Karin; Raijmakers Maartje E.J.

Abstract  Native speakers ‘reduce’ their pronunciations, i.e., they shorten and merge words. For instance, German native speakers may say “hama” for “haben wir” (‘have-we’). We examined to what extent such reductions are problematic for adolescent learners of a second language, after four years of high-school training; and whether the problems can be related to inadequate bottom-up and top-down processing. For this, 39 Dutch and 38 German adolescents heard either reduced or unreduced German full phrases and part-phrases (phrase-intelligibility task) and words (lexical decision task). The results show that (1) Learners perceive non-native reduced speech less accurately than unreduced speech and also judge it as less intelligible; (2) This reduced-form disadvantage occurs separately from factors such as speech rate, orthography and voice; (3) The disadvantage for non-native listeners is substantial and larger than that in native listeners. Therefore, it probably reflects a lack of experience with reduced (i.e., real-life) speech; and (4) Non-native reductions induce at least inadequate bottom-up processing in learners, and may make top-down processing less accessible. We interpret the findings as supporting the idea that experience with variants (here: reduced variants) is necessary to strengthen linguistic (word) representations.


Systematic co-variation of monophthongs across speakers of New Zealand English

Brand James; Hay Jen; Clark Lynn; Watson Kevin; Sóskuthy Márton

Abstract The study of phonetic variation and change has tended to concentrate on particular variables in isolation, and it has proven challenging to move beyond an analysis of individual variables or small groups of variables, towards a better theoretical and empirical understanding of entire vowel systems. We develop a methodology that facilitates the study of co-variation, and introduce a large scale analysis of how elements of full sound systems co-vary across hundreds of speakers, demonstrating how constellations of vocalic variables operate together. Our data-set comprises F1 and F2 for 10 monophthongs of New Zealand English. We first obtain estimates of how advanced each speaker is with respect to changes in each of the vowels, irrespective of known predictors of sound change (i.e. year of birth, gender, speech rate). This is done by extracting by-speaker intercepts from Generalised Additive Models. We then use Principal Component Analysis on these intercepts to investigate the underlying structural co-variation that exists across the vocalic variables. Within a large subset of vowels, we see ‘leaders’ and ‘laggers’ of sound change; however, there are also groups of vowels which stand in opposition to each other, such that if a speaker is innovative in one, they tend to be conservative in the other. Some sets of covarying vowels could be linked by structural relationships (such as chain-shifting), but there are also covarying sets of vowels with no clear structural relationship, and which may be linked by shared social meaning. Our analysis provides novel insights into the structure of sound systems, demonstrating the existence of structured patterns in the realisations of specific vocalic variables across a large group of speakers. This approach offers a means to overcome long-standing methodological challenges in the study of phonetic co-variation, paving the way for research to move beyond the analysis of individual variables, towards an understanding of variation and co-variation in sound systems.
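As a rough illustration of the second step described in this abstract (Principal Component Analysis on by-speaker intercepts), the sketch below extracts the first principal component from a small table of made-up intercepts. The five speakers, the three unnamed vowel variables, and the function name `first_principal_component` are hypothetical. The sketch uses plain power iteration on the covariance matrix rather than whatever PCA routine the authors used, and it omits the Generalised Additive Model step entirely.

```python
import math
from typing import List


def first_principal_component(rows: List[List[float]], iterations: int = 200) -> List[float]:
    """Return unit-length loadings of the first principal component."""
    n, k = len(rows), len(rows[0])
    # Center each column (variable) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the variables.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
        for i in range(k)
    ]
    # Power iteration: repeated multiplication converges to the dominant eigenvector.
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical by-speaker intercepts: one row per speaker, one column per vowel variable.
intercepts = [
    [-2.0, -1.1, 1.9],
    [-1.0, -0.4, 1.2],
    [0.0, 0.1, -0.1],
    [1.0, 0.6, -0.9],
    [2.0, 0.8, -2.1],
]

loadings = first_principal_component(intercepts)
print([round(x, 3) for x in loadings])
```

In this toy data, the first two variables load with the same sign and the third with the opposite sign, which is the kind of pattern the abstract describes as vowels that "stand in opposition to each other". The overall sign of a principal component is arbitrary, so only the relative signs are meaningful.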


The role of L2 experience in L1 and L2 perception and production of voiceless stops by English learners of Spanish

Gorba Celia; Cebrian Juli

Abstract  Some previous studies report that increased experience with a second language (L2) may result in a more target-like perception and production in the L2, as well as in a less native-like performance in the L1. The present paper aimed to (1) assess the role of L2 experience on L2 and L1 production of voiceless stops; (2) investigate the effect of L2 experience on L2 and L1 perception of voiceless stops; and (3) examine the relationship between perception and production. Three groups of English learners of Spanish differing in amount and type of L2 experience, as well as two groups of functional monolinguals, completed a production task and an identification task involving English and Spanish voiceless stops. The results revealed that the L2 speakers were more successful at producing than at perceiving Spanish stops accurately, with L2 experience having a positive effect on production. L2 experience was not found to affect performance in the L1, which could be related to an overall limited amount of L2 use even in an immersion setting. The results also showed a weak relationship between perception and production, which may partly be due to the different nature of perceptual and production measures.


Coarticulation across morpheme boundaries: An ultrasound study of past-tense inflection in Scottish English

Mousikou Petroula; Strycharczuk Patrycja; Turk Alice; Scobbie James M.

Abstract  It has been hypothesized that morphologically-complex words are mentally stored in a decomposed form, often requiring online composition during processing. Morphologically-simple words can only be stored as a whole. The way a word is stored and retrieved is thought to influence its realization during speech production, so that when retrieval requires less time, the articulatory plan is executed faster. Faster articulatory execution could result in more coarticulation. Accordingly, we hypothesized that morphologically-simple words might be produced with more coarticulation than apparently homophonous morphologically-complex words, because the retrieval of monomorphemic forms is direct, in contrast to morphologically-complex ones, which might need to be composed online into full word forms. Using Ultrasound Tongue Imaging, we tested this hypothesis with nine speakers of Scottish English. Over two days of training, participants learned phonemically identical monomorphemic and morphologically-complex nonce words, while on the third consecutive testing day, they produced them in two prosodic contexts. Two types of articulatory analyses revealed no systematic differences in coarticulation between monomorphemic and morphologically-complex items, yet a few speakers did idiosyncratically produce some morphological effects on articulation. Our work contributes to our understanding of how morphologically complex words are stored and processed during speech production.


Contextually-relevant enhancement of non-native phonetic contrasts

Kato Misaki; Baese-Berk Melissa M.

Abstract   One important factor that contributes to successful speech communication is an individual’s ability to speak more clearly when their listeners have difficulty understanding their speech. Though previous studies have demonstrated that native talkers implement acoustic–phonetic speech enhancements to ensure that their speech is understood by listeners, how non-native talkers employ goal-oriented enhancements is less well-understood. Here, we examine acoustic characteristics of speech enhancements produced by native and non-native English talkers of varying proficiency. Specifically, we investigate native Mandarin learners of English. The results show that non-native talkers’ ability to enhance a specific sound contrast differed depending on their familiarity with the target English contrast from their native language experience (Mandarin), as well as their English proficiency level. These results highlight that talkers are able to enhance their speech in native and non-native languages, but also suggest that this flexibility is shaped by the talkers’ target language proficiency and the type of acoustic manipulation involved in the adaptation.


Simultaneous bilingualism and speech style as predictors of variation in allophone production: Evidence from Finland-Swedish

Strandberg Janine A.E.; Gooskens Charlotte; Schüppert Anja

Abstract  This study investigates cross-linguistic transfer in the production of long mid front vowels [øː] and [œː] by simultaneous bilingual Finnish and Finland-Swedish speakers in Finland. In Swedish, the phoneme /ø/ can be realised as the allophones [ø] and [œ], while in Finnish, only [ø] is used. Combining approaches from sociophonetic and bilingual transfer research, the study used acoustic analysis to compare the height and fronting of [øː] and [œː] produced by bilingual and monolingual Finland-Swedish speakers in three different speech styles on a continuum of formality. The data from 115 participants are stratified according to language background, speech style, region, and age. The statistical analysis indicates increased overlap of [øː] and [œː] in the vowel spaces of bilingual speakers, particularly in informal speech. The results suggest a potential effect of Finnish transfer on the distinction of the phonetic variants in simultaneous Finland-Swedish bilinguals, as well as demonstrate the importance of considering speech style in bilingual transfer research.


The supralaryngeal articulation of stress and accent in Greek

Katsika Argyro; Tsai Karen

Abstract  It is well reported that articulatory movements comprising prominence units are longer, larger, and faster than their non-prominent counterparts. However, it is unclear whether these effects arise at the level of lexical stress, accent, or both, reflecting a hierarchy of prominence, i.e., being stronger when induced by accent as opposed to stress. It is also uncertain whether prominence-induced kinematic effects are invariant across positions of stress within the word, types of focus that accent denotes, and positions of words in the phrase. Here, we use an electromagnetic articulography (EMA) study to assess the supralaryngeal kinematic correlates of prominence in Greek across three stress positions (antepenultimate, penultimate, ultimate; i.e., all possible stress positions in Greek), two accentual conditions (accented and de-accented), and two phrasal positions (phrase-medial and phrase-final). Focus type is also considered, with the accentual conditions coming from two types of focus (broad and narrow), while the de-accented conditions are by default unfocused. Our results indicate that stressed syllables involve longer, larger, and faster gestures than their unstressed counterparts, regardless of the position of stress within the word. Notably, variation in velocity is accounted for by variation in displacement. Presence of accent does not further expand the stressed gestures, although it is related to minimal kinematic changes across the whole word, the exact profile of which depends on stress position. With the exception of final vowel duration, focus type is not systematically encoded in these kinematic effects. Finally, interactions are detected between the kinematic profile of prominence and that of boundaries. Implications of our findings for the hierarchy of prominence and its cross-linguistic differences are discussed, and a gestural account of prominence and boundaries is put forward.


About the Journal


The Journal of Phonetics publishes papers of an experimental or theoretical nature that deal with phonetic aspects of language and linguistic communication processes. Papers dealing with technological and/or pathological topics, or papers of an interdisciplinary nature are also suitable, provided that linguistic-phonetic principles underlie the work reported. Regular articles, review articles, and letters to the editor are published. Themed issues are also published, devoted entirely to a specific subject of interest within the field of phonetics.


Official website:

https://www.sciencedirect.com/journal/journal-of-phonetics

Source: The Journal of Phonetics



Editor: 慧伟

Reviewer: 心得小蔓
