Lexibank Explained

Lexibank
Producer: Max Planck Institute for Evolutionary Anthropology
Country: Germany
Languages: English
Cost: Free
Disciplines: Linguistics, lexicography

Lexibank is a linguistics database managed by the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany.[1] The database consists of over 100 standardized wordlists (datasets) that are independently curated.[2]

Description

Lexibank datasets are presented in the Cross-Linguistic Data Format (CLDF).[3]
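
As a rough illustration of what working with such a wordlist can look like, the following minimal sketch reads a small forms table and groups the word forms by concept. It assumes the column names commonly used in CLDF wordlists (ID, Language_ID, Parameter_ID, Form, Segments); the sample rows, language IDs and the helper name forms_by_concept are invented for this example and are not taken from any Lexibank dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a CLDF-style forms table. The language IDs, concept IDs
# and forms are invented for illustration only.
SAMPLE_FORMS_CSV = """ID,Language_ID,Parameter_ID,Form,Segments
lang_a-hand-1,lang_a,hand,mano,m a n o
lang_a-water-1,lang_a,water,akwa,a k w a
lang_b-hand-1,lang_b,hand,man,m a n
lang_b-water-1,lang_b,water,o,o
"""


def forms_by_concept(csv_text):
    """Group word forms by concept (Parameter_ID), keyed by language (Language_ID)."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Parameter_ID"]][row["Language_ID"]] = row["Form"]
    # Convert to plain dicts so the result is easy to compare and print.
    return {concept: dict(langs) for concept, langs in grouped.items()}


wordlist = forms_by_concept(SAMPLE_FORMS_CSV)
for concept, forms in sorted(wordlist.items()):
    print(concept, forms)
```

With a real dataset, the inline sample would be replaced by the dataset's own forms table; the grouping logic stays the same.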

Phonological and lexical features are automatically computed in Lexibank.[2]
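
To give a sense of what "automatically computed" features can mean, here is a small sketch that derives two toy features from segmented word forms: whether a variety has a given sound, and whether two concepts are expressed by the same form. The feature names, the sample data and the helper functions are assumptions made for illustration; they are not Lexibank's actual feature definitions or code.

```python
# Hypothetical segmented forms for one language variety:
# concept -> space-separated sound segments (invented data).
FORMS = {
    "arm": "m a n",
    "hand": "m a n",
    "water": "a k w a",
    "tongue": "l i ŋ",
}


def sound_inventory(forms):
    """Collect the set of distinct sound segments attested in the wordlist."""
    return {seg for segs in forms.values() for seg in segs.split()}


def has_sound(forms, sound):
    """Toy phonological feature: is the sound attested anywhere in the wordlist?"""
    return sound in sound_inventory(forms)


def colexifies(forms, concept_a, concept_b):
    """Toy lexical feature: are both concepts expressed by the same form?"""
    return (
        concept_a in forms
        and concept_b in forms
        and forms[concept_a] == forms[concept_b]
    )


features = {
    "has_velar_nasal": has_sound(FORMS, "ŋ"),
    "arm_and_hand_colexified": colexifies(FORMS, "arm", "hand"),
    "inventory_size": len(sound_inventory(FORMS)),
}
print(features)
```

Because each feature is a plain function of the standardized forms, the same computation can be repeated unchanged over every variety in a collection of wordlists.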

The datasets are publicly accessible: they are archived at Zenodo[4] and are also available on GitHub.[5] Lexibank is part of the Cross-Linguistic Linked Data project. All of the datasets are released under the CC BY 4.0 license.

Applications of the database include historical linguistics and comparative phonology.

List of datasets

The following is a list of Lexibank (version 0.2) datasets as of 17 June 2022.[6]

ID Zenodo Citation
aaleykusunda 5115947 Uday Raj Aaley and Timotheus A. Bodt (2020): New Kusunda data: A list of 250 concepts. Computer-Assisted Language Comparison in Practice 3.4 (08/04/2020), URL: https://calc.hypotheses.org/2414.
abrahammonpa 5115885 Abraham, Binny, Kara Sako, Elina Kinny, and Isapdaile Zeliang (2018): Sociolinguistic Research among Selected Groups in Western Arunachal Pradesh: Highlighting Monpa. Dallas: SIL International.
allenbai 5115649 Allen, Bryan (2007): Bai Dialect Survey. Dallas: SIL International.
backstromnorthernpakistan 5116054 Backstrom, Peter C. and Radloff, Carla F. (1992): Sociolinguistic Survey of Northern Pakistan, Volume 2. Languages of Northern Areas. Islamabad: National Institute of Pakistan Studies.
bantubvd 5115982 Simon Greenhill and Russell Gray, 2015. Bantu Basic Vocabulary Database.
bdpa 5116087 List, Johann-Mattis and Jelena Prokić. (2014). A benchmark database of phonetic alignments in historical linguistics and dialectology. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC), 26-31 May 2014, Reykjavik. 288-294.
beidasinitic 5119295 Běijīng Dàxué 北京大学 (1964): Hànyǔ fāngyán cíhuì 汉语方言词汇 [Chinese dialect vocabularies]. Beijing: Wenzi Gaige.
birchallchapacuran 5119306 Birchall J, Dunn M, & Greenhill SJ. 2016. A Combined Comparative and Phylogenetic Analysis of the Chapacuran Language Family. International Journal of American Linguistics 82(3). 255–284.
blustaustronesian 5137392 Greenhill, SJ; Blust, R and Gray, RD (2008): The Austronesian Basic Vocabulary Database: From bioinformatics to lexomics. Evolutionary Bioinformatics. 4. 271-283.
bodtkhobwa 5119330 Bodt, Timotheus Adrianus and List, Johann-Mattis (2019): Testing the predictive strength of the comparative method: An ongoing experiment on unattested words in Western Kho-Bwa languages. Papers in Historical Phonology 4.1: 22-44.
bowernpny 5119341 Bowern, Claire, & Atkinson, Quentin. (2012). Computational Phylogenetics and the Internal Structure of Pama-Nyungan: Dataset [Data set]. Language.
cals 5121189 Mennecier, P., Nerbonne, J., Heyer, E., & Manni, F. (2016). A Central Asian Language Survey, Language Dynamics and Change, 6(1), 57-98.
carvalhopurus 5121195 de Carvalho, F. O. (2021): A comparative reconstruction of Proto-Purus (Arawakan) segmental phonology. IJAL. 87.1. 49-108.
castrosui 5121213 Castro, Andy and Pan, Xingwen (2015): Sui dialect research. SIL: Guiyang.
castroyi 5121214 Castro, Andy; Crook, Brian; Flaming, Royce (2010): A sociolinguistic survey of Kua-nsi and related Yi varieties in Heqing county, Yunnan province, China. SIL Electronic Survey Reports 2010-001. Dallas: SIL International.
castrozhuang 5121215 Castro, Andy; Hansen, Bruce (2010): Hongshui He Zhuang dialect intelligibility survey. Dallas: SIL International.
chaconarawakan 5118556 Chacon, Thiago C. (2017): Arawakan and Tukanoan contacts in Northwest Amazonia prehistory. PAPIA 27(2). 237-265.
chaconbaniwa 5118605 Chacon, T. C.; Gonçalves, A. G.; and da Silva, L. F (2019): A diversidade linguística Aruák no Alto Rio Negro em gravações da década de 1950 [The diversity of Arawakan languages from the upper Rio Negro in recordings from the 1950s]. Forma y Función, 32.2, 41-67.
chaconcolumbian 5118763 Chacon, Thiago C. (2017): Arawakan and Tukanoan contacts in Northwest Amazonia prehistory. PAPIA 27(2). 237-265.
chacontukanoan 5118723 T. Chacon. (2014). A revised proposal of Proto-Tukanoan consonants and Tukanoan family classification. International Journal of American Linguistics 80.3, pp. 275–322.
chenhmongmien 5118744 Chén, Qíguāng 陳其光 (2012): Miáoyáo yǔwén 苗瑤语文 [Miao and Yao language]. Zhōngyāng Mínzú Dàxué 中央民族大学 [China Minzu University Press].
chindialectsurvey 5121280 Language and Social Development Organization (2019): Chin dialect data collection. Yangon: LSDO.
chingelong 5121324 Chin, Andy C. (2015): The Gelong Language in the Multilingual Hub of Hainan. Bulletin of Chinese Linguistics. 8. 140-156.
clarkkimmun 5121482 Clark, E. R. (2008). A phonological analysis and comparison of two Kim Mun varieties in Laos and Vietnam. Payap University: Chiang Mai.
clics1 5121530 List, Johann-Mattis, Thomas Mayer, Anselm Terhalle, and Matthias Urban (2014). CLICS: Database of Cross-Linguistic Colexifications. Marburg: Forschungszentrum Deutscher Sprachatlas (Version 1.0).
constenlachibchan 5121347 Umaña, Adolfo Constenla. 2005. ¿Existe relación genealógica entre las lenguas misumalpas y las chibchenses?. Estudios de Lingüística Chibcha.
davletshinaztecan 5121382 Davletshin, Albert (2012): Proto-Uto-Aztecans on their way to the Proto-Aztecan homeland: linguistic evidence. Journal of Language Relationship. 8. 1. 75-92.
deepadungpalaung 5121402 Deepadung, Sujaritlak; Buakaw, Supakit; and Rattanapitak, Ampica (2015): A lexical comparison of the Palaung dialects spoken in China, Myanmar, and Thailand. Mon-Khmer Studies 44. 19-38.
diacl 5121561 Carling, Gerd (ed.) 2017. Diachronic Atlas of Comparative Linguistics Online. Lund: Lund University. (URL: https://diacl.ht.lu.se/). Accessed on: 2019-02-07.
dravlex 5121580 Kolipakam, Vishnupriya, Michael Dunn, Fiona M. Jordan & Annemarie Verkerk. (2018). DravLex: A Dravidian lexical database. Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands.
dunnaslian 5121613 Dunn, Michael, Nicole Kruspe, and Niclas Burenhult. 2013. "Time and Place in the Prehistory of the Aslian Languages." Human Biology 85: 383–400.
dunnielex 5121651 Dunn, Michael (2012): Indo-European Lexical Cognacy Database. Max Planck Institute for Psycholinguistics: Nijmegen.
duonglachi 5121663 Duong, Thu Hang and Nguyen, Thu Quynh and Nguyen, Van Loi (2021): The Language of the La Chí People in Bản Díu Commune, Xín Mần District, Hà Giang Province, Vietnam. In: Studies in the Anthropology of Language in Mainland Southeast Asia. Ed. by N. J. Enfield, Jack Sidnell, and Charles H. P. Zuckermann. University of Hawaii Press: Honolulu. 124-138
felekesemitic 5126691 Feleke, Tekabe Legesse (2021): Ethiosemitic languages: classifications and classification determinants. Ampersand. 2021.
galuciotupi 5121724 Galucio, Ana Vilacy, Meira, Sérgio, Birchall, Joshua, Moore, Denny, Gabas Júnior, Nilson, Drude, Sebastian, Storto, Luciana, Picanço, Gessiane, & Rodrigues, Carmen Reis. (2015). Genealogical relations and lexical distances within the Tupian linguistic family. Boletim do Museu Paraense Emílio Goeldi. Ciências Humanas, 10(2), 229-274.
gaotb 5121776 Gao, Tianjun (2020): Reconstruction and analysis of phylogenetic network on Tibeto-Burman languages in China. Journal of Chinese Linguistics, 48:1, 257-293.
gerarditupi 5127906 Ferraz Gerardi, Fabrício and Reichert, Stanislav (2020) The Tupí-Guaraní Language Family: A Phylogenetic Classification. To appear in Diachronica.
halenepal 5121540 Hale, Austin (1973): Clause, sentences, and discourse patterns in selected languages of Nepal. Kathmandu: Institute of Nepal and Asiatic Studies.
hantganbangime 5126441 Hantgan, Abbie and List, Johann-Mattis (2018): Bangime. Secret language, language isolate, or language island? Journal of Language Contact.
hattorijaponic 5126845 Hattori, S. (1973): Japanese dialects. In: Diachronic, areal and typological linguistics. Edited by H. M. Hoenigswald and R. H. Langacre. 368-400.
houchinese 5126858 Hóu, J. (2004): Xiàndài Hànyǔ fāngyán yīnkù 现代汉语方言音库 [Phonological database of Chinese dialects]. Shànghǎi: Shànghǎi Jiàoyù.
hsiuhmongmien 5126451 Hsiu, Andrew (2015): The classification of Na Meo, a Hmong-Mien language of Vietnam. Handout prepared for SEALS 25 (Chiang Mai, 2015/05/27-29).
hubercolumbian 5121219 Huber, R. Q. and Reed, R. B. 1992. Vocabulario comparativo: palabras selectas de lenguas indígenas de Colombia [Comparative vocabulary: Selected words from the indigenous languages of Colombia]. Santafé de Bogota: Asociación Instituto Lingüístico de Verano.
huntergatherer 5126741 Bowern, Claire, Patience Epps, Jane Hill, and Patrick McConvell. Hunter-Gatherer Language Database. https://huntergatherer.la.utexas.edu/ Accessed 2021-04-27.
ids 5126899 Key, Mary Ritchie & Comrie, Bernard (eds.) 2015. The Intercontinental Dictionary Series. Leipzig: Max Planck Institute for Evolutionary Anthropology.
ivanisuansu 5126966 Ivani, J. K. (2019): A first overview on Suansu, a Tibeto-Burman language from Northeastern India. Talk, held at the 29th conference of the Southeast Asian Linguistic Society (27-29 May, Tokyo). https://zenodo.org/record/3383006
johanssonsoundsymbolic 5127131 Erben Johansson, N., Anikin, A., Carling, G., & Holmer, A. (2020). The typology of sound symbolism: Defining macro-concepts via their semantic and phonetic features, Linguistic Typology, 24(2), 253-310.
joophonosemantic 5137230 Joo, I. (2020). Phonosemantic biases found in Leipzig-Jakarta lists of 66 languages. Linguistic Typology, 24(1), 1–12.
kesslersignificance 5127775 Kessler, B. (2001): The Significance of Wordlists. CSLI: Stanford.
kleinewillinghoeferbikwinjen 5127404 Kleinewillinghöfer, Ulrich (2015). Bikwin-Jen Group. https://www.blogs.uni-mainz.de/fb07-adamawa/adamawa-languages/bikwin-jen-group/. Accessed on: 2020-04-15.
kraftchadic 5121222 Kraft, Charles H. 1981. Chadic wordlists. Berlin: Dietrich Reimer.
leejaponic 5126801 Lee, Sean, Hasegawa, Toshikazu (2011). Bayesian phylogenetic analysis supports an agricultural origin of Japonic languages. Proceedings of the Royal Society B: Biological Sciences, 278(1725), 3662–3669.
leeainu 5126890 Lee Sean, Hasegawa Toshikazu (2013). Evolution of the Ainu Language in Space and Time. PLoS ONE 8(4): e62243.
bremerberta 5126757 Bremer, Nate D. (2016): A Sociolinguistic Survey of Six Berta Speech Varieties in Ethiopia. SIL Electronic Survey Reports 2016-007. Dallas: SIL International.
leekoreanic 5126904 Lee, Sean (2015). A Sketch of Language History in the Korean Peninsula. PLoS ONE 10(5): e0128448.
lieberherrkhobwa 5127687 Lieberherr, Ismail and Bodt, Timotheus Adrianus (2017): Sub-grouping Kho-Bwa based on shared core vocabulary. Himalayan Linguistics 16(2). 26-63. URL: https://escholarship.org/uc/item/4t27h5fg
lindseyende 5127829 Kate Lynn Lindsey and Bernard Comrie. 2020. Ende (Papua New Guinea) dictionary. In: Key, Mary Ritchie & Comrie, Bernard (eds.) The Intercontinental Dictionary Series. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://ids.clld.org/)
listsamplesize 5128050 List, Johann-Mattis (2014): Investigating the impact of sample size on cognate detection. Journal of Language Relationship. 11. 91-102.
liusinitic 5131413 Líu, L.; Wáng, H.; Bǎi, Y. (2007): Xiàndài Hànyǔ fāngyán héxīncí, tèzhēng cíjí 现代汉语方言核心词·特征词集 [Collection of basic vocabulary words and characteristic dialect words in modern Chinese dialects]. Nánjīng: Fènghuáng.
lundgrenomagoa 5128097 Lundgren, Olof (2020): A phonological reconstruction of Proto-Omagua-Kokama-Tupinambá. Master's thesis. Lund: Lund University.
mannburmish 5131419 Mann, Noel W. 1998. A phonological reconstruction of Proto Northern Burmic. (PhD Thesis).
marrisonnaga 5121317 Marrison, Geoffrey Edward (1967): The classification of the Naga languages of North-East India. London: School of Oriental and African Studies.
mcelhanonhuon 5127348 McElhanon, K.A. 1967. Preliminary Observations on Huon Peninsula Languages. Oceanic Linguistics. 6, 1-45.
mitterhoferbena 5121327 Mitterhofer, Bernadette. 2013. Lessons from a dialect survey of Bena: Analyzing wordlists. SIL International.
naganorgyalrongic 5126458 Nagano, Yasuhiko and Prins, Marielle (2013): rGyalrongic Languages Database. Osaka: National Museum of Ethnology.
nagarajakhasian 5131421 Nagaraja KS, Sidwell P & Greenhill SJ. 2013. A Lexicostatistical Study of the Khasian Languages: Khasi, Pnar, Lyngngam, and War. Mon-Khmer Studies Journal, 42, 1-11.
northeuralex 5121268 Dellert, J., Daneyko, T., Münch, A. et al (2020). NorthEuraLex (Version 0.9). Lang Resources and Evaluation.
peirosaustroasiatic 5127536 Peiros, I. I. (2004): Генетическая классификация австроазиатских языков / Genetičeskaja klassifikacija avstroaziatskix jazykov [Genetic classification of Austro-Asiatic languages]. Russian State University for the Humanities, Moscow.
pharaocoracholaztecan 5136882 Pharao Hansen, Magnus (2020): ¿Familia o vecinos? Investigando la relación entre el proto-náhuatl y el proto-corachol [Family or neighbors? Investigating the relation between Proto-Náhuatl and Proto-Corachol]. In: Lenguas yutoaztecas: historia, estructuras y contacto lingüístico. Homenaje a Karen Dakin. Rosa Yañez (ed.) Guadalajara: Universidad de Guadalajara.
polyglottaafricana 5136890 Koelle, Sigismund W. (1854). Polyglotta Africana or Comparative Vocabulary of Nearly Three Hundred Words and Phrases in more than One Hundred Distinct African Languages. London: Church Missionary House.
ratcliffearabic 5136898 Ratcliffe, Robert R. (2021): The glottometrics of Arabic. Language Dynamics and Change. 2021.
robinsonap 5121340 Robinson, Laura C. and Holton, Gary (2012): Internal Classification of the Alor-Pantar Language Family Using Computational Methods Applied to the Lexicon. Language Dynamics and Change 2.2. 123-149.
saenkoromance 5136900 Saenko, M. (2015): Annotated Swadesh wordlists for the Romance group (Indo-European family). In: Starostin GS, editor. The Global Lexicostatistical Database. RGU; 2015. http://starling.rinet.ru/new100/tuj.xls
sagartst 5121409 Laurent Sagart, Guillaume Jacques, Yunfan Lai, and Johann-Mattis List (2019): Sino-Tibetan Database of Lexical Cognates. Jena: Max Planck Institute for the Science of Human History.
satterthwaitetb 5136997 Satterthwaite-Phillips, Damian (2011): Phylogenetic inference of the Tibeto-Burman languages or on the usefulness of lexicostatistics (and "megalo"-comparison) for the subgrouping of Tibeto-Burman. Stanford: Stanford University.
savelyevturkic 5137274 Savelyev, Alexander and Robbeets, Martine (2020): Bayesian phylolinguistics infers the internal structure and the time-depth of the Turkic language family. Journal of Language Evolution 5.1. 39-53.
servamalagasy 5137040 Serva M., Pasquini M. (2020): Dialects of Madagascar, PLoS ONE 15(10).
sidwellbahnaric 5137055 Sidwell, Paul. 2015. Austroasiatic dataset for phylogenetic analysis: 2015 version. Mon-Khmer Studies (Notes, Reviews, Data-Papers) 44. lxviii-ccclvii.
simsrma 5166593 Sims, Nathanial A. (2020): Reconsidering the diachrony of tone in Rma. Journal of the Southeast Asian Linguistics Society 13.1. 53-85.
sohartmannchin 5121813 So-Hartmann, Helga (1988): Notes on the Southern Chin Languages. Linguistics of the Tibeto-Burman Area 11.2: 98-119.
starostinpie 5137281 Starostin, S. A. (2005): Indo-European files in DBF/VAR. Moscow.
suntb 5121515 Sūn, Hóngkāi 孙宏开 (1991): Zangmianyu yuyin he cihui 藏缅语音和词汇 [Tibeto-Burman phonology and lexicon]. Beijing: Chinese Social Sciences Press.
syrjaenenuralic 5137236 Syrjänen, K.; Honkola, T.; Korhonen, K.; Lehtinen, J.; Vesakoski, O. & Wahlberg, N. (2013): Shedding more light on language classification using basic vocabularies and phylogenetic methods. Diachronica 30, 323-352.
tls 5121819 Nurse, Derek and Gérard Philippson (1975). The Tanzanian Language Survey. Department of Foreign Languages and Linguistics of the University of Dar es Salaam: Dar es Salaam.
tppsr
transnewguineaorg 5141620 Greenhill, Simon J. (2015): TransNewGuinea.org: An Online Database of New Guinea Languages. PLoS ONE 10.10: e0141563.
tuled Ferraz Gerardi, Fabrício & Reichert, Stanislav & Aragon, Carolina. (2021) TuLeD (Tupían Lexical Database): Introducing a database of a South American language family. Language Resources and Evaluation.
visserkalamang 5139559 Eline Visser. 2021. Kalamang dictionary. In: Key, Mary Ritchie & Comrie, Bernard (eds.) The Intercontinental Dictionary Series. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://ids.clld.org/)
walworthpolynesian 5126932 Walworth, Mary. (2018). Polynesian Segmented Data (Version 1) [Data set]. Zenodo.
wangbai 5137407 Wang, Feng (2004): Language contact and language comparison. The case of Bai. PhD thesis. Hong Kong: City University of Hong Kong.
wangbcd 5136930 Wang, F. 2004. BCD: basic words of Chinese dialects. Unpublished dataset. [Digital version in: List, J.-M. (2015): Network perspectives on Chinese dialect history. Bulletin of Chinese Linguistics 8. 42-67.]
wichmannmixezoquean 5126948 Cysouw, M., Wichmann, S., & Kamholz, D. (2006). A critique of the separation base method for genealogical subgrouping, with data from Mixe-Zoquean. Journal of Quantitative Linguistics, 13(2-3), 225–264.
wold 5139859 Haspelmath, Martin & Tadmor, Uri (eds.) 2009. World Loanword Database. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://wold.clld.org/)
yanglalo 5121829 Yang, Cathryn (2011): Lalo regional varieties: Phylogeny, dialectometry and sociolinguistics. Bundoora: La Trobe University.
yangyi 5167277 Yang, Cathryn (2021): The phonetic tone change *high > rising: Evidence from the Ngwi dialect laboratory.
yuchinese 5139881 Hsiao-jung Yu and Yifan Wang. 2021. Mandarin Chinese dictionary. In: Key, Mary Ritchie & Comrie, Bernard (eds.) The Intercontinental Dictionary Series. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at https://ids.clld.org/)
zgraggenmadang 5121535 Z'graggen, J A. (1980) A comparative word list of the Northern Adelbert Range Languages, Madang Province, Papua New Guinea. Canberra: Pacific Linguistics.
zhaobai 5136947 Zhao, Yanzhen (2006): Zhàozhuāng Báiyǔ miáoxiě yánjiū 趙莊白語描寫研究 [Investigations of Zhaozhuang Bai]. Běijīng: Zhōngyāng Mínzú Dàxué.
zhivlovobugrian 5137439 Zhivlov, M. (2011): Annotated Swadesh wordlists for the Ob-Ugrian group (Uralic family). The Global Lexicostatistical Database. Moscow: RGGU.
zhoubizic 5140129 Zhou, Yulou (2020): Proto-Bizic. A study of Tujia historical phonology. Bachelor Thesis. Stanford University.
logos 5141379 List, Johann-Mattis, Thomas Mayer, Anselm Terhalle, and Matthias Urban (2014). CLICS: Database of Cross-Linguistic Colexifications. Marburg: Forschungszentrum Deutscher Sprachatlas (Version 1.0).
utoaztecan 5173799 Greenhill, Simon J., Hannah J. Haynie, Robert M. Ross, Angela M. Chira, List, Johann-Mattis, Lyle Campbell, Carlos A. Botero, and Russell D. Gray (2021): A recent northern origin for the Uto-Aztecan language family. Leipzig: Max Planck Institute for Evolutionary Anthropology.
abvdoceanic 5206553 Greenhill, S.J., Blust. R, & Gray, R.D. (2008). The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics, 4:271-283.
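
For readers who want to process the list above programmatically, the following sketch splits a row into its dataset ID, Zenodo number and citation, tolerating rows where the Zenodo number or the citation is missing. The function name parse_row is hypothetical, and the DOI pattern 10.5281/zenodo.<number> is an assumption based on the form of the Zenodo DOI given in the references; it has not been checked against each record.

```python
def parse_row(line):
    """Split one list row into dataset ID, Zenodo number (or None) and citation."""
    line = line.strip()
    parts = line.split(maxsplit=2)
    dataset_id = parts[0]
    if len(parts) > 1 and parts[1].isdigit():
        # Second field is numeric: treat it as the Zenodo number.
        zenodo = parts[1]
        citation = parts[2] if len(parts) > 2 else ""
    else:
        # No Zenodo number in this row: everything after the ID is the citation.
        zenodo = None
        citation = line[len(dataset_id):].strip()
    # Assumed DOI pattern; left as None when no Zenodo number is given.
    doi = f"10.5281/zenodo.{zenodo}" if zenodo else None
    return {"id": dataset_id, "zenodo": zenodo, "doi": doi, "citation": citation}


rows = [
    "allenbai 5115649 Allen, Bryan (2007): Bai Dialect Survey. Dallas: SIL International.",
    "tppsr",
    "tuled Ferraz Gerardi, Fabrício & Reichert, Stanislav & Aragon, Carolina. (2021) TuLeD.",
]
parsed = [parse_row(row) for row in rows]
for entry in parsed:
    print(entry["id"], entry["doi"])
```

Rows without a Zenodo number are kept rather than dropped, so gaps in the list remain visible in the parsed output.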

Notes and References

  1. "Shedding light on linguistic diversity and its evolution: Linguists and computer scientists collaborate to publish a large global Open Access lexical database". ScienceDaily. 2022-07-22. Retrieved 2022-07-22.
  2. List, Johann-Mattis; Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; Gray, Russell D. (2022-06-16). "Lexibank, a public repository of standardized wordlists with computed phonological and lexical features". Scientific Data. 9 (1): 1–16. doi:10.1038/s41597-022-01432-0. ISSN 2052-4463. PMC 9203750. S2CID 239629792.
  3. Forkel, R. et al. (2018). "Cross-Linguistic Data Formats, advancing data sharing and reuse in comparative linguistics". Scientific Data. 5: 180205.
  4. Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; Gray, Russell D. (2021). Lexibank: A publicly available repository of standardized lexical datasets with automatically computed phonological and lexical features for more than 2000 language varieties. doi:10.5281/ZENODO.5227817. Retrieved 2022-06-17.
  5. "lexibank". GitHub. Retrieved 2022-06-17.
  6. "lexibank-analysed/lexibank.csv (v0.2)". GitHub. Retrieved 2022-06-17.