The Relaciones Geográficas are the responses to a series of questionnaires conducted in the four Virreinatos (Nueva España, Nueva Granada, Perú, and Río de la Plata) following a royal decree issued by King Philip II of Spain. The questionnaires initially consisted of just 50 questions (later extended to 255 in 1600), and the responses compiled vast amounts of information organised into four major topics:

  • Nature

  • Morals and politics

  • Military organisation

  • Religious life and systems

Although the questionnaires were directed to the Spanish religious and political authorities, in many cases indigenous authorities and elders also took part in collecting and writing the relaciones, providing invaluable insight into indigenous cultures, religions and cosmogonies.

The entire compilation of documents runs to thousands of pages, covering a wide array of details about both indigenous and colonial life in the Virreinatos. The lack of a standardized format, together with the mix of Old Spanish and indigenous languages (largely Nahuatl), makes the relaciones a complex but rich corpus.

The Relaciones Geográficas therefore present both a challenge and an opportunity. Traditional research on these documents has relied on the close reading of a handful of texts, work that can take scholars a lifetime of exploration and analysis. By applying a variety of novel computational techniques, this project will open new opportunities for future scholars to explore and study these documents, while simultaneously developing and refining computational methodologies for the analysis of historical textual sources.


Our analysis of the Relaciones Geográficas will:

  • Create, through computational approaches, new understandings of the history of the Early Colonial period and the conquest of America.
  • Generate innovative methodologies for the identification, extraction, and cross-linking of information from historical sources, facilitating new forms of analysis and interpretation.
  • Refine and extend these techniques while addressing the challenges that macroscale approaches pose for textual collections of interest to the Humanities.

In pursuing these objectives, our project will engage with a range of historical and methodological research questions, encompassing both macro- and, where relevant, micro-scale analyses of the information contained within the Relaciones Geográficas.

Our research questions include, among many others:

  • What are the geographies contained in these textual collections? In an unprecedented effort to locate the places mentioned in the corpus, the project will use advanced GISc techniques to disambiguate uncertain places and will map every location that can be identified, creating an invaluable geographic digital resource for the Relaciones Geográficas.
  • Who was involved in the collection of information for the Relaciones Geográficas? By mining the data on the people and institutions that compiled the material associated with each town, the project aims to reconstruct the knowledge network of the people responsible for creating this corpus in New Spain during the 16th century.
  • What was the colonial infrastructure of New Spain? By extracting the organisations mentioned in the corpus, we will provide a comprehensive picture of the educational, religious and economic institutions of the Mexican territories of New Spain.
  • How can state-of-the-art digital methods help bridge the divide between qualitative and quantitative research in the Humanities?
  • How can methods from Corpus Linguistics and statistical Natural Language Processing better handle languages with fewer resources, such as historical Spanish, as well as bilingual or even multilingual corpora?
  • How can NLP techniques be advanced to improve the identification and disambiguation of proper names, specifically place names, in historical corpora?
  • What are the linguistic and computational challenges of dealing with Prehispanic languages, particularly Nahuatl, and how can we solve them?
  • How can we use innovative GIS and NLP techniques for the disambiguation of places in historical corpora?


Traditionally, in Humanities-related fields such as History, Archaeology, Literature, and Anthropology, researchers can spend their careers scrutinising a handful of documents in order to find meaningful connections, answer new questions, and advance their field of research. However, recent approaches (some of which have been developed by members of this team within larger collaborative schemes) enable complex explorations of large textual collections, which often run to millions of words. Our previous research has already shown how combining methods and theory from fields such as Computer Science, the Geographical Information Sciences, and the Humanities can facilitate the exploration, identification, and analysis of linguistic, semantic, geographic, and historical patterns in textual corpora at different scales.

Our highly interdisciplinary team is combining techniques from different disciplines, including:

  • Corpus Linguistics
  • Text Mining
  • Natural Language Processing
  • Machine Learning
  • Geographic Information Systems

This approach poses some interesting methodological challenges, ranging from handling a multilingual corpus (Old Spanish and indigenous languages, including Nahuatl) with non-standardized spelling, to the geographic uncertainty of places and changes in place names over time.

Overcoming these challenges will enable unprecedented interaction with the Relaciones Geográficas, greatly increasing the scope for further research and analysis of these, and other, valuable historical sources.