Current Projects

Alegro
In this Spanish-funded project, we are developing an online learning system for English grammar. The content of the system was developed using an extensive learner corpus to discover which grammatical problems are most critical for Spanish learners of English. In use, the system first identifies which of these critical issues still affect the particular learner, and then focuses the learner's attention on those issues. In effect, the system identifies the learner's Zone of Proximal Development and targets instruction there.
Open University cooperation
Since 2014, course E304 at the Open University has used my software, UAM CorpusTool, for the practical side of the course. The OU funded the redevelopment of UAM CorpusTool as a browser-based application, and funds maintenance of the software each year.

Past Projects

Treacle
The Treacle project was funded by the Spanish government from 2009 to 2013. The goal of the project was to annotate a million-word corpus of texts written by university-level Spanish learners of English, in order to explore developmental processes in these learners.
  • A 116,000-word sub-corpus was manually annotated for errors to identify the most critical problems of these learners, yielding 16,000 annotated errors.
  • The entire million-word corpus was syntactically analysed and used in various studies to track syntactic development across rising proficiency levels (each text is associated with the CEFR level of the student).
MT Project "Sistema de Gestión de Publicaciones Técnicas en Eurotech" (FIT-350101-2005-19)
Period: Jan 2006-Feb 2007
A Spanish national project proposal under the PROFIT call, in cooperation with Seinet and Eurotech. The role of UAM was to integrate automatic translation software into Seinet's content management system. However, because no affordable MT software was available that could run as a server, we started developing our own MT system.
S5T Personalised Semantic Search of Text and Speech in the Semantic Web.
Period: 2006-2009
A Spanish national project proposal under the Programa Nacional de Tecnologías Informáticas, coordinated by Pablo Castells.
WOSLAC
Las interfaces léxico-sintaxis y discurso-sintaxis: Factores sintácticos y pragmáticos en la adquisición del orden de palabras en inglés y en español como segundas lenguas. (HUM2005-01728/FILO)
The lexicon-syntax and discourse-syntax interfaces: Syntactic and pragmatic factors in the acquisition of word order in L2 English and L2 Spanish
Period: 2006-09
A Spanish national project proposal, coordinated by Amaya Mendikoetxea Pelayo (Filología Inglesa). My role is the development of my corpus annotation software, UAM CorpusTool, to allow markup of marked word-order phenomena in student essays.
Ramón y Cajal Contract I am employed under the Ramón y Cajal programme, in which awardees carry out a research project over 5 years. The project I proposed involves the construction of a system to (semi-)automatically translate consumer product instructions (e.g., "How to use your new iron") between English and Spanish.
AcaMed I was contracted by Language & Computing nv. to manage a team of researchers developing natural language understanding capabilities for texts in the medical domain. The work is mainly funded under a Flemish Government research grant for the ACAMED project. ACAMED is a generic tool for automatic ICD-code extraction from medical free text.
M-Piro (Multilingual Personalised Information Objects) (European Consortium) A project to deliver a multilingual version of an ILEX-type system. I worked on the project in its first year, extending ILEX's multilingual capabilities.
HIPS (Hypernavigation in Physical Space) (European Consortium, with Chris Mellish, Jon Oberlander and Marc Moens) Developing handheld devices to deliver dynamically adapted presentations as the visitor walks through an exhibition space. For more information on this project (papers, etc.), click here.
ILEX (Intelligent Label Explorer) (with Chris Mellish, Jon Oberlander and Alistair Knott) seeks to automatically generate labels for items in an electronic catalogue (or museum gallery) in such a way as to reflect the interest of the user and also opportunistically to further certain educational (or other) aims. This project is in collaboration with the National Museums of Scotland, Interactive Information and VIS Interactive Media. For more information on this project (papers, etc.), and to access our prototype system, click here.
DIALOG (1991-1992) The DIALOG project was based at the University of Sydney, with funding from Telecom Australia (now Telstra). The goal was to develop a model of dialog for use in an automated information service. I developed the conversation model.
EDA (1991-1992) The EDA project (Electronic Discourse Analyser) was funded by Fujitsu (Japan) to develop a system for analysing computer manuals that had been translated from Japanese to English, and for critiquing the translation at the discourse level. I designed and implemented the parser used by the system.
Penman (1990) The Penman project was a long-running project at the Information Sciences Institute, Los Angeles. I was the resident linguist for 1990, and worked with Bob Kasper on developing the ISI parser.