FEL XVII_2021: Challenges and Applications of the Mexican Learner Corpus MexLeC

Right, so thank you all very much for coming to our presentation. I am Pauline Moore and my colleague here is Ana Abigail Flores Hernández. We are both with the Faculty of Languages at the Universidad Autónoma del Estado de México. I am a full-time teacher and Abby is doing her post-doctoral research as part of a CONACyT-funded project. This project aims to design a learner corpus, which we have called the Mexican Learner Corpus. We’re just finishing up our first year with this project and we’re hoping to get financing for a second year. Today we’re going to talk to you about the challenges and the applications of what we’ve been doing, because the intention here is to produce a corpus of Mexican learners which is open for use by all other researchers. So thank you again for coming.

I’m just going to give you a quick outline of what we plan to talk about today. We’re going to give you a quick introduction, where we’ll talk about what a corpus is, how corpora work and what a learner corpus is. Then we’re going to give you much more information about the actual project, the MexLeC project, which is the Mexican Learner Corpus project. I’m going to talk a bit about what our data look like and the kind of applications that you can use these data for, and then we’re going to talk about the challenges and the questions of accessibility that we’ve been working with.

So, getting straight into this, I’m going to talk about what an actual corpus is. A corpus is a very, very large collection of text. That text can be either written language or oral language. So, for example, COCA, the Corpus of Contemporary American English, includes mostly written text but it also includes some oral text. The BNC is mostly oral text. And the Real Academia Española actually has two corpora. One is a synchronic, recent corpus, which is not the CORDE, it’s the CREA, and the CREA is, again, mostly written text.

There are some oral texts there. It’s always a very, very large amount of text which is stored in a database, usually an electronic database, and the fact that it’s in an electronic database makes it accessible for machine reading. Then there are special programs called concordancers, which we use to go into the data and find things like frequency information, or information about the way in which words pattern together, the way in which words work together.

Learner corpora are slightly different, because they are collections of text which have been produced by language learners. Examples of those are ICLE, which is from the University of Louvain, and also the Trinity Lancaster Corpus, which is a collection of transcriptions of interviews from Trinity exams. That’s been put together by Lancaster, there’s online access to it, and it’s very, very interesting.

Now, what are learner corpora actually useful for? They’re slightly different from general corpora, and they’re mostly used for two separate but slightly related functions. On the one hand, they contribute a lot to second language acquisition theory, because they show us a very clear and authentic representation of the way in which learners actually use language. The second way in which learner corpora are used is to develop pedagogical tools and didactic materials, and those materials should reflect more clearly the way learners really use the language.
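
As a rough illustration of what a concordancer does with machine-readable text, here is a minimal Python sketch of a frequency list and a keyword-in-context search. It is not the software used in the project. The sample sentences are invented for this sketch, and the function names (`tokenize`, `frequency_list`, `concordance`) are hypothetical.

```python
import re
from collections import Counter

# Invented sample sentences for illustration only; NOT taken from any real corpus.
SAMPLE_TEXTS = [
    "I like to play football and I like music",
    "um I like to travel with my family",
    "I think it is important to study",
]


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, node, window=3):
    """Return keyword-in-context tuples: (left context, node word, right context)."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    # Most frequent word forms first.
    for word, count in frequency_list(SAMPLE_TEXTS).most_common(5):
        print(word, count)
    # Each occurrence of "like" with up to three words on either side.
    for left, node, right in concordance(SAMPLE_TEXTS, "like"):
        print(f"{left:>20} | {node} | {right}")
```

Real concordancers add much more (lemmatization, part-of-speech tags, statistical collocation measures), so this only shows the basic idea of counting and lining up contexts.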

Okay, so as I said, the first kind of application is in second language acquisition research. These are a couple of screenshots from a recent congress on corpus use and language, and you can see the ideas that people are working with: phrasal verbs, for example, and they’re working with different variables. The work that’s mostly done in corpora is quite quantitative. Then here there’s a question of the effect of the task itself on how fluent the writers are, there are questions of writing complexity, and there’s work with collocations, here with these light verbs do, have, make, give and take. We’ll see a couple of examples of that in the data that we look at. So that’s one way in which we use learner corpora, and a second way is that we use learner corpora to design materials. So here’s a book by Cambridge
English, and it has been informed by the data from the Cambridge English Corpus. This little exercise here is an error correction exercise. What they’ve done is take actual usage from the corpus and polish it up a little bit, but they’ve still left the mistake in. The idea is that the students will be correcting mistakes which are much more like the mistakes that students actually make.

Well, now I’d like to talk a bit about this project, the Mexican Learner Corpus. First, gathering the data we need for SLA research is a time-consuming process. We have to create, design and test our instruments, we have to do the transcriptions, we have to apply the interviews or train the interviewers, and we have to classify all the information we gather. If we could have a reliable and big enough source of data for SLA research, we could save all this time and go further in our analysis and in our research interests. Second, most of the corpora available are synchronic and written.

They collect information from learners only once, and I think it’s important to have a collection in which we have learners’ production at different moments, so we can really track the development and the acquisition of the language. Having more spontaneous data, such as spoken production, is useful because we can really access their grammar and their vocabulary: they have no opportunity to prepare what they are producing. Finally, it is important to mention that we do of course have important learner corpora, big enough learner corpora and reliable learner corpora, such as the Trinity Lancaster Corpus, the Cambridge corpus or ICLE, but unfortunately they are not open access. You have to pay to use them, or sometimes you don’t have access because they are for private use. That’s why the aim of the Mexican Learner Corpus is an open-access, reliable, spoken, multi-level and longitudinal corpus of Mexican learners of English, for designing ELT materials and for research in SLA. We have decided to collect this information from undergraduate learners from first to eighth semester, which covers the levels from A1 upward, and to use a semi-guided interview that we designed, which takes about 10 to 15 minutes according to the level of the learner.

Now I’m going to talk about the rationale behind the interview we designed for MexLeC. These are the three steps we followed. First, we analyzed 129 English learner corpora and evaluated their aims, their tasks, the data they produced and the application of these data to ELT and SLA research. We decided we wanted to have spontaneous data, so we created an oral interview. We wanted minimum participation from the interviewer, so we could have long turns that are easier to transcribe and also to analyze, and we wanted to use general topics, so the data could be easily comparable and we could easily replicate this research with other learners. Second, to decide the timing of each task, we looked at the timing used in the common examinations in Mexico, which are Cambridge ESOL, Trinity and TOEFL, and got an idea of the suitable time for each of our tasks. Third, we took into account the Common European Framework to decide which tasks and questions were suitable for each level of the participants, so that learners feel free and fluent when they are producing spoken discourse.

This is the result of that process. This is the MexLeC interview. It takes 10 to 15 minutes, more or less, according to the level of the learner, and it has four parts. In part one you are going to find general questions like “What do you do?”, “Tell me about your friends” or “Tell me about your family”, and these types of questions are suitable for levels A1 and higher. Then we go to part two, where they have to answer a question in which they make a decision, but a simple one, like “When you have some free time, do you prefer staying at home or going out?”, and this is suitable for levels A1 and higher. In part three they have a narrative: they have a sequence of images and they have to create a story, and this is suitable for levels A2 and higher.
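
The mapping between interview parts and minimum levels described so far can be written down as a small lookup table. This is only a sketch under stated assumptions: the dictionary keys are hypothetical labels invented here, the six-level CEFR list is an assumption about how levels would be ordered, and pre-A1 learners are not modeled.

```python
# CEFR levels in ascending order (assumption: the standard six-level scale).
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Minimum level for each interview part described so far in the talk.
# The keys are hypothetical labels chosen for this sketch.
INTERVIEW_PARTS = {
    "part_1_general_questions": "A1",
    "part_2_decision_question": "A1",
    "part_3_picture_narrative": "A2",
}


def parts_for_level(level):
    """Return the interview parts whose minimum level is at or below `level`."""
    rank = CEFR_ORDER.index(level)
    return [
        name
        for name, minimum in INTERVIEW_PARTS.items()
        if CEFR_ORDER.index(minimum) <= rank
    ]


if __name__ == "__main__":
    for level in ("A1", "A2"):
        print(level, parts_for_level(level))
```

Keeping the thresholds in one table like this would make it easy to check, for any learner, which tasks an interviewer should apply.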

And finally we’ve got part four, which we haven’t applied yet because we don’t have learners at this level. These are argumentative questions in which you have to state your point of view and support your answer. This is an example: “Do you think mobile devices have destroyed communication or have they made it better?” This is suitable for level B1 and higher. Additionally, we created a guide for the interviewer, a guide for the participants, and a questionnaire to build a learner profile, with questions about age, previous English language learning, other languages they are learning, and so on. Thank you.

Now, to talk a bit more about the kind of data that we’re getting from this interview procedure. What I want you to look at here are very, very short extracts from responses from students who are A1 and A2 to the decision task, which is part two of the interview. The question there was “Do you think it’s better to study or to work?”, and I think we can see a very interesting use of vocabulary here. The A1 student is saying that they think it’s important to study, because if you have a degree it’s easier for you to find a job: “I’m not always sure that you find a job, so I don’t know really at this time, but for me it’s more important to study.” So they’re being noncommittal. I think what you can notice there is that this is fairly fluent use of language. There isn’t a great range of vocabulary, but it is well employed, and there is that very interesting use of “degree”, which is the perfect word for licenciatura. The A2 student, on the other hand, is using a Spanish-language calque: they’re saying that you need a “title”, which is not exactly the right word. And remember we talked a bit about collocations earlier: instead of “I will make money” the student says “I will do money”. It’s that kind of tiny error which is very easy to see in a corpus, because you have the written representation in the transcription, and again it’s very authentic use of language. Other types of data, for example:

Here you are going to find all the words they are producing in this section of the corpus, listed by frequency. As you can see, the word “I” is the most frequent in this section, which is really common in any corpus. What I found interesting is the use of interjections: we find three interjections in the top places by frequency, which can give us an idea of the type of discourse the learners are producing. They are producing speech with not many content words and with lots of hesitation.

Now, using the tool LancsBox, developed by Lancaster University, I looked for all the words they are producing after the phrase “I like”. As you can see on your right, they say “I like games”, “I like classes”, “I like online”, “I like video” and “I like English”, and I cannot find any other content words here. If you go to your left, you’re going to find the verbs that they’re using after “I like”: “I like to play”, “I like to have”, “I like to travel”. We could use this information to create some material to teach vocabulary to our students: to teach other types of activities we can use after “I like”, such as “I like going out” or “I like eating out”; to teach them that we can also use the verb with the -ing form after “I like”, as in “I like playing”, “I like listening”, “I like watching”; or to teach other ways to say “I like”, such as “I fancy”, “I’m into”, “I enjoy”, et cetera.

Okay, these last two samples are from pre-A1 learners. They are answering the question “study or work?”, and as you can see, they contain a lot of pauses. These dots are pauses, and there are some parts we didn’t understand in transcription. They are using mainly interjections and isolated words. In this first sample, what I found interesting is the use of “students” instead of “study”, and in the second example what I found
interesting is the way the learner is trying to answer this question: instead of using isolated words like the first one, he or she uses words that he or she suddenly remembered.

So yeah, we’ve started to talk a bit about the many challenges that we have faced during this year. I think one of the important things to point out is simply the selection of the material that we were using for that narrative phase. It took quite a while to find a suitable set of pictures which were able to set up a narrative, that weren’t too complicated, that would be easy to show on screen for the students to look at and create that narrative from, and that wouldn’t fall into racist or sexist stereotypes.

Another challenge is the sheer volume of work that we have to do. Getting students interested in participating is much more difficult than you might perhaps think, and we do need a lot of data, because we do want a really large corpus at the end of the day. Last year, for example, we collected data from 150 students, but this coming year we have to collect data from those 150 students again and add another 150 for the students who are coming in fresh to the faculty, so that is an enormous amount of work.

There are also great complexities in the question of interviewer training, because people are used to doing very rigid, slightly different kinds of exams, and to having an actual script, and we don’t provide a script. With that we also get into the trouble where you go to all of the cost and all of the investment of training someone as an interviewer, and at the end of the day they say, “You know what, I can’t at that time”. So that does get very complicated, and a lot of the interviews Abby has actually ended up doing herself. I can’t do them because I am a native speaker, and that would create a problem in the interaction: it would make it very difficult to compare between the different interviews.

There are many, many other challenges in the transcription. We were showing you that transcription where you have to show what pauses there are and how they’re being filled, and we have to try to tag as much as possible in the transcription. A huge challenge is the decision of how we’re going to treat the errors that the students are making. Not so much phonetic errors, because we took the decision from the very beginning that we weren’t going to use phonetic transcription, because that is much too time-consuming. We hope that in the future we’ll come to a point where we will have enough time to make phonetic transcriptions for all of the interviews, and that would be helpful in another way. We also have to be careful to redact all of the personal data from the transcription, so if they’re using their name, we have to redact that out, and that again is another load on top of the transcription. It’s very, very complicated. I’m not sure if you know this, but for every minute of oral language that you transcribe, it takes between five and ten minutes to finish that transcription properly, between the actual first draft of the transcription itself and then other people revising those drafts. So there’s a lot there. And eventually we hope to do the video recording adaptation, which will help people who are working perhaps in the study of gesture.

Here at the bottom, then, this is the site, the actual website for MexLeC. If you click on it on your mobile, this is the kind of screen that you’ll come into, but you can obviously also use it on a PC or on a tablet.
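
To put the transcription workload mentioned in the talk into numbers, here is a back-of-the-envelope Python sketch. It uses only figures stated by the speakers (150 students in the first year, 300 in the second, interviews of 10 to 15 minutes, 5 to 10 minutes of work per minute of audio). It assumes, as a simplification, one full-length interview per student; the function names are hypothetical.

```python
def transcription_hours(interviews, minutes_per_interview, work_ratio):
    """Hours of transcription work for a batch of interviews.

    work_ratio is the number of minutes of work per minute of recorded speech.
    """
    return interviews * minutes_per_interview * work_ratio / 60


def workload_range(interviews):
    """Return (low, high) estimates in hours.

    Low: 10-minute interviews at 5 minutes of work per recorded minute.
    High: 15-minute interviews at 10 minutes of work per recorded minute.
    Assumes one full-length interview per student, which is a simplification.
    """
    low = transcription_hours(interviews, 10, 5)
    high = transcription_hours(interviews, 15, 10)
    return low, high


if __name__ == "__main__":
    for label, students in [("first year", 150), ("second year", 300)]:
        low, high = workload_range(students)
        print(f"{label}: {low:.0f} to {high:.0f} hours of transcription")
```

Under these assumptions the first year comes to somewhere between 125 and 375 hours of transcription, and the second year to between 250 and 750 hours, before any tagging, redaction or revision by other people is counted separately.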
