About me

I am a PhD student in Computer Science at Sorbonne Université and a member of the ALMAnaCH research team at Inria.

I work mainly on machine learning, deep learning and natural language processing.

I like coffee, cookies and programming.

Interests

  • Machine Learning
  • Deep Learning
  • Text Mining
  • NLP

Education

  • PhD in Computer Science

    Sorbonne Université

  • BSc in MIASHS, 2018

    Université Paris 8

  • MSc in Mathematics, 2017

    Aix-Marseille Université

  • BSc in Mathematics, 2016

    Universidad Nacional de Colombia

Recent Publications

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages

We explore the impact of the training corpus on contextualized word embeddings in five mid-resource languages.

CamemBERT: a Tasty French Language Model

We explore the impact of the training data size on a French version of RoBERTa.

Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell

We introduce the first treebank for a romanized user-generated content variety of Algerian, a North-African Arabic dialect.

Les modèles de langue contextuels Camembert pour le Français : impact de la taille et de l'hétérogénéité des données d'entrainement

We explore the impact of the training data size and heterogeneity on French language modeling.

Establishing a New State-of-the-Art for French Named Entity Recognition

We convert the NER annotations of the French TreeBank to a more user-friendly format and establish a new state of the art for French NER.

French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus

We investigate the impact of different types and sizes of training corpora on language models.

How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures

We explore the impact of OCR quality on grobid-dictionaries models.

Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures

We propose a new pipeline to filter, clean and classify Common Crawl by language, and publish the final corpus under the name OSCAR.
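
To illustrate the language-classification step mentioned above, here is a minimal sketch in Python. It assumes a pre-trained fastText language-identification model (lid.176.bin) and a confidence threshold; neither detail is taken from this page, they are illustrative assumptions only.

    # Hypothetical sketch: tag a line of text with its language using fastText.
    # The lid.176.bin model file and the 0.8 threshold are assumptions.
    import fasttext

    model = fasttext.load_model("lid.176.bin")  # pre-trained language-ID model

    def detect_language(line, threshold=0.8):
        # fastText predicts labels of the form "__label__fr" with a probability.
        labels, scores = model.predict(line.replace("\n", " "))
        lang = labels[0].replace("__label__", "")
        return lang if scores[0] >= threshold else None

    print(detect_language("Ceci est une phrase en français."))  # -> "fr"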

Recent and Upcoming Talks

Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures

We propose a new pipeline to filter, clean and classify Common Crawl by language, and publish the final corpus under the name OSCAR.

Preparing the Dictionnaire Universel for Automatic Enrichment

A talk about automatic enrichment of dictionaries.

Projects


BASNUM

Digitization and analysis of Basnage de Beauval’s Universal Dictionary: lexicography and scientific networks

CamemBERT

A state-of-the-art language model for French.
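
A minimal usage sketch, assuming the model is distributed through the Hugging Face transformers library under the identifier "camembert-base" (an assumption, not stated on this page):

    # Hypothetical fill-mask example; the "camembert-base" identifier and the
    # transformers pipeline API are assumptions about how the model is accessed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="camembert-base")
    for pred in fill_mask("Le camembert est <mask> !")[:3]:
        print(pred["token_str"], round(pred["score"], 3))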

OSCAR

OSCAR, or Open Super-large Crawled ALMAnaCH coRpus, is a huge multilingual corpus.
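
A minimal sketch of reading one language portion of the corpus, assuming it is exposed on the Hugging Face Hub as the "oscar" dataset with per-language configurations (an assumption; the official distribution channel is not stated here):

    # Hypothetical access sketch: stream the French split of OSCAR with the
    # Hugging Face datasets library, without downloading it in full.
    from datasets import load_dataset

    oscar_fr = load_dataset("oscar", "unshuffled_deduplicated_fr",
                            split="train", streaming=True)
    first_doc = next(iter(oscar_fr))
    print(first_doc["text"][:200])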

Contact me