CroissantLLM, the Franco-English LLM

The release of CroissantLLM by the MICS laboratory at CentraleSupélec, developed in collaboration with several academic institutions, marks a significant milestone in artificial intelligence. The model, available on Hugging Face, is a notable step toward autonomy and openness in language model technology. It was designed and trained entirely in France on the Jean Zay supercomputer, using French public datasets that cover a wide range of fields, including law, administration, culture, commerce, science, and translation.

Genesis and Objectives of CroissantLLM

Manuel Faysse, a key contributor to the project, emphasizes the ambition to make the model not only high-performing but also accessible. At a time when digital sovereignty is becoming crucial, CroissantLLM stands out for its openness and transparency, in contrast with other initiatives such as Llama 2 or Mistral AI. This bilingual LLM, pre-trained on a very large corpus of French and English documents, moves smoothly between the two languages, offering a nuanced grasp of their cultural and linguistic subtleties.

With 1.3 billion parameters, CroissantLLM is deliberately far more compact than the largest models, which favors large-scale adoption. This strategy rests on an observation by Manuel Faysse: the most downloaded models are not necessarily the largest, but those that combine efficiency and ease of use. The model adopts a Llama-style architecture, aiming to optimize performance while remaining easy to run on standard hardware.
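To see roughly where a figure like 1.3 billion parameters comes from in a Llama-style model, the sketch below counts the weights of a decoder-only transformer. It assumes the standard Llama design (full multi-head attention without biases, a SwiGLU feed-forward block, RMSNorm, and an output head not tied to the input embeddings). The dimensions in `LlamaStyleConfig` are hypothetical values chosen for illustration; they are not taken from this article, and `count_parameters` is an illustrative helper rather than part of any library.

```python
from dataclasses import dataclass


@dataclass
class LlamaStyleConfig:
    # Hypothetical dimensions for illustration only; not confirmed by the article.
    vocab_size: int = 32_000
    hidden_size: int = 2_048
    intermediate_size: int = 5_504
    num_layers: int = 24
    tie_embeddings: bool = False


def count_parameters(cfg: LlamaStyleConfig) -> int:
    """Approximate parameter count of a decoder-only, Llama-style transformer."""
    h, f = cfg.hidden_size, cfg.intermediate_size
    attention = 4 * h * h   # q, k, v and output projections, no biases
    mlp = 3 * h * f         # SwiGLU: gate, up and down projections
    norms = 2 * h           # two RMSNorm weight vectors per block
    per_layer = attention + mlp + norms
    embeddings = cfg.vocab_size * h
    lm_head = 0 if cfg.tie_embeddings else cfg.vocab_size * h
    final_norm = h          # RMSNorm applied after the last block
    return cfg.num_layers * per_layer + embeddings + lm_head + final_norm


if __name__ == "__main__":
    total = count_parameters(LlamaStyleConfig())
    print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

With these assumed dimensions the count lands at about 1.35 billion, with the 24 transformer blocks accounting for most of it and the embedding and output matrices for the rest. Tying the output head to the embeddings would remove one `vocab_size × hidden_size` matrix.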

Performance and Accessibility: The Bet of CroissantLLM

CroissantLLM's ability to run on low-end GPU servers, and even on CPUs and mobile devices, while maintaining good speed and quality makes it particularly attractive. This accessibility, combined with moderate energy consumption, makes the model well suited to a range of industrial and communication applications. Expectations should nevertheless be tempered regarding complex reasoning or programming, since the model is optimized for tasks such as translation and conversation.
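A back-of-the-envelope calculation shows why a 1.3-billion-parameter model fits on modest hardware: the weights alone need roughly the parameter count multiplied by the bytes used per parameter. The sketch below applies that rule at several precisions. The 8-bit and 4-bit entries assume a quantized runtime, which the article does not mention, and the figures exclude activations, the KV cache, and runtime overhead.

```python
# Bytes needed to store one parameter at common precisions.
# The int8/int4 rows assume a quantized runtime (not mentioned in the article).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).

    Activations, the KV cache and runtime overhead are not included.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n_params = 1.3e9  # parameter count cited in the article
    for precision in BYTES_PER_PARAM:
        print(f"{precision:>9}: {weight_memory_gb(n_params, precision):.2f} GB")
```

At 16-bit precision the weights take about 2.6 GB, and a 4-bit quantized copy would take about 0.65 GB, which is within reach of ordinary laptops and many phones.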

FrenchBench: The Performance Evaluation Tool

The researchers developed FrenchBench, a benchmark dedicated to evaluating models such as CroissantLLM in French. Covering both classification and generation tasks, it measures a model's effectiveness on key aspects of natural language processing. According to the authors, CroissantLLM outperforms other models of comparable size on this benchmark, particularly in reasoning, factual knowledge, and language skills.
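The article does not describe how FrenchBench scores its tasks. One common approach for classification-style (multiple-choice) items in evaluation harnesses is to compute the model's log-likelihood for each candidate answer, pick the highest-scoring one, and report accuracy against the gold label. The sketch below shows only that scoring step; the log-probabilities are made-up placeholders rather than real model outputs, and `pick_answer` and `accuracy` are illustrative helpers, not FrenchBench code. Generation tasks would need different metrics and are not covered here.

```python
from typing import Sequence


def pick_answer(option_logprobs: Sequence[float]) -> int:
    """Return the index of the candidate answer with the highest log-likelihood."""
    return max(range(len(option_logprobs)), key=lambda i: option_logprobs[i])


def accuracy(items: Sequence[tuple[Sequence[float], int]]) -> float:
    """Fraction of items where the highest-scoring option is the gold answer.

    Each item is (log-probabilities of the candidate answers, gold index).
    """
    if not items:
        raise ValueError("no items to score")
    correct = sum(pick_answer(logprobs) == gold for logprobs, gold in items)
    return correct / len(items)


if __name__ == "__main__":
    # Made-up log-probabilities for three hypothetical questions.
    toy_items = [
        ([-4.1, -1.2, -6.3], 1),
        ([-2.0, -2.5, -0.9], 0),
        ([-0.3, -3.7, -5.0], 0),
    ]
    print(f"accuracy = {accuracy(toy_items):.2f}")
```

In this toy run the model picks the gold answer for the first and third questions but not the second, giving an accuracy of 2/3.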

Transparency and Future Development

The release of the source code and of multiple model configurations reflects the researchers' commitment to transparency. Assessed against the Foundation Model Transparency Index (FMTI) framework, CroissantLLM satisfies 81% of the transparency criteria. This approach paves the way for future research on bilingual and multilingual models and for a better understanding of how pre-training data affects model behavior.

Impact and Prospects of CroissantLLM

In conclusion, CroissantLLM represents a major advance in the landscape of language models, combining performance, accessibility, and transparency. Its development is a significant step toward democratizing artificial intelligence, encouraging wider adoption and better integration across sectors. The researchers are already considering broadening the model's range of applications and exploring ways to improve its capabilities and efficiency. This Franco-English model opens promising prospects for natural language processing and intercultural communication, as well as for innovation and collaboration in artificial intelligence.