Large Language Models Are Clueless About…Language. 

We Used Explicit Linguistic Theory to Teach Our Bots Grammar

LLMs are astonishingly good at using language, but surprisingly unreliable at explaining it. That tension sits at the center of this post — and at the center of why we built FreeFlow the way we did.

Most modern LLMs have two core abilities. First and foremost, they generate natural language fluently. Second, as a byproduct of being trained on virtually the entire internet, they dispense “knowledge” and perform a kind of loose logical reasoning. Commercial AI companies tend to market this second ability – there’s clearly a lot of money in automating knowledge work – but it’s fundamentally weaker than the model’s core eponymous skill: producing fluent language.

This distinction becomes especially important in language learning. LLMs almost never misspell words or misuse tense or agreement. So it’s easy to assume that if they’re so good at producing correct language, they must also be good at explaining the rules behind that correctness. But that assumption breaks down immediately in practice.

Here’s a real example from our testing of OpenAI’s GPT-4o, one of the best reasoning models available today. A student wrote “las habilidades buenas,” and the model correctly flagged it as unnatural. But then, in its explanation, it confidently told the student that “In Spanish, adjectives usually come after the noun.” Of course, that’s exactly what the student already did — and in this specific construction, the correct form actually requires the adjective to come before the noun.

Mistakes like this aren’t rare. They’re systemic. And they reveal something fundamental about how these models work: LLMs can recognize linguistic patterns but often can’t justify them. In a way, they resemble native speakers with no formal training in linguistics. If you’re a monolingual English speaker, you know that “two apples” sounds right and “two informations” doesn’t, but you’d probably struggle to articulate the rule behind that intuition.

For language learners, though, the explanation matters. Research in Second Language Acquisition suggests fluency requires both naturalistic immersive practice and explicit rule-based instruction. Learners need to develop intuition, yes — but they also need accurate, digestible guidance about why something is wrong.

This is exactly why we built FreeFlow the way we did: it’s a no-pressure practice tool augmented with real-time explicit grammar feedback. We use LLMs for what they’re genuinely best at: simulating rich, natural conversation so students can practice communicating the way they ultimately will with real native speakers. But when it comes to grammar instruction, we rely on our own custom-built system.

Borrowing from classical natural language processing (NLP), our grammar feedback engine uses a dependency parser to categorize errors according to formal linguistic theory. Instead of hallucinating an explanation after the fact, we identify the underlying grammatical relationship the student got wrong — adjective placement, gender agreement, verb tense/aspect/mood, and so on — and generate a hint grounded in expert pedagogy that nudges them toward the rule without simply giving away the answer.
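To make the idea concrete, here’s a minimal sketch of dependency-based error categorization. This is not FreeFlow’s actual engine — the token dicts, the `PRENOMINAL_ADJS` lexicon, and the `categorize_errors` function are all hypothetical stand-ins; a real system would get heads and relations from a trained parser (e.g. spaCy or Stanza) and consult a much richer rule base.

```python
# Hypothetical mini-lexicon: Spanish adjectives that idiomatically
# precede the noun in common constructions.
PRENOMINAL_ADJS = {"buen", "bueno", "buena", "buenos", "buenas", "gran", "mal"}

def categorize_errors(tokens):
    """Return (category, adjective, noun) tuples for rule violations.

    Each token is a dict with its surface text, a Universal Dependencies
    relation label, and the index of its syntactic head.
    """
    errors = []
    for i, tok in enumerate(tokens):
        # "amod" marks an adjective modifying a noun; compare positions
        # of the adjective and its head noun in the sentence.
        if tok["dep"] == "amod":
            head = tokens[tok["head"]]
            appears_after_noun = i > tok["head"]
            if appears_after_noun and tok["text"].lower() in PRENOMINAL_ADJS:
                errors.append(("adjective_placement", tok["text"], head["text"]))
    return errors

# Hand-built parse of "las habilidades buenas":
# det <- habilidades (root) -> buenas (amod)
parse = [
    {"text": "las",         "dep": "det",  "head": 1},
    {"text": "habilidades", "dep": "root", "head": 1},
    {"text": "buenas",      "dep": "amod", "head": 1},
]

print(categorize_errors(parse))
# → [('adjective_placement', 'buenas', 'habilidades')]
```

Because the category (“adjective placement”) is identified structurally rather than generated by the LLM, the hint shown to the student can be templated from expert pedagogy instead of improvised after the fact.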

The result is a kind of division of labor: the LLM handles immersion; our parser handles the “knowledge”. Students get the best of both worlds — natural (and fun!) conversation and reliable grammar guidance that actually helps them learn.
