Recent advancements in artificial intelligence (AI), particularly in Large Language Models (LLMs), have demonstrated notable achievements across various domains, including natural language processing and mathematical reasoning. However, questions arise regarding the reliability of these models in performing genuine logical reasoning tasks:
… most LLMs still struggle with logical reasoning. While they may perform well on classic problems, their success largely depends on recognizing superficial patterns with strong token bias, thereby raising concerns about their actual reasoning and generalization abilities.
Jiang et al., 2024.
'Token bias' in AI refers to the unintended favouring or disfavouring of certain words, phrases, or tokens during the training or operation of a model. Familiar examples are 'sex bias' (e.g., assuming a doctor is male but a nurse female) or 'cultural bias' (e.g., assuming Irish people are drunkards). In the reasoning context highlighted by Jiang et al., it means the model leans on familiar surface tokens rather than the underlying logical structure of a problem.
What this study and a recent one by AI researchers at Apple show is that small changes in input tokens can drastically alter model outputs, indicating a strong 'token bias' and suggesting that these models are highly sensitive and fragile. Additionally, in tasks requiring the correct selection of multiple tokens, the probability of arriving at an accurate answer decreases exponentially with the number of tokens or steps involved.
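To make that last point concrete, here is a minimal sketch, assuming (purely for illustration) that each step in a reasoning chain is independently correct with some probability p; under that assumption, the chance that an n-step chain is correct end to end is p^n, which falls exponentially with n.

```python
# Illustrative sketch (not taken from the cited studies): if each reasoning
# step is independently correct with probability p, the probability that an
# n-step chain is correct end to end is p**n, which shrinks exponentially.
def chain_accuracy(p: float, n: int) -> float:
    """Probability that all n independent steps are correct."""
    return p ** n

for n in (1, 5, 10, 20, 50):
    # With a hypothetical 95% per-step accuracy:
    print(f"{n:>2} steps: {chain_accuracy(0.95, n):.1%}")
```

Even with a generous 95% per-step accuracy, this toy model drops to roughly a third by 20 steps and below a tenth by 50, which is why long multi-step problems expose the fragility far more than short classic ones.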
This inherent unreliability in complex reasoning is highly problematic given that the deployment of AI technologies, particularly LLMs, is becoming ubiquitous across sectors. With an increasing number of employees leaning on these tools to produce written documents, and leaders making decisions based on those documents, it matters whether LLMs have genuine logical reasoning capabilities or are merely relying on probabilistic pattern-matching.
What may seem like a merely epistemological or academic quibble in fact has far-reaching implications as AI systems become increasingly integrated into decision-making in critical fields such as medicine, law, and education. If AI systems cannot perform robust formal reasoning, their deployment in areas requiring precision may be ethically untenable.
Epistemology, Formal Reasoning, and LLMs
Epistemology, the branch of philosophy concerned with the nature and scope of knowledge, provides a useful framework for understanding the limitations of LLMs in performing formal reasoning. Classical epistemologists like Immanuel Kant (1724–1804) distinguished between a priori knowledge (knowledge independent of experience) and a posteriori knowledge (knowledge dependent on experience). LLMs, by design, operate on a posteriori principles, learning from vast datasets through pattern recognition. While this allows them to perform well on many tasks, it does not equip them to handle a priori reasoning, which is essential for formal logic and mathematics.