In March 2025, computers remain humanity's most precise tools, their unique surplus value lying in the predictability of their basic computing processes. As my uncle John quipped back in 1975, computer errors are themselves human errors. Yet the thesis holds at the core:
when logic drives silicon, outcomes align perfectly with design.
Backtracking through prior explorations substantiates this:
Computers' basic processes - gates, bits, rules - yield certain outputs,
a perfection human minds can't match, while human inputs and designs may falter.
This concise overview traces five stages of mechanical reasoning, picking up earlier threads of thought
from rigid code to quantum futures, testing how its predictability evolves, erodes or transforms.
2.1. Classical Programming: Fixed Paths, Predictable Limits
Overview:
Born in the 1940s-50s (e.g., ENIAC), classical programming uses flowchart structures with fixed paths
- like "if-then" (later also "if-then-else"), "do" and "while" loops.
Logic gates execute: "if (A > 5) then B else C." Routes are excluded based on values (e.g., "if hours < 40, no overtime"). Execution is predictable: input X yields Y
- barring bugs and edge cases.
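As a minimal sketch (in Python, with a hypothetical weekly_pay function and made-up numbers), the fixed-path overtime rule above might look like this:

# Minimal sketch of a fixed-path rule: the same input always yields the same output.
def weekly_pay(hours, hourly_rate):
    if hours < 40:                       # route excluded: no overtime branch taken
        return hours * hourly_rate
    else:                                # fixed alternative path
        overtime = hours - 40
        return 40 * hourly_rate + overtime * hourly_rate * 1.5

print(weekly_pay(38, 20.0))   # 760.0 - deterministic: input X yields Y
print(weekly_pay(45, 20.0))   # 950.0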
Key Concepts:
Sequential logic - step-by-step execution via Boolean operations (AND, OR, NOT), pure rule-following.
Advantages:
Precision - transparent steps, full developer control.
Suits sequential and repetitive tasks, such as text-string manipulation, mass calculation
(e.g., payroll, missile guidance), processing of multi-dimensional matrices, and scientific computation.
Disadvantages:
Vulnerable to tiny errors ("bugs").
Poor adaptability.
No semantic nets. Poor pattern recognition.
Rigidity - no learning.
Fundamental limitations like undecidability (the Halting Problem: one cannot always predict whether a process will end) and combinatorial explosion (2^n paths) cap scale.
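As a rough back-of-the-envelope sketch (Python, illustrative numbers only) of why 2^n paths cap scale:

# Sketch: counting execution paths for n independent if/else branches.
for n in (10, 20, 30, 40):
    print(n, "branches ->", 2 ** n, "possible paths")
# 2**10 = 1024, 2**20 ~ 1.05e6, 2**30 ~ 1.07e9, 2**40 ~ 1.10e12: exhaustive checking quickly becomes infeasible.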
Sources:
(•) Turing, A. M., "On Computable Numbers, with an Application to the Entscheidungsproblem," Proceedings of the London Mathematical Society, 1936. Foundational work on deterministic computation and predictability.
(•) von Neumann, J., "First Draft of a Report on the EDVAC," University of Pennsylvania, 1945. Defines the architecture of fixed-path classical systems.
(•) Knuth, D. E., The Art of Computer Programming, Vol. 1: Fundamental Algorithms, 3rd ed., Addison-Wesley, 1997 (updated reprints to 2025). Covers sequential logic and limits like the Halting Problem.
2.2. Classical Expert Systems: Structured Knowledge, Manual Growth
Overview:
From the 1970s, Artificial Intelligence (AI) emerges in a first stage: "symbolic AI", characterized by expert systems and logical inference engines.
Expert systems (e.g., MYCIN) pair a knowledge base - "if fever AND cough, flu possible" - with an inference engine.
A typical expert system involved (see the sketch after this list):
Explicit rules (if-then).
Structured facts or assertions.
Certainty values weighting the rules (e.g., "probability 0.8").
Fixed paths; routes are activated by the weights of preceding values or require a matching discrete input (e.g., "fever = yes").
Forward and/or backward chaining.
Deterministic behavior, explainability.
Manual editing by humans.
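As a minimal sketch (Python, with hypothetical rules and certainty values, not MYCIN's actual rule base), a hand-written rule base plus a forward-chaining inference engine might look like this:

# Sketch of a tiny rule-based expert system with certainty values.
facts = {"fever": True, "cough": True}

# Each rule: (required facts, conclusion, certainty weight) - hand-written, manually maintained.
rules = [
    ({"fever", "cough"}, "flu_possible", 0.8),
    ({"flu_possible"}, "advise_rest", 0.9),
]

def forward_chain(facts, rules):
    derived = {f: 1.0 for f, v in facts.items() if v}
    changed = True
    while changed:                       # keep firing rules until nothing new is derived
        changed = False
        for conditions, conclusion, weight in rules:
            if conditions.issubset(derived) and conclusion not in derived:
                # combine the certainty of the premises with the rule's own weight
                derived[conclusion] = min(derived[c] for c in conditions) * weight
                changed = True
    return derived

print(forward_chain(facts, rules))
# fever 1.0, cough 1.0, flu_possible 0.8, advise_rest ~0.72

Every path is fixed in advance; changing the system's behavior means a human editing the rule list.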
Key Concepts:
Primitive semantic nets - rules link concepts (e.g., "fever → flu") in a structured graph.
Logical routing is pre-set.
Advantages:
Mimics an expert's analytical and derivational process.
High predictability.
Control over content, knowledge model and structure of reasoning.
These can be shaped, checked, corrected, refined and extended through testing, application
and evolving expertise: an interactive man-machine learning loop.
The model gradually reflects state-of-the-art expertise (e.g., medicine).
Can be validated against controlled statistical samples.
Disadvantages:
No dynamic pattern recognition.
Edge cases slip through - output is general, not exact.
Static - no automated or "self-learning" updates.
Scaling clogs as the rules multiply.
Sources:
(•) Feigenbaum, E. A., "The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering," IJCAI, 1977. Introduces expert systems like DENDRAL.
(•) Buchanan, B. G., & Shortliffe, E. H., Rule-Based Expert Systems: The MYCIN Experiments, Addison-Wesley, 1984. Details structured rule bases and manual updates.
(•) Jackson, P., Introduction to Expert Systems, 3rd ed., Addison-Wesley, 1998 (revised editions to 2025). Covers semantic nets and limitations.
2.3. Rule-Combination Systems: Inference from Rules
Overview:
In the 1980s, inference engines (e.g., PROLOG) emerge.
Engineers feed in inference rules - "if A, then B; if B, then C" - and the systems combine them,
tracing dynamic paths from inputs (A) to outputs (C).
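A minimal sketch (Python, with hypothetical one-letter symbols rather than PROLOG syntax) of how such an engine chains rules into a path at run time:

# Sketch: combining "if A, then B; if B, then C" into a dynamic inference path.
rules = {"A": "B", "B": "C"}            # each rule maps a premise to a conclusion

def chain(start, rules):
    path = [start]
    while path[-1] in rules:            # follow whichever rule matches the current fact
        path.append(rules[path[-1]])
    return path

print(chain("A", rules))   # ['A', 'B', 'C'] - the path is traced dynamically, not hard-coded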
Key Concepts:
Pattern recognition emerges - system matches rule patterns to inputs.
No semantic nets per se, but inference chains mimic reasoning.
Advantages:
Flexible - dynamic paths suit hypotheticals (e.g., legal reasoning, diagnostics).
Disadvantages:
Poor predictability: the knowledge base is built like a grab bag, with a combinatorial explosion of possible rule sequences.
Search problem - requires exponential time (2^n paths, "Exp-time").
Near-infeasible checks of consistency and validity, of route efficiency and of output optimization.
Rules may clash (e.g., "if A, then D; if A, then NOT D"), yielding contradictions and blurring predictability (see the sketch after this list).
No learning.
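A minimal sketch (Python, hypothetical rules) of how naive chaining derives both a claim and its negation without noticing:

# Sketch: clashing rules let naive forward chaining derive a contradiction.
rules = [("A", "D"), ("A", "not D")]
facts = {"A"}

derived = set(facts)
for premise, conclusion in rules:        # no consistency check anywhere in this loop
    if premise in derived:
        derived.add(conclusion)

print(sorted(derived))                   # ['A', 'D', 'not D'] - both D and its negation are asserted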
Sources:
(•) McCarthy, J., "Programs with Common Sense," Mechanisation of Thought Processes, 1958. Early work on inference systems like LISP.
(•) Kowalski, R., Logic for Problem Solving, North-Holland, 1979. Defines PROLOG and dynamic rule chaining.
(•) Colmerauer, A., & Roussel, P., "The Birth of Prolog," History of Programming Languages-II, ACM, 1996 (archived updates to 2025). Traces inference scalability issues.
2.4. Machine Learning Systems: Adaptive Patterns
Overview:
From the 1990s to 2025, machine learning (e.g., "neural nets", GPT) redefines reasoning.
There is a sharp distinction between classical expert systems (rule-based, often lemma- or concept-level) and the statistical pattern-learning shift that came with the rise of machine learning (ML) and Natural Language Processing (NLP).
Early semantic nets:
The early semantic nets were symbolic: graphs of labeled concepts and relations, sometimes hand-curated.
Semantic net:
A semantic network is typically a graph-based structure where:
• nodes or vertices represent concepts or elements (words, objects, etc.),
• and edges or arcs represent semantic relationships (like "is-a", "part-of", "causes", etc.).
These edges may or may not have weights representing the strength or frequency of association.
E.g.:
[Dog] --is-a--> [Animal]
[Dog] --chases--> [Cat]
This structure is explicit and often hand-designed (or semi-automated), and was more common in older AI systems and knowledge engineering (e.g., WordNet or Cyc).
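A minimal sketch (Python, toy relations invented for illustration, not WordNet's or Cyc's actual data model) of such an explicit, hand-curated graph:

# Sketch: an explicit semantic net stored as a list of labeled edges.
edges = [
    ("Dog", "is-a", "Animal"),
    ("Dog", "chases", "Cat"),
    ("Cat", "is-a", "Animal"),
]

def related(concept, relation, edges):
    # Return every node reachable from 'concept' via the given relation label.
    return [tail for head, rel, tail in edges if head == concept and rel == relation]

print(related("Dog", "is-a", edges))    # ['Animal'] - links are explicit, inspectable and editable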
The early "semantic net" trend in AI however was not truly semantic in the deeper linguistic sense
(like Leech, Lyons, or Halliday might use "
semantics"), were patterns and elements in
syntactic surface structure, like phrases and sentences, are analyzed and broken down to networks of
words, subwords, lemmas, mapping their semantic relations unto a
semantic deep structure.
In AI, the term "
semantic net" got popular because it
sounded like deeper meaning structures,
but in practice the analysis was rather shallow.
A lot of the early work (even in 90s/2000s NLP) was surface-level, statistical, and association-based, not meaning-based.
Systems labeled as "semantic nets" often just modeled co-occurrence, similarity, or taxonomy (like "is-a" and "part-of") - which are rough kinds of semantic relations, but not semantic analysis.
Lemma-level structuring and word-sense disambiguation were largely limited to lexical databases (like WordNet) or tagged corpora, not part of dynamic learning systems.
The "
semantic net" trend was more of a
transitional concept - halfway between symbolic AI
(expert systems, logical inference engines) and the emerging
statistical approaches.
Embeddings: contextualized co-occurrences:
In NLP research, the focus gradually shifted to statistical learning: collocation patterns, n-grams, hidden Markov models, etc.
This culminated in today's Large Language Models (LLMs, like ChatGPT), which use weights and links between elements similar to those in semantic nets, but with important differences.
No coded rules - logic emerges from data: trillions of words, social media posts.
Paths adapt: "if 'rain' and 'cloud,' then 'wet'" is weighted at 0.7.
Systems learn and adapt with each input, predicting by estimating probabilities.
The whole architecture of LLMs is about learning distributed representations - no explicit semantic net needed, just patterns across billions of tokens (words, subwords).
LLMs don't directly use semantic nets. Instead, they use so-called "neural networks" (i.e., transformer architectures) that:
• Represent tokens (words, subwords) as vectors in a high-dimensional embedding space.
• Learn patterns, associations, and statistical relationships from huge datasets via backpropagation.
• Adjust weights in dense layers (not symbolic nodes and edges) based on how often certain patterns help predict the next token or improve task performance.
So instead of "Dog is-a Animal" being a fixed link, the model learns that tokens like "dog" and "animal" often appear in related contexts - and reflects this in how their vector embeddings co-occur and interact inside the network.
The training objective is minimizing prediction error.
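A minimal sketch (Python with NumPy, using toy 3-dimensional vectors invented for illustration; real models learn vectors of hundreds or thousands of dimensions via backpropagation) of how relatedness lives in vector geometry rather than in explicit links:

import numpy as np

# Toy embeddings (hypothetical values): related tokens end up near each other in vector space.
emb = {
    "dog":        np.array([0.9, 0.1, 0.3]),
    "animal":     np.array([0.8, 0.2, 0.4]),
    "carburetor": np.array([0.1, 0.9, 0.0]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated directions.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["dog"], emb["animal"]))      # ~0.98: no "is-a" link, just geometric closeness
print(cosine(emb["dog"], emb["carburetor"]))  # ~0.21: distant in the embedding space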
Implicit graphs: weighted associations:
In LLMs, elements are linked on the basis of frequencies and co-occurrences within contexts. Both play a role - but in a more sophisticated way than raw counting.
1. Frequencies are definitely a core key in LLM input processing. The more often a word pair, phrase, or structure co-occurs, the more strongly the model tends to associate them (at least initially).
2. However, the training objective (minimizing prediction error) leads to more complex, contextual representations than simple frequency. It's not just "A occurs with B a lot" - it's "A occurs with B in these kinds of contexts". The network's attention heads learn to focus on context-sensitive dependencies. This results in more sophisticated semantic nets - the data forms implicit graphs (e.g., "rain → wet → slippery").
3. Thus, the model ends up learning associations that are sometimes called "implicit correlations" (although 'correlation', in the statistical sense of the word, has little to do with it): higher-dimensional, often nonlinear dependencies between tokens and structures across layers. These have no predictive power like explicit "correlation coefficients", but they guide behavior nonetheless.
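A minimal sketch (Python, tiny invented corpus) contrasting raw co-occurrence counting with the contextual associations described above; real models learn the latter through attention and backpropagation, not explicit counting:

from collections import Counter
from itertools import combinations

# Tiny invented corpus: co-occurrence counts give a first, crude association graph.
sentences = [
    "rain makes the road wet",
    "the wet road is slippery",
    "rain and cloud mean a wet day",
]

pairs = Counter()
for s in sentences:
    words = set(s.split())
    for a, b in combinations(sorted(words), 2):
        pairs[(a, b)] += 1

print(pairs[("rain", "wet")])        # 2 - a frequent pair, hence a strong initial association
print(pairs[("cloud", "slippery")])  # 0 - never co-occurs, yet a contextual model can still relate them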
Summary:
The newer LLM-style systems rely on:
• Implicit rules, embedded in neural weights.
• Massive statistical inference, not logic-based inference.
• Soft, fuzzy matching of patterns across contexts.
• Limited built-in explainability or traceability (though attention maps can offer some clues).
The model learns weighted associations, but not as discrete semantic nets. It learns contextual patterns through "neural embeddings" and attention mechanisms, not hardcoded links.
Frequencies play a role early on, but deeper layers and attention weights model more sophisticated semantic and syntactic relationships - closer to, as one may say, "contextual association" than to raw frequency.
Key Concepts:
Advanced pattern recognition - spotting trends in chaos (e.g., social media sentiment).
Sophisticated semantic nets - data forms implicit graphs (e.g., "rain → wet → slippery").
Probabilistic inference drives it, derived from relative similarities.
Advantages:
Scale conquers complexity - prunes vast option spaces (2^n paths) via statistics.
Excels in messy domains (language, images).
No manual edits - learns and adapts from new data.
Limitations:
Real human Intelligence still works entirely differently from the "mechanical Intelligence" in present AI systems.
One of the many differences is that human "understanding" of language, concepts, perceptions etc. is only partly based on form/syntax, or on quantifying co-occurrence, similarity, or taxonomy in patterns and/or contexts.
It works with deep-semantic elements - features, aspects, markers - that are implicit in the "input", or better, attributed 'ad hoc' by trillions of associations in brain processing, within many areas and on many levels of the nerve/sensory/motor/visceral/... system.
Of course this huge difference has tremendous consequences for the performance of the "Intelligence".
When using various AI systems and trying out different kinds of analyses, it becomes clear that their language idiom, grammar and style are marvellous, but all a matter of form - and the error frequency in content is immense.
That can be explained, at least for an important part and for certain classes or types of errors, by the missing deep semantics.
This hits right at the philosophical and technical core of the distinction between mechanical intelligence and human intelligence, especially regarding understanding.
Human "Intelligence" works via deep-semantic feature processing that is embodied, context-saturated, and associative at a multi-modal, multi-level (real) neural level.
Meanwhile, AI systems like LLMs - for all their fluent language production - are structurally shallow in comparison.
Deep-Semantics vs. Surface-Patterning
Most LLMs excel at syntax and stylistic brilliance, based on pattern completion that prioritizes similarity, coherence and compatibility, but they falter when true understanding is required - i.e., internally grounded semantic coherence, meaningfulness, consistency and validity.
TO DO:
[* Major fallacies in conventional thinking:
- Linking by outward resemblance / "seeming similarity" / superficial correspondence / parity.
- Linking by associative retrieval.]
LLMs can mimic reasoning but don't inherently understand or encode logical entailment, causal influence, or psychological implications unless explicitly trained or prompted that way.
This explains why you can get perfect grammar, humanlike idiom, even emotionally appropriate tone... but also wildly false claims, nonsense inferences, or contradictory logic - especially on novel or subtle tasks.
This can be traced back to the absence of deep semantic anchoring - i.e., no grounding in:
• Sensorimotor systems;
• Biological needs/drives/affects;
• Spatial-temporal embodiment;
• Multi-modal representations (sound, motion, vision, proprioception);
• Subjective experience, like
· Conscious awareness.
· Consciously noting something (on grounds of difference).
· Degree of global intensity of consciousness.
· Subjective sensations (sentiency).
· Quality aspects of experiences (qualia).
· Clarity, sharpness and detail of experience (lucidity).
· Dynamics of experience (vividness).
· Degree of specific intensity of experience (impressiveness).
· Sense encountered (pregnancy).
· Meaning perceived (intensionality).
· Overall experience of quality (e.g., experienced degree of happiness, contentment, gratification, fulfillment, satisfaction).
Instead, LLMs operate in a disembodied vector space, trained on form-first, distributional co-occurrence.
Trillions of neural associations ≠ trillions of tokens.
The human brain's trillions of associations are not just between symbols (like words), but between states, sensations, goals, emotional tones, body mappings and abstract labels - all grounded in real, lived sensorimotor experience.
This gives human "Intelligence":
• The ability to disambiguate meaning beyond syntax;
• The capacity to know what's plausible or absurd, even if never seen before;
• A deeply intentional architecture where symbols are just tips of "meaning-icebergs";
Whereas LLMs only "know" what statistically tends to follow something else on the surface level of expressions - language symbols, sounds and images - without any internal sense of truth, context survival, or goal coherence.
Why LLMs make such errors - and why they sound smart while doing it:
A major class of LLM errors can indeed be traced to the lack of deep semantics, such as:
• Misplaced causality: confusing correlation with cause because it's statistically plausible.
• Semantic drift: starting in one domain and ending up incoherently in another.
• Contradictions: no internal mechanism for guarding consistency of arguments or facts.
• Hallucinations: generating content that "sounds right" but has no reality anchor.
These errors stem from the model's design goal: coherence of form, not factual truth.
The high error frequency at the content level, despite stylistic brilliance, is tied to semantic shallowness, directly following from the absence of deep semantics.
Disadvantages:
Garbage input:
Nets mirror internet tsunamis of nonsense, not ground truth.
Data quality falters: crawling the web (flooded with nonsense, fallacies, disinformation) risks the "garbage in, garbage out" (GIGO) effect.
Fluctuating structure:
Paths and nodes shift with estimated probabilities. Constant data flux kills predictability.
Tiny memory:
Local learning fails: facts from one chat (tiny memory) distort or vanish mid-thread.
Form mimicry:
Most errors LLMs commit stem from pattern mimicry without depth: the systems complete their output based on surface structure, not semantic intent.
Across all domains, the common failure mode is surface coherence over deep structure.
The model gives what "sounds right" over what is right - unless precisely anchored by external structure, examples, or corrections.
Long responses[*?] or abstract domains (like logic or causality) amplify these failures.
No logical validity check.
Black-box reasoning defies expert fixes.
Reliability tanks for precise tasks.
Such errors lead to seriously disadvantageous, if not dangerous or harmful, effects:
• unpredictability/unreliability,
• user exhaustion/uselessness,
• misleading/brain-crippling effects.
Dangers lurk - not a cartoonish "Exterminate Humanity" plan, but subtler: masses consulting AI, brains muddled by unchecked drivel, a slow cognitive poison.
Many of these flaws and defects violate the 1950s rule - "Computers should ease, not tease".
No one should have to manually cross-check outputs - which may take hours when GIGO drivel piles up.
Final thought - bridging the gap?
Some emerging approaches try to fix these systemic flaws.
Mitigations often involve adding explicit structure, external reasoning tools, or user prompts that force better grounding or clarification (a minimal retrieval sketch follows the list below).
• Neuro-symbolic hybrids (merge deep learning with symbolic inference);
• Grounded LLMs (tied to specific environments, robot skills, or sensory inputs);
• Retrieval-augmented models (injecting external knowledge, but still brittle);
• Memory + agency frameworks (simulated persistent selves).
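As a minimal sketch of the retrieval-augmented idea (Python, with a hypothetical keyword-overlap retriever and a placeholder generate() function standing in for any real LLM call), not a production pattern:

# Sketch: retrieval-augmented prompting - inject external knowledge before generation.
documents = [
    "MYCIN was a 1970s rule-based expert system for diagnosing blood infections.",
    "The Halting Problem shows some questions about programs are undecidable.",
]

def retrieve(question, documents):
    # Hypothetical retriever: pick the document with the largest keyword overlap.
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def generate(prompt):
    # Placeholder for a call to an actual LLM; here it simply echoes the grounded prompt.
    return prompt

question = "What was MYCIN used for?"
context = retrieve(question, documents)
print(generate(f"Using only this context: '{context}' Answer: {question}"))

The retrieved context anchors the answer to an external source, which mitigates, but does not eliminate, hallucination.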
But for now, the "I" of mechanical AI is still very much a shallow mimicry of the human "I": fluent, but hollow - ironically where it matters most.
Sources:
(•) Rosenblatt, F., "The Perceptron: A Probabilistic Model for Information Storage," Psychological Review, 1958. Roots of adaptive machine learning (ML).
(•) LeCun, Y., "Deep Learning," 1989. Scales neural nets.
(•) Hinton, G. E., et al., "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, 2006. Refines training through backpropagation and advances in pattern recognition.
On reliability:
(•) Mittelstadt, B. D., et al., "The Ethics of Algorithms," 2016. Flags validity gaps.
(•) Amodei, D., et al., "Concrete Problems in AI Safety," arXiv, 2016. Documents ML data-driven errors like data drift and reliability loss.
2.5. Quantum Computing: Tentative Quantum Reasoning
Overview:
In the mid-2020s, quantum computing emerges tentatively. Unlike bits (0 or 1), qubits in superposition (0, 1, or both) can be entangled for parallel processing, powering algorithms like Shor's (factoring) or Grover's (searching) - e.g., "if qubit A AND B entangle, solve X."
Reasoning blends probabilistic and deterministic logic on systems like IBM's Q or Google's Sycamore.
Key Concepts:
Quantum pattern recognition - superposition spots patterns (e.g., molecular states) exponentially faster.
Entangled semantic nets - qubits link states instantly.
Hybrid inference - quantum gates (e.g., Hadamard) mix certainty and chance.
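A minimal sketch (Python with NumPy; a hand-rolled state-vector calculation for illustration rather than any specific quantum SDK) of how a Hadamard gate puts a qubit into equal superposition, with probabilities appearing only at measurement:

import numpy as np

# A single qubit as a 2-component state vector; |0> = [1, 0].
ket0 = np.array([1.0, 0.0])

# Hadamard gate: maps |0> to (|0> + |1>) / sqrt(2) - an equal superposition.
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

state = H @ ket0
probabilities = np.abs(state) ** 2       # Born rule: probabilities of measuring 0 or 1
print(state)                             # [0.70710678 0.70710678]
print(probabilities)                     # [0.5 0.5] - which outcome actually occurs is random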
Advantages:
Scale: tackles 2^n-scale problems in a tractable number of steps (e.g., cracking encryption, modeling molecules).
Precision potential: exact outcomes could outpace Stage 4's stats.
Disadvantages:
Predictability's murky - superposition collapses unpredictably (measurement problem).
Noise disrupts qubits - current error rates (around 1% per gate) dwarf classical error rates.
No learning yet - systems are programmed, not adaptive.
Accessibility lags - few practical tools by 2025; systems remain lab-bound.
Sources:
(•) Deutsch, D., "Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer," Proceedings of the Royal Society, 1985. Grounds quantum logic.
(•) Shor, P. W., "Polynomial-Time Algorithms for Prime Factorization on a Quantum Computer," SIAM Journal on Computing, 1997. Showcases quantum reasoning power.
(•) Bernhardt, C., Quantum Computing for Everyone, MIT Press, 2019 (updated editions to 2025). Explains superposition and noise challenges.
Does logic's absolute and perfect predictability endure?
Stage 1 proves it - gates are flawless, human errors aside.
Stage 2 upholds it - rules predict within design.
Stage 3 bends it - logic flexes, but holds basic truth.
Stage 4 frays it - bits remain, but input data sacrifices precision for scale.
Stage 5 reimagines it - quantum logic promises scale, yet noise challenges perfection.
Validity and reliability peak in Stages 1-2, wane in 3, shake in 4 - trading truth for scale -
and slowly emerge in 5.
Computers don't fail at their core - but human inputs and quantum quirks stretch the ideal.