In March 2025, computers remain humanity's most precise tools, their unique surplus value lying in the predictability of their basic computing processes. As my uncle John quipped back in 1975, computer errors are themselves human errors. Yet the thesis holds at the core:
when logic drives silicon, outcomes align perfectly with design.
Backtracking through prior explorations substantiates this:
Computers' basic processes - gates, bits, rules - yield certain outputs,
a perfection human minds can't match, while human inputs and designs may falter.
This concise overview traces five stages of mechanical reasoning, picking up earlier threads of thought
from rigid code to quantum futures, testing how its predictability evolves, erodes or transforms.
2.1. Classical Programming: Fixed Paths, Predictable Limits
Overview:
Born in the 1940s-50s (e.g., ENIAC), classical programming uses flowchart structures with fixed paths
- like "if-then" (later also "if-then-else"), "do" and "while" loops.
Logic gates execute: "if (A > 5) then B else C." Routes are excluded based on values (e.g., "if hours < 40, no overtime"). Execution is predictable: input X yields Y
- barring bugs and edge cases.
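As a minimal sketch (in Python, with a hypothetical weekly_pay function and made-up numbers), the fixed-path overtime rule above might look like this:

# Minimal sketch of a fixed-path rule: the same input always yields the same output.
def weekly_pay(hours, hourly_rate):
    if hours < 40:                       # route excluded: no overtime branch taken
        return hours * hourly_rate
    else:                                # fixed alternative path
        overtime = hours - 40
        return 40 * hourly_rate + overtime * hourly_rate * 1.5

print(weekly_pay(38, 20.0))   # 760.0 - deterministic: input X yields Y
print(weekly_pay(45, 20.0))   # 950.0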
Key Concepts:
Sequential logic - step-by-step execution via Boolean operations (AND, OR, NOT), pure rule-following.
Advantages:
Precision - transparent steps, full developer control.
Suits sequential and repetitive tasks, such as text-string manipulation, mass calculation
(e.g., payroll, missile guidance), processing of multi-dimensional matrices, and scientific computation.
Disadvantages:
Vulnerable to tiny errors ("bugs").
Poor adaptability.
No semantic nets. Poor pattern recognition.
Rigidity - no learning.
Fundamental limitations like undecidability (the Halting Problem: one cannot always predict whether a process will end) and combinatorial explosion (2^n paths) cap scale.
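As a rough back-of-the-envelope sketch (Python, illustrative numbers only) of why 2^n paths cap scale:

# Sketch: counting execution paths for n independent if/else branches.
for n in (10, 20, 30, 40):
    print(n, "branches ->", 2 ** n, "possible paths")
# 2**10 = 1024, 2**20 ~ 1.05e6, 2**30 ~ 1.07e9, 2**40 ~ 1.10e12: exhaustive checking quickly becomes infeasible.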
Sources:
(•) Turing, A. M., "On Computable Numbers, with an Application to the Entscheidungsproblem," Proceedings of the London Mathematical Society, 1936. Foundational work on deterministic computation and predictability.
(•) von Neumann, J., "First Draft of a Report on the EDVAC," University of Pennsylvania, 1945. Defines the architecture of fixed-path classical systems.
(•) Knuth, D. E., The Art of Computer Programming, Vol. 1: Fundamental Algorithms, 3rd ed., Addison-Wesley, 1997 (updated reprints to 2025). Covers sequential logic and limits like the Halting Problem.
2.2. Classical Expert Systems: Structured Knowledge, Manual Growth
Overview:
From the 1970s, Artificial Intelligence (AI) emerges in a first stage: "symbolic AI", characterized by expert systems and logical inference engines.
Expert systems (e.g., MYCIN) pair a knowledge base - "if fever AND cough, flu possible" - with an inference engine.
A typical expert system involved (see the sketch after this list):
Explicit rules (if-then).
Structured facts or assertions.
Certainty values weighting the rules (e.g., "probability 0.8").
Fixed paths; routes are activated by the weights of preceding values or require a matching discrete input (e.g., "fever = yes").
Forward and/or backward chaining.
Deterministic behavior, explainability.
Manual editing by humans.
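As a minimal sketch (Python, with hypothetical rules and certainty values, not MYCIN's actual rule base), a hand-written rule base plus a forward-chaining inference engine might look like this:

# Sketch of a tiny rule-based expert system with certainty values.
facts = {"fever": True, "cough": True}

# Each rule: (required facts, conclusion, certainty weight) - hand-written, manually maintained.
rules = [
    ({"fever", "cough"}, "flu_possible", 0.8),
    ({"flu_possible"}, "advise_rest", 0.9),
]

def forward_chain(facts, rules):
    derived = {f: 1.0 for f, v in facts.items() if v}
    changed = True
    while changed:                       # keep firing rules until nothing new is derived
        changed = False
        for conditions, conclusion, weight in rules:
            if conditions.issubset(derived) and conclusion not in derived:
                # combine the certainty of the premises with the rule's own weight
                derived[conclusion] = min(derived[c] for c in conditions) * weight
                changed = True
    return derived

print(forward_chain(facts, rules))
# fever 1.0, cough 1.0, flu_possible 0.8, advise_rest ~0.72

Every path is fixed in advance; changing the system's behavior means a human editing the rule list.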
Key Concepts:
Primitive semantic nets - rules link concepts (e.g., "fever → flu") in a structured graph.
Logical routing is pre-set.
Advantages:
Mimics an expert's analytical and derivational process.
High predictability.
Control over content, knowledge model and structure of reasoning.
These can be shaped, checked, corrected, refined and extended through testing, application
and evolving expertise: an interactive man-machine learning loop.
The model gradually reflects state-of-the-art expertise (e.g., medicine).
Can be validated against controlled statistical samples.
Disadvantages:
No dynamic pattern recognition.
Edge cases slip through - output is general, not exact.
Static - no automated or "self-learning" updates.
Scaling clogs as the rules multiply.
Sources:
(•) Feigenbaum, E. A., "The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering," IJCAI, 1977. Introduces expert systems like DENDRAL.
(•) Buchanan, B. G., & Shortliffe, E. H., Rule-Based Expert Systems: The MYCIN Experiments, Addison-Wesley, 1984. Details structured rule bases and manual updates.
(•) Jackson, P., Introduction to Expert Systems, 3rd ed., Addison-Wesley, 1998 (revised editions to 2025). Covers semantic nets and limitations.
2.3. Rule-Combination Systems: Inference from Rules
Overview:
In the 1980s, inference engines (e.g., PROLOG) emerge.
Engineers feed in inference rules - "if A, then B; if B, then C" - and the systems combine them,
tracing dynamic paths from inputs (A) to outputs (C).
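A minimal sketch (Python, with hypothetical one-letter symbols rather than PROLOG syntax) of how such an engine chains rules into a path at run time:

# Sketch: combining "if A, then B; if B, then C" into a dynamic inference path.
rules = {"A": "B", "B": "C"}            # each rule maps a premise to a conclusion

def chain(start, rules):
    path = [start]
    while path[-1] in rules:            # follow whichever rule matches the current fact
        path.append(rules[path[-1]])
    return path

print(chain("A", rules))   # ['A', 'B', 'C'] - the path is traced dynamically, not hard-coded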
Key Concepts:
Pattern recognition emerges - system matches rule patterns to inputs.
No semantic nets per se, but inference chains mimic reasoning.
Advantages:
Flexible - dynamic paths suit hypotheticals (e.g., legal reasoning, diagnostics).
Disadvantages:
Poor predictability: the knowledge base is built like a grab bag, with a combinatorial explosion of possible rule sequences.
Search problem - requires exponential time (2^n paths, "Exp-time").
Near-infeasible checks of consistency and validity, of route efficiency and of output optimization.
Rules may clash (e.g., "if A, then D; if A, then NOT D"), yielding contradictions and blurring predictability (see the sketch after this list).
No learning.
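A minimal sketch (Python, hypothetical rules) of how naive chaining derives both a claim and its negation without noticing:

# Sketch: clashing rules let naive forward chaining derive a contradiction.
rules = [("A", "D"), ("A", "not D")]
facts = {"A"}

derived = set(facts)
for premise, conclusion in rules:        # no consistency check anywhere in this loop
    if premise in derived:
        derived.add(conclusion)

print(sorted(derived))                   # ['A', 'D', 'not D'] - both D and its negation are asserted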
Sources:
(•) McCarthy, J., "Programs with Common Sense," Mechanisation of Thought Processes, 1958. Early work on inference systems like LISP.
(•) Kowalski, R., Logic for Problem Solving, North-Holland, 1979. Defines PROLOG and dynamic rule chaining.
(•) Colmerauer, A., & Roussel, P., "The Birth of Prolog," History of Programming Languages-II, ACM, 1996 (archived updates to 2025). Traces inference scalability issues.
2.4. Machine Learning Systems: Adaptive Patterns
Overview:
From the 1990s to 2025, machine learning (e.g., "neural nets", GPT) redefines reasoning.
There is a sharp distinction between classical expert systems (rule-based, often lemma- or concept-level) and the statistical pattern-learning shift that came with the rise of machine learning (ML) and Natural Language Processing (NLP).
Early semantic nets:
The early semantic nets were symbolic: graphs of labeled concepts and relations, sometimes hand-curated.
Semantic net:
A semantic network is typically a graph-based structure where:
• nodes or vertices represent concepts or elements (words, objects, etc.),
• and edges or arcs represent semantic relationships (like "is-a", "part-of", "causes", etc.).
These edges may or may not have weights representing the strength or frequency of association.
E.g.:
[Dog] --is-a--> [Animal]
[Dog] --chases--> [Cat]
This structure is explicit and often hand-designed (or semi-automated), and was more common in older AI systems and knowledge engineering (e.g., WordNet or Cyc).
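A minimal sketch (Python, toy relations invented for illustration, not WordNet's or Cyc's actual data model) of such an explicit, hand-curated graph:

# Sketch: an explicit semantic net stored as a list of labeled edges.
edges = [
    ("Dog", "is-a", "Animal"),
    ("Dog", "chases", "Cat"),
    ("Cat", "is-a", "Animal"),
]

def related(concept, relation, edges):
    # Return every node reachable from 'concept' via the given relation label.
    return [tail for head, rel, tail in edges if head == concept and rel == relation]

print(related("Dog", "is-a", edges))    # ['Animal'] - links are explicit, inspectable and editable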
The early "semantic net" trend in AI however was not truly semantic in the deeper linguistic sense
(like Leech, Lyons, or Halliday might use "
semantics"), were patterns and elements in
syntactic surface structure, like phrases and sentences, are analyzed and broken down to networks of
words, subwords, lemmas, mapping their semantic relations unto a
semantic deep structure.
In AI, the term "
semantic net" got popular because it
sounded like deeper meaning structures,
but in practice the analysis was rather shallow.
A lot of the early work (even in 90s/2000s NLP) was surface-level, statistical, and association-based, not meaning-based.
Systems labeled as "semantic nets" often just modeled co-occurrence, similarity, or taxonomy (like "is-a" and "part-of") - which are rough kinds of semantic relations, but not semantic analysis.
Lemma-level structuring and word-sense disambiguation were largely limited to lexical databases (like WordNet) or tagged corpora, not part of dynamic learning systems.
The "
semantic net" trend was more of a
transitional concept - halfway between symbolic AI
(expert systems, logical inference engines) and the emerging
statistical approaches.
Embeddings: contextualized co-occurrences:
In NLP research, the focus gradually shifted to statistical learning: collocation patterns, n-grams, hidden Markov models, etc.
This culminated in today's Large Language Models (LLMs, like ChatGPT), which use weights and links between elements similar to those in semantic nets, but with important differences.
No coded rules - logic emerges from data: trillions of words, social media posts.
Paths adapt: "if 'rain' and 'cloud,' then 'wet'" is weighted at 0.7.
Systems learn and adapt with each input, predicting by estimating probabilities.
The whole architecture of LLMs is about learning distributed representations - no explicit semantic net needed, just patterns across billions of tokens (words, subwords).
LLMs don't directly use semantic nets. Instead, they use so-called "neural networks" (i.e., transformer architectures) that:
• Represent tokens (words, subwords) as vectors in a high-dimensional embedding space.
• Learn patterns, associations, and statistical relationships from huge datasets via backpropagation.
• Adjust weights in dense layers (not symbolic nodes and edges) based on how often certain patterns help predict the next token or improve task performance.
So instead of "Dog is-a Animal" being a fixed link, the model learns that tokens like "dog" and "animal" often appear in related contexts - and reflects this in how their vector embeddings co-occur and interact inside the network.
The training objective is minimizing prediction error.
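A minimal sketch (Python with NumPy, using toy 3-dimensional vectors invented for illustration; real models learn vectors of hundreds or thousands of dimensions via backpropagation) of how relatedness lives in vector geometry rather than in explicit links:

import numpy as np

# Toy embeddings (hypothetical values): related tokens end up near each other in vector space.
emb = {
    "dog":        np.array([0.9, 0.1, 0.3]),
    "animal":     np.array([0.8, 0.2, 0.4]),
    "carburetor": np.array([0.1, 0.9, 0.0]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated directions.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["dog"], emb["animal"]))      # ~0.98: no "is-a" link, just geometric closeness
print(cosine(emb["dog"], emb["carburetor"]))  # ~0.21: distant in the embedding space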
Implicit graphs: weighted associations:
In LLMs, elements are linked on the basis of frequencies and co-occurrences within contexts. Both play a role - but in a more sophisticated way than raw counting.
1. Frequencies are definitely a core key in LLM input processing. The more often a word pair, phrase, or structure co-occurs, the more strongly the model tends to associate them (at least initially).
2. However, the training objective (minimizing prediction error) leads to more complex, contextual representations than simple frequency. It's not just "A occurs with B a lot" - it's "A occurs with B in these kinds of contexts". The network's attention heads learn to focus on context-sensitive dependencies. This results in more sophisticated semantic nets - the data forms implicit graphs (e.g., "rain → wet → slippery").
3. Thus, the model ends up learning associations that are sometimes called "implicit correlations" (although 'correlation', in the statistical sense of the word, has little to do with it): higher-dimensional, often nonlinear dependencies between tokens and structures across layers. These have no predictive power like explicit "correlation coefficients", but they guide behavior nonetheless.
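A minimal sketch (Python, tiny invented corpus) contrasting raw co-occurrence counting with the contextual associations described above; real models learn the latter through attention and backpropagation, not explicit counting:

from collections import Counter
from itertools import combinations

# Tiny invented corpus: co-occurrence counts give a first, crude association graph.
sentences = [
    "rain makes the road wet",
    "the wet road is slippery",
    "rain and cloud mean a wet day",
]

pairs = Counter()
for s in sentences:
    words = set(s.split())
    for a, b in combinations(sorted(words), 2):
        pairs[(a, b)] += 1

print(pairs[("rain", "wet")])        # 2 - a frequent pair, hence a strong initial association
print(pairs[("cloud", "slippery")])  # 0 - never co-occurs, yet a contextual model can still relate them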
Summary:
The newer LLM-style systems rely on:
• Implicit rules, embedded in neural weights.
• Massive statistical inference, not logic-based inference.
• Soft, fuzzy matching of patterns across contexts.
• Limited built-in explainability or traceability (though attention maps can offer some clues).
The model learns weighted associations, but not as discrete semantic nets. It learns contextual patterns through "neural embeddings" and attention mechanisms, not hardcoded links.
Frequencies play a role early on, but deeper layers and attention weights model more sophisticated semantic and syntactic relationships - closer to, as one may say, "contextual association" than to raw frequency.
Key Concepts:
Advanced pattern recognition - spotting trends in chaos (e.g., social media sentiment).
Sophisticated semantic nets - data forms implicit graphs (e.g., "rain → wet → slippery").
Probabilistic inference drives it, derived from relative similarities.
Advantages:
Scale conquers complexity - prunes vast option spaces (2^n paths) via statistics.
Excels in messy domains (language, images).
No manual edits - learns and adapts from new data.
Limitations:
Real human Intelligence still works entirely differently from the "mechanical Intelligence" in present AI systems.
One of the many differences is that human "understanding" of language, concepts, perceptions etc. is only partly based on form/syntax, or on quantifying co-occurrence, similarity, or taxonomy in patterns and/or contexts.
It works with deep-semantic elements - features, aspects, markers - that are implicit in the "input", or better, attributed 'ad hoc' by trillions of associations in brain processing, within many areas and on many levels of the nerve/sensory/motor/visceral/... system.
Of course this huge difference has tremendous consequences for the performance of the "Intelligence".
When using various AI systems and trying out different kinds of analyses, it becomes clear that their language idiom, grammar and style are marvellous, but all a matter of form - and the error frequency in content is immense.
That can be explained, at least for an important part and for certain classes or types of errors, by the missing deep semantics.
This hits right at the philosophical and technical core of the distinction between mechanical intelligence and human intelligence, especially regarding understanding.
Human "Intelligence" works via deep-semantic feature processing that is embodied, context-saturated, and associative at a multi-modal, multi-level (real) neural level.
Meanwhile, AI systems like LLMs - for all their fluent language production - are structurally shallow in comparison.
Deep-Semantics vs. Surface-Patterning
Most LLMs excel at syntax and stylistic brilliance, based on pattern completion that prioritizes similarity, coherence and compatibility, but they falter when true understanding is required - i.e., internally grounded semantic coherence, meaningfulness, consistency and validity.
TO DO:
[* Major fallacies in conventional thinking:
- Linking by outward resemblance / "seeming similarity" / superficial correspondence / parity.
- Linking by associative retrieval.]
LLMs can mimic reasoning but don't inherently understand or encode logical entailment, causal influence, or psychological implications unless explicitly trained or prompted that way.
This explains why you can get perfect grammar, humanlike idiom, even emotionally appropriate tone... but also wildly false claims, nonsense inferences, or contradictory logic - especially on novel or subtle tasks.
This can be traced back to the absence of deep semantic anchoring - i.e., no grounding in:
• Sensorimotor systems;
• Biological needs/drives/affects;
• Spatial-temporal embodiment;
• Multi-modal representations (sound, motion, vision, proprioception);
• Subjective experience, like
· Conscious awareness.
· Consciously noting something (on grounds of difference).
· Degree of global intensity of consciousness.
· Subjective sensations (sentiency).
· Quality aspects of experiences (qualia).
· Clarity, sharpness and detail of experience (lucidity).
· Dynamics of experience (vividness).
· Degree of specific intensity of experience (impressiveness).
· Sense encountered (pregnancy).
· Meaning perceived (intensionality).
· Overall experience of quality (e.g., experienced degree of happiness, contentment, gratification, fulfillment, satisfaction).
Instead, LLMs operate in a disembodied vector space, trained on form-first, distributional co-occurrence.
Trillions of neural associations ≠ trillions of tokens.
The human brain's trillions of associations are not just between symbols (like words), but between states, sensations, goals, emotional tones, body mappings and abstract labels - all grounded in real, lived sensorimotor experience.
This gives human "Intelligence":
• The ability to disambiguate meaning beyond syntax;
• The capacity to know what's plausible or absurd, even if never seen before;
• A deeply intentional architecture where symbols are just tips of "meaning-icebergs";
Whereas LLMs only "know" what statistically tends to follow something else on the surface level of expressions - language symbols, sounds and images - without any internal sense of truth, context survival, or goal coherence.
Why LLMs make such errors - and why they sound smart while doing it:
A major class of LLM errors can indeed be traced to the lack of deep semantics, such as:
• Misplaced causality: confusing correlation with cause because it's statistically plausible.
• Semantic drift: starting in one domain and ending up incoherently in another.
• Contradictions: no internal mechanism for guarding consistency of arguments or facts.
• Hallucinations: generating content that "sounds right" but has no reality anchor.
These errors stem from the model's design goal: coherence of form, not factual truth.
The high error frequency at the content level, despite stylistic brilliance, is tied to semantic shallowness, directly following from the absence of deep semantics.
Disadvantages:
Garbage input:
Nets mirror internet tsunamis of nonsense, not ground truth.
Data quality falters: crawling the web (flooded with nonsense, fallacies, disinformation) risks the "garbage in, garbage out" (GIGO) effect.
Fluctuating structure:
Paths and nodes shift with estimated probabilities. Constant data flux kills predictability.
Tiny memory:
Local learning fails: facts from one chat (tiny memory) distort or vanish mid-thread.
Form mimicry:
Most errors LLMs commit stem from pattern mimicry without depth: the systems complete their output based on surface structure, not semantic intent.
Across all domains, the common failure mode is surface coherence over deep structure.
The model gives what "sounds right" over what is right - unless precisely anchored by external structure, examples, or corrections.
Long responses[*?] or abstract domains (like logic or causality) amplify these failures.
No logical validity check.
Black-box reasoning defies expert fixes.
Reliability tanks for precise tasks.
Such errors lead to seriously disadvantageous, if not dangerous or harmful, effects:
• unpredictability/unreliability,
• user exhaustion/uselessness,
• misleading/brain-crippling effects.
Dangers lurk - not a cartoonish "Exterminate Humanity" plan, but subtler: masses consulting AI, brains muddled by unchecked drivel, a slow cognitive poison.
Many of these flaws and defects violate the 1950s rule - "Computers should ease, not tease".
No one should have to manually cross-check outputs - which may take hours when GIGO drivel piles up.
Final thought - bridging the gap?
Some emerging approaches try to fix these systemic flaws.
Mitigations often involve adding explicit structure, external reasoning tools, or user prompts that force better grounding or clarification (a minimal retrieval sketch follows the list below).
• Neuro-symbolic hybrids (merge deep learning with symbolic inference);
• Grounded LLMs (tied to specific environments, robot skills, or sensory inputs);
• Retrieval-augmented models (injecting external knowledge, but still brittle);
• Memory + agency frameworks (simulated persistent selves).
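As a minimal sketch of the retrieval-augmented idea (Python, with a hypothetical keyword-overlap retriever and a placeholder generate() function standing in for any real LLM call), not a production pattern:

# Sketch: retrieval-augmented prompting - inject external knowledge before generation.
documents = [
    "MYCIN was a 1970s rule-based expert system for diagnosing blood infections.",
    "The Halting Problem shows some questions about programs are undecidable.",
]

def retrieve(question, documents):
    # Hypothetical retriever: pick the document with the largest keyword overlap.
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def generate(prompt):
    # Placeholder for a call to an actual LLM; here it simply echoes the grounded prompt.
    return prompt

question = "What was MYCIN used for?"
context = retrieve(question, documents)
print(generate(f"Using only this context: '{context}' Answer: {question}"))

The retrieved context anchors the answer to an external source, which mitigates, but does not eliminate, hallucination.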
But for now, the "I" of mechanical AI is still very much a shallow mimicry of the human "I": fluent, but hollow - ironically where it matters most.
Sources:
(•) Rosenblatt, F., "The Perceptron: A Probabilistic Model for Information Storage," Psychological Review, 1958. Roots of adaptive machine learning (ML).
(•) LeCun, Y., "Deep Learning," 1989. Scales neural nets.
(•) Hinton, G. E., et al., "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, 2006. Refines training through backpropagation and advances in pattern recognition.
On reliability:
(•) Mittelstadt, B. D., et al., "The Ethics of Algorithms," 2016. Flags validity gaps.
(•) Amodei, D., et al., "Concrete Problems in AI Safety," arXiv, 2016. Documents ML data-driven errors like data drift and reliability loss.
2.5. Quantum Computing: Tentative Quantum Reasoning
Overview:
In the mid-2020s, quantum computing emerges tentatively. Unlike bits (0 or 1), qubits in superposition (0, 1, or both) can be entangled for parallel processing, powering algorithms like Shor's (factoring) or Grover's (searching) - e.g., "if qubit A AND B entangle, solve X."
Reasoning blends probabilistic and deterministic logic on systems like IBM's Q or Google's Sycamore.
Key Concepts:
Quantum pattern recognition - superposition spots patterns (e.g., molecular states) exponentially faster.
Entangled semantic nets - qubits link states instantly.
Hybrid inference - quantum gates (e.g., Hadamard) mix certainty and chance.
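A minimal sketch (Python with NumPy; a hand-rolled state-vector calculation for illustration rather than any specific quantum SDK) of how a Hadamard gate puts a qubit into equal superposition, with probabilities appearing only at measurement:

import numpy as np

# A single qubit as a 2-component state vector; |0> = [1, 0].
ket0 = np.array([1.0, 0.0])

# Hadamard gate: maps |0> to (|0> + |1>) / sqrt(2) - an equal superposition.
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

state = H @ ket0
probabilities = np.abs(state) ** 2       # Born rule: probabilities of measuring 0 or 1
print(state)                             # [0.70710678 0.70710678]
print(probabilities)                     # [0.5 0.5] - which outcome actually occurs is random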
Advantages:
Scale: tackles 2^n-scale problems in a tractable number of steps (e.g., cracking encryption, modeling molecules).
Precision potential: exact outcomes could outpace Stage 4's stats.
Disadvantages:
Predictability's murky - superposition collapses unpredictably (measurement problem).
Noise disrupts qubits - current error rates (around 1% per gate) dwarf classical error rates.
No learning yet - systems are programmed, not adaptive.
Accessibility lags - few practical tools by 2025; systems remain lab-bound.
Sources:
(•) Deutsch, D., "Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer," Proceedings of the Royal Society, 1985. Grounds quantum logic.
(•) Shor, P. W., "Polynomial-Time Algorithms for Prime Factorization on a Quantum Computer," SIAM Journal on Computing, 1997. Showcases quantum reasoning power.
(•) Bernhardt, C., Quantum Computing for Everyone, MIT Press, 2019 (updated editions to 2025). Explains superposition and noise challenges.
Does logic's absolute and perfect predictability endure?
Stage 1 proves it - gates are flawless, human errors aside.
Stage 2 upholds it - rules predict within design.
Stage 3 bends it - logic flexes, but holds basic truth.
Stage 4 frays it - bits remain, but input data sacrifices precision for scale.
Stage 5 reimagines it - quantum logic promises scale, yet noise challenges perfection.
Validity and reliability peak in Stages 1-2, wane in 3, shake in 4 - trading truth for scale -
and slowly emerge in 5.
Computers don't fail at their core - but human inputs and quantum quirks stretch the ideal.