Every science makes claims about internal mechanism: this circuit implements induction, this brain region encodes reward, this pathway mediates drug response, this interaction drives the epidemic. These claims are usually supported by indirect evidence: an intervention changes the output, a statistical association holds after adjustment, a probe classifier achieves high accuracy. Such evidence is necessary but not sufficient. Knowing that something matters is different from knowing how it works.
Our research program develops methods for testing mechanistic claims directly — and then takes those methods into specific scientific domains to see if they hold up.
The core thesis
Two things make this program distinct from standard methodology work:
1. Validation criteria from philosophy of science. The mechanistic validity program is a four-paper philosophical chain: Views (what is a mechanism?), Validity (is the claim warranted?), Reference (does the term refer across systems?), and Knowledge (do we genuinely know it?), plus a companion paper on cross-view invariance. Most published mechanistic claims achieve descriptive adequacy and causal sufficiency but skip construct validity entirely. The invariance depth of the indirect object identification (IOI) circuit, the most studied case in mechanistic interpretability, reveals internal inconsistency in its support set.
2. Geometry as a causal discovery tool. The mathematical backbone is geometric causal discovery: Grassmannian boundaries for causal variables, sheaf cohomology for global consistency, curvature for locating latent structure, and the bracket norm for measurement-invariant importance. These methods are coordinate-free: they do not depend on how the system is parameterized, which is why we expect them to transfer across domains.
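As a minimal sketch of what "coordinate-free" means here, the following NumPy example computes the geodesic distance between two subspaces on the Grassmannian from their principal angles, then checks that the distance is unchanged when each subspace is given a different basis or when the ambient space is rotated. This is a textbook construction, not the program's own method. The function names (`principal_angles`, `grassmann_distance`), the dimensions, and the random test matrices are all assumptions made for illustration.

```python
import numpy as np


def orthonormal_basis(A):
    """Return an orthonormal basis for the column span of A (reduced QR)."""
    Q, _ = np.linalg.qr(A)
    return Q


def principal_angles(A, B):
    """Principal angles (radians) between the column spans of A and B."""
    Qa, Qb = orthonormal_basis(A), orthonormal_basis(B)
    # The singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))


def grassmann_distance(A, B):
    """Geodesic distance on the Grassmannian: 2-norm of the principal angles."""
    return float(np.linalg.norm(principal_angles(A, B)))


# Hypothetical setup: two random 3-dimensional subspaces of R^10.
rng = np.random.default_rng(0)
n, k = 10, 3
A = rng.normal(size=(n, k))
B = rng.normal(size=(n, k))

# Re-parameterize each subspace with its own random k x k matrix
# (invertible with probability 1), which changes the basis but not the span.
Ma = rng.normal(size=(k, k))
Mb = rng.normal(size=(k, k))

# Rotate the ambient space with a random orthogonal matrix.
R, _ = np.linalg.qr(rng.normal(size=(n, n)))

d_original = grassmann_distance(A, B)
d_rebased = grassmann_distance(A @ Ma, B @ Mb)
d_rotated = grassmann_distance(R @ A, R @ B)

print(f"original: {d_original:.6f}")
print(f"new bases: {d_rebased:.6f}")
print(f"rotated ambient space: {d_rotated:.6f}")
```

The three printed distances should agree up to floating-point error. The invariance shown here covers basis changes within each subspace and orthogonal changes of ambient coordinates; an arbitrary invertible transformation of the ambient space would generally change the angles.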
Why multiple domains
The portability of the methodology is itself a testable claim. If geometric causal discovery only works in neural networks, it’s a neural-networks technique, not a general methodology. Each domain is a test:
- Neural networks — the original domain. Grokking and the Grassmannian boundary, structured causal variables, cross-task transfer.
- Neuroscience — brain-wide Neuropixels recordings. Bracket norm, dimensionality-mediated dissociations, causal subspaces.
- Clinical epidemiology — multiple sclerosis and Alzheimer’s disease. Sheaf cohomology, curvature features, boundary conditions for geometric methods.
Ongoing and planned: chemistry (reaction pathways and molecular representations), drug discovery (target identification), neuroendocrinology (hormonal feedback loops), physics (causal structure in physical systems), and public health (population-level intervention data).
Open questions
These are the questions we cannot yet answer and are working on:
Where does geometric causal discovery fail? Every method has boundary conditions. The clinical epidemiology paper maps some of them (dimensionality of confounding, nonlinearity of causal pathways). We need failure-mode maps for every domain.
Can construct validity be automated? Right now, evaluating whether an operationalization measures what it claims to requires domain expertise and careful argument. Is there a computable version — a test statistic for construct validity?
Does the Grassmannian boundary generalize beyond grokking? The sharp partition between linear and nonlinear causal variables appears in modular arithmetic. Does it appear in protein folding? In neural dynamics? In clinical trajectories?
What’s the minimal evidence for a mechanistic claim? The five-layer taxonomy says what’s ideal. In practice, resources are finite. What’s the smallest set of tests that distinguishes a genuine mechanism from a plausible-but-wrong story?