The CoT Backtest: 20 Years of Historical Positioning Behavior

A rigorous 20-year audit of how seven markets behaved when CFTC Commitments of Traders positioning was at similar levels. Base-rate framed, with per-market horizons and market character profiles: a description of the past, not a prediction of the future.

7 markets currently tracked: NDX, BTC, Gold, Crude, 10Y Treasury, EUR, VIX (narrowed from 10 after audit)

20 years of CFTC history: 2006 → 2026, ~10,000 weekly observations

Per-market horizons of 4w, 12w, or 26w: Crude/Bitcoin 4w; 10Y Treasury 26w; the rest 12w, each chosen against the asset's natural cycle

Every read base-rate framed: lift vs the no-positioning-info prior, not raw hit rate

Before You Read Any Further

This page is educational research. It describes what the underlying markets did historically when CFTC positioning was at similar levels; it does not predict what they will do next time. The dashboard surfaces the same descriptive past in plain language, and we never use predictive phrasing like 'expected higher' or 'will continue.' Past behaviour does not guarantee or reliably predict future behaviour. Every backtest carries residual survivorship, look-ahead, and selection risk even after controls. Nothing on this page is a recommendation to buy, sell, or hold any security. If you act on positioning data, do so at your own risk and consult a licensed financial advisor first.

Why We Ran This Backtest

The textbook story on CoT positioning is simple: when speculators are at a 3-year extreme, they're wrong. Crowded long means a top is in; crowded short means a bottom is in. It has been repeated in macro commentary for decades. We wanted to see whether it held up against twenty years of historical data. The answer turned out to be 'sometimes, and rarely as cleanly as the textbook claims.' The contrarian read describes what happened in some markets, the opposite happened in others, and several conditioning variables that commentators rarely mention changed the historical record dramatically. This page is the full audit: methodology, descriptive findings, and the conceptual frame we use when narrating CoT cards on the live dashboard.

Methodology

We pull the full CFTC archive for seven actively traded futures contracts from 2006 to 2026 (Bitcoin from 2018, the year the CFTC began publishing it). For each weekly release we compute the speculator group's net position (longs minus shorts) and scale it 0–100 within its own trailing 3-year min–max range: the standard COT Index, where 0 is the most net-short reading in the window and 100 the most net-long.

A signal entry is recorded when the index enters a tail (≤ 20 = extreme short, ≥ 80 = extreme long), but only on the first week of entry. Sitting at the extreme for eight weeks counts as one entry, not eight, which removes the autocorrelation from consecutive extreme weeks that inflates naive backtests. We then look up the underlying asset's price on the report date and at the market's pre-registered primary horizon (4w for Bitcoin and Crude, 26w for 10Y Treasury, 12w for the rest), plus secondary horizons.

The descriptive read is P(underlying higher) and the average underlying move, not 'signal hit rate.' Statistical context comes from a 1,000-resample bootstrap confidence interval, computed for audit but de-emphasized on the user-facing cards. Everything is split by macro regime (risk-on vs risk-off) using the Alphameter's episode-aware classification. Critically, every conditional P(higher) is reported against the market's BASE RATE, meaning P(higher) across all weeks regardless of positioning, so the reader sees lift, not raw level.
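The index and first-week entry filter described above can be sketched as follows. This is a minimal illustration, not the production pipeline: it assumes a plain list of weekly net positions, 156 weeks as the '3-year' window, a window that includes the current week, and a midpoint value of 50 when the window is flat. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

WINDOW_WEEKS = 156  # assumed: ~3 years of weekly reports


def cot_index(net: Sequence[float], window: int = WINDOW_WEEKS) -> List[Optional[float]]:
    """Scale each week's net position 0-100 within its trailing min-max range.

    The window includes the current week. Weeks without a full window get None.
    """
    out: List[Optional[float]] = []
    for i, value in enumerate(net):
        if i + 1 < window:
            out.append(None)
            continue
        hist = net[i + 1 - window : i + 1]
        lo, hi = min(hist), max(hist)
        # Assumed convention: a flat window maps to the midpoint.
        out.append(50.0 if hi == lo else 100.0 * (value - lo) / (hi - lo))
    return out


def tail_side(value: Optional[float], low: float = 20.0, high: float = 80.0) -> Optional[str]:
    """Classify one index reading as 'short' (<= low), 'long' (>= high), or None."""
    if value is None:
        return None
    if value <= low:
        return "short"
    if value >= high:
        return "long"
    return None


def first_week_entries(index: Sequence[Optional[float]]) -> List[Tuple[int, str]]:
    """Record (week, side) only on the first week of each stay in a tail."""
    entries: List[Tuple[int, str]] = []
    prev: Optional[str] = None
    for week, value in enumerate(index):
        side = tail_side(value)
        if side is not None and side != prev:
            entries.append((week, side))
        prev = side
    return entries
```

Three consecutive weeks at an extreme short followed by two extreme-long weeks yield two entries, not five: `first_week_entries([10, 12, 15, 50, 85, 90])` returns `[(0, 'short'), (4, 'long')]`.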

Step	What we control for
Entry-only filter	Eliminates auto-correlation from consecutive weeks at extreme
Per-market horizons	4w / 12w / 26w chosen against each asset's natural cycle
Base-rate framing	Every read measured vs the no-positioning-info prior
Regime conditioning	Tests whether the historical record changes in risk-on vs risk-off
Bootstrap CIs	Computed for audit, de-emphasized on the user-facing cards
7-market basket	Narrowed from 10 after persistent noise on JPY / Silver / Copper / NatGas

Each control matters: dropping any one of them produces apparent patterns that don't replicate.
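The descriptive read and its bootstrap interval can be sketched with the standard library. Assumptions: prices are one close per weekly report, an entry without a full horizon of future prices is dropped, and the interval is a plain percentile bootstrap over 1,000 resamples. The page does not say which bootstrap variant it uses, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def forward_outcomes(prices: Sequence[float], entry_weeks: Sequence[int], horizon: int) -> List[int]:
    """1 if the price `horizon` weeks after an entry is above the entry price, else 0.

    Entries too close to the end of the series to have a full horizon are skipped.
    """
    return [int(prices[w + horizon] > prices[w]) for w in entry_weeks if w + horizon < len(prices)]


def p_higher(outcomes: Sequence[int]) -> float:
    """Share of entries where the underlying ended higher."""
    return sum(outcomes) / len(outcomes)


def bootstrap_ci(outcomes: Sequence[int], n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for P(higher) (assumed variant)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample entries with replacement and record each resample's P(higher).
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples))
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Resampling entries rather than weeks matters here: because the first-week filter already collapses each episode to one observation, the resampled units are close to independent, which is what a simple bootstrap assumes.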

The Framing Lesson: Base Rate Is the Most Important Number

The cleanest framing lesson came from looking at the basket through a base-rate lens. Markets with strong secular drift show inflated P(higher) in every positioning bucket, not because positioning is informative but because the underlying went up most of the time anyway. Nasdaq's 12-week base rate is 75%: buying QQQ on a random Tuesday and holding for 12 weeks ended higher 75% of the time across two decades. So when speculators are at extreme shorts and we observe P(higher) of 78%, that is not a +28pp signal vs a coin flip; it is a +3pp lift vs the base rate, basically noise. The same lens reframes Bitcoin (4w base rate 54%), Crude (4w 54%), Gold (12w 62%), 10Y Treasury (26w 56%), EUR (12w 50%), and VIX (12w 45%). Every read on the dashboard is now reported as lift vs base rate. A bucket P(higher) within ±5pp of base rate is labelled 'similar to base rate' (positioning isn't adding much). This change alone moved several markets we had been narrating as 'signal' into an honest 'weak' classification.

Market	Horizon	Base rate P(↑)	Why it matters
Nasdaq	12w	75%	Strong equity drift; raw P(↑) needs +10pp lift to be informative
10Y Treasury	26w	56%	Near flat — most of any bucket read is genuine positioning info
Bitcoin	4w	54%	Near flat at 4w; secular crypto run averages out at the shorter horizon
Gold	12w	62%	Moderate drift; bucket reads must clear it to count
Crude	4w	54%	Near flat at 4w
EUR/USD	12w	50%	True coin-flip base; any deviation is signal
VIX	12w	45%	Slight downward lean: VIX spikes tend to mean-revert, so over 12w it has ended lower more often than higher

The +Xpp annotation on every dashboard card is lift over this number, not raw hit rate.
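The lift arithmetic is simple enough to show directly. A minimal sketch, assuming the base rate is computed over every week with a full horizon of future prices and that the ±5pp band is inclusive; the function names and label strings are hypothetical, not the dashboard's code.

```python
from typing import Sequence


def base_rate(prices: Sequence[float], horizon: int) -> float:
    """P(higher) over every week with a full horizon, ignoring positioning."""
    starts = range(len(prices) - horizon)
    if len(starts) == 0:
        raise ValueError("series shorter than the horizon")
    return sum(prices[i + horizon] > prices[i] for i in starts) / len(starts)


def lift_pp(conditional_p: float, base_p: float) -> float:
    """Lift in percentage points: conditional P(higher) minus the base rate."""
    return 100.0 * (conditional_p - base_p)


def describe(lift: float, band_pp: float = 5.0) -> str:
    """Card-style label; reads within +/- band_pp of base rate count as no lift (assumed inclusive)."""
    if abs(lift) <= band_pp:
        return "similar to base rate"
    return f"{lift:+.0f}pp vs base rate"
```

With the figures quoted on this page, Nasdaq's `lift_pp(0.78, 0.75)` is about +3pp and falls inside the band, while 10Y Treasury's `lift_pp(0.85, 0.56)` is about +29pp and is labelled `"+29pp vs base rate"`.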

Finding 1: The Textbook Read Holds in Some Markets, Not Others

Re-running the original textbook claim ('extremes are contrarian') under the base-rate-framed methodology: 10Y Treasury buy-side at extreme shorts shows the largest lift in the basket, +29pp at the entry week (P(higher) 85% vs a 56% base rate, across 34 historical entries). Nasdaq buy-side looks strong in raw terms (78%) but earns only +3pp of lift over its 75% drift; most of what reads as signal is just QQQ going up. Gold's fade-the-longs read remains the clearest anti-signal in the basket: historically, going against the crowd on gold has lost money, and the longer the extreme has persisted, the worse the record. Bitcoin, Crude, EUR, and VIX all show lift within ±10pp of base rate at the entry week, so in the historical record the entry-week effect on these markets has been mostly noise. This is less flattering than the original backtest implied; it is also more honest.

Market	Direction	Entry-week lift vs base rate	Sample	Descriptive read
10Y Treasury	Buy shorts (≤20)	+29pp	N=34	Strong entry-week lean — cleanest in the basket
Nasdaq	Buy shorts (≤20)	+3pp	N=49	Weak — raw 78% but mostly equity drift, not positioning info
Gold	Fade longs (≥80)	+5pp in the WRONG direction (P(↑) 67% vs 62% base)	N=30	Anti-signal: going against the crowd has historically lost
Bitcoin	Buy shorts (≤20)	+1pp	N=29	Near base rate at 4w — no clean entry-week lean
Crude / EUR / VIX	Either	within ±10pp	varies	Weak — entry-week effect mostly noise vs base rate

Finding 2: Open Interest Direction Sharpens the Historical Record

The single most overlooked variable in CoT analysis is what open interest is doing. When the NDX cotIndex hits ≤ 20 with OI contracting over the prior 13 weeks (specs unwinding without new positions arriving), the underlying was higher 12 weeks later in all 26 past entries. When OI is expanding instead (specs adding new shorts rather than unwinding), the same threshold shows essentially base-rate behaviour. The same logic sharpens the anti-signals: gold's fade-the-longs anti-signal gets WORSE with contracting OI, with the historical record moving further against the contrarian read. Extreme positioning plus confirming open-interest direction is the supercharged subset, which the dashboard surfaces as 'Confirmed Washout' on the buy side and 'Confirmed Crowding' on the fade side. None of these are predictions; they describe what the underlying did across the matched subset of past episodes.

Setup	All entries	OI contracting	OI expanding
NDX buy shorts (12w)	P(↑) 78% (N=49)	P(↑) 100% (N=26)	P(↑) near base rate
10Y buy shorts (26w)	P(↑) 85% (N=34)	P(↑) 80% (N=20)	P(↑) materially lower
Gold fade longs (12w)	P(↑) 67% (N=30), wrong direction for a fade	P(↑) 90% (N=10), even worse for the fade	Still anti-signal

OI direction is a first-order filter for the supercharged subset — not a tiebreaker.
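The OI split can be sketched in a few lines. The page does not define 'contracting over the prior 13 weeks' precisely, so this sketch assumes it means OI at the entry week is below OI 13 reports earlier (an endpoint comparison rather than, say, a regression slope). Function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

LOOKBACK_WEEKS = 13


def oi_direction(open_interest: Sequence[float], week: int,
                 lookback: int = LOOKBACK_WEEKS) -> Optional[str]:
    """Compare OI at `week` with OI `lookback` weeks earlier (assumed endpoint rule).

    Returns None when there is not enough history.
    """
    if week < lookback:
        return None
    change = open_interest[week] - open_interest[week - lookback]
    if change < 0:
        return "contracting"
    if change > 0:
        return "expanding"
    return "flat"


def split_entries_by_oi(entry_weeks: Sequence[int], open_interest: Sequence[float],
                        lookback: int = LOOKBACK_WEEKS) -> Dict[str, List[int]]:
    """Bucket signal entries by the OI direction leading into each entry week."""
    groups: Dict[str, List[int]] = {"contracting": [], "expanding": [], "flat": []}
    for week in entry_weeks:
        direction = oi_direction(open_interest, week, lookback)
        if direction is not None:
            groups[direction].append(week)
    return groups
```

Entry weeks from the first-week filter go in, and P(higher) is then computed separately per bucket and compared against the same base rate.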

Finding 3: Smart-Money Counter-Positioning

CFTC reports both sides of every futures market. When speculators are crowded short, somebody has to be net long against them. That somebody is largely the commercial group: dealers in financial futures, producers and merchants in commodities. In financial futures they are the closest thing to genuine smart money, with information edges and balance sheets that specs do not have. When NDX specs are at extreme shorts AND dealers are simultaneously at their own 3-year extreme long, the descriptive past is that the underlying was higher 12 weeks later in all 15 historical occurrences. That is the cleanest dual-actor agreement in the basket. Without commercial confirmation, NDX shorts still show some lift, but materially weaker. 10Y Treasury shows the same pattern at smaller N. Gold does not benefit: its commercials are mostly hedging miners rather than discretionary traders, which is consistent with gold positioning being structurally different from financials.

Setup	Historical record	Sample / note
NDX buy + commercials long	100% P(↑) over 12w	N=15
NDX buy without commercial confirmation	Materially weaker	N=34, the remaining entries
Gold buy + commercials long	Near base rate	Commercials don't help here

Finding 4: Alternative Trader Categories Reveal Hidden Patterns

Every CFTC report breaks positioning down into multiple trader categories, but the standard backtest looks at only one: Leveraged Funds in financials, Managed Money in commodities. We tested whether the slower-moving institutional categories carry different patterns: Asset Managers in financials, Other Reportables in commodities. The effect was larger than expected and reshaped how the dashboard reads several markets. Gold via Other Reportables shows a buy-shorts pattern with a 74% 12-week historical hit rate (+12pp over gold's 62% base rate); the same threshold via Managed Money is statistical noise. On the equity/crypto side, Asset Managers add anti-signals that the Leveraged Funds read misses entirely. NDX Asset Managers at extreme longs have historically behaved as a TREND-FOLLOWING category, not a contrarian one, so 'going against' them at extremes has lost money. Bitcoin Asset Managers at extreme longs have historically preceded sharp drops. The takeaway: the textbook 'speculator' category is not always the right one. On the live dashboard, each market's primary spec read is accompanied by an alt-actor block when the alt-actor read tells a different story.

Market	Alt actor	Direction	Historical record	Sample
Gold	Other Reportables	Buy shorts	P(↑) 74% over 12w	N=34
Bitcoin	Asset Managers	Anti at extreme longs	Avg move -18% over 12w	N=20
NDX	Asset Managers	Anti at extreme longs	P(↑) 16% over 12w	N=37
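Reading an alternative category is the same COT Index calculation applied to a different long/short pair in the report. A minimal sketch: the category keys below are illustrative labels for the CFTC groupings named above (found in the Traders in Financial Futures report for financials and the Disaggregated report for commodities), not actual CFTC column names. The rule for when the alt-actor block appears (the alt category sits in a tail that the primary category is not in) is an assumption about what 'tells a different story' means.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Illustrative (primary, alternative) category keys per market; not CFTC column names.
CATEGORIES: Dict[str, Tuple[str, str]] = {
    "NDX": ("leveraged_funds", "asset_managers"),
    "BTC": ("leveraged_funds", "asset_managers"),
    "GOLD": ("managed_money", "other_reportables"),
}


def net_positions(rows: Sequence[Dict[str, float]], category: str) -> List[float]:
    """Longs minus shorts for one trader category, one value per weekly row."""
    return [row[f"{category}_long"] - row[f"{category}_short"] for row in rows]


def tail_side(value: Optional[float], low: float = 20.0, high: float = 80.0) -> Optional[str]:
    """'short' at or below `low`, 'long' at or above `high`, otherwise None."""
    if value is None:
        return None
    if value <= low:
        return "short"
    if value >= high:
        return "long"
    return None


def show_alt_block(primary_index: Optional[float], alt_index: Optional[float]) -> bool:
    """Assumed rule: surface the alt actor when it sits in a tail the primary is not in."""
    alt_side = tail_side(alt_index)
    return alt_side is not None and alt_side != tail_side(primary_index)
```

The net series from `net_positions` would feed the same 0–100 index as the primary category before `show_alt_block` compares the two readings.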

The Market Profile Layer

Each market also carries a slow-moving 'character' profile, refreshed monthly from full history. Character is what stays constant across weeks: how the entry-week lean holds up as an episode deepens, how long typical episodes last, and whether the regime matters. The signal-shape field is the most important. A PERSISTENT market's historical lean has held even when positioning has sat at an extreme for weeks; a DECAYS market's lean fades fast; a REVERSES market historically went the OTHER way as the extreme persisted; a WEAK market showed little lean to begin with. The dashboard's per-market narrative now incorporates this character. UST10Y is the basket's clearest example: its signal shape is DECAYS, with the +29pp entry-week lift dropping to roughly 0pp by week 4. A reader landing on UST10Y mid-episode is told the entry-week effect has historically faded. Episode-length character frames whether 'we are 4 weeks in' is early or late for that market.
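One way to produce the depth-dependent read behind the signal-shape field: tag each week with how many weeks it sits into a tail episode, compute lift separately for each depth, then classify the resulting lift curve. The 5pp weak band and the 'half the entry lift' cut-off between PERSISTENT and DECAYS are illustrative assumptions, not the profile layer's published thresholds, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def weeks_into_episode(sides: Sequence[Optional[str]]) -> List[Optional[int]]:
    """0 on the entry week, 1 the week after, and so on; None outside a tail."""
    out: List[Optional[int]] = []
    run: Optional[int] = None
    prev: Optional[str] = None
    for side in sides:
        if side is None:
            run = None
        elif side != prev or run is None:
            run = 0  # a new episode starts (or the tail flipped sides)
        else:
            run += 1
        out.append(run)
        prev = side
    return out


def signal_shape(lift_by_depth: Sequence[float], weak_pp: float = 5.0) -> str:
    """Classify how lift (pp vs base rate) evolves with weeks into an episode.

    lift_by_depth[0] is the entry-week lift; the last item is the deepest bucket.
    Thresholds are illustrative assumptions.
    """
    entry, deep = lift_by_depth[0], lift_by_depth[-1]
    if abs(entry) <= weak_pp:
        return "WEAK"
    if entry * deep < 0 and abs(deep) > weak_pp:
        return "REVERSES"
    if abs(deep) >= abs(entry) / 2:
        return "PERSISTENT"
    return "DECAYS"
```

With the UST10Y figures quoted above (+29pp at entry, roughly 0pp by week 4), `signal_shape([29, 0])` returns `"DECAYS"`.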

The Highest-Conviction Tier: Dual Confirmation

The cleanest historical reading stacks both filters: speculators at extreme shorts, open interest contracting (washout), AND commercials at extreme longs (smart money on the other side). When all three conditions held on NDX simultaneously in the past, the underlying was higher 12 weeks later in all 15 historical occurrences. This combination fires roughly once a year on average, too rarely to serve as a primary trading strategy on its own, but it is the most aligned cross-actor agreement the system surfaces. The dashboard shows it as the 'Dual Confirmation' alert tier above the regular CoT panel, and it triggers an email alert to subscribers when it activates.
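Stacking the three conditions is a plain conjunction on each week's readings. A minimal sketch, assuming the commercial index is the same 0–100 COT Index computed on the commercial group's net position and that 'contracting' is an endpoint comparison over 13 weeks; in a backtest this check would typically run on first-week entries only. Names are hypothetical.

```python
from typing import Optional, Sequence


def dual_confirmation(spec_index: Sequence[Optional[float]],
                      commercial_index: Sequence[Optional[float]],
                      open_interest: Sequence[float], week: int,
                      lookback: int = 13, low: float = 20.0, high: float = 80.0) -> bool:
    """True when specs are at an extreme short, commercials at an extreme long,
    and OI has fallen over the prior `lookback` weeks (assumed washout rule)."""
    spec, comm = spec_index[week], commercial_index[week]
    if week < lookback or spec is None or comm is None:
        return False
    washout = open_interest[week] < open_interest[week - lookback]
    return spec <= low and comm >= high and washout


def fires_per_year(n_occurrences: int, n_weeks: int, weeks_per_year: float = 52.0) -> float:
    """Average firing rate over the sample."""
    return n_occurrences / (n_weeks / weeks_per_year)
```

As a check on the stated frequency, 15 occurrences over roughly 20 years of weekly data, `fires_per_year(15, 20 * 52)`, works out to 0.75 a year, in line with 'roughly once a year.'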

What Doesn't Work

Three negative findings are worth surfacing. First, cross-market alignment (many markets simultaneously at extreme positioning) has a directionally correct but statistically weak relationship with forward changes in the macro regime composite. The correlation points the right way, but the standard errors are too wide for it to be a standalone signal; it is context, not an alert. Second, week-over-week change in speculator net positioning produces some patterns on Bitcoin and natural gas that the level-based test misses, but it is mostly redundant with the OI and commercials filters above. Third, applying the same backtest infrastructure to news sentiment and Polymarket-implied probabilities produces materially weaker historical records than CoT, partly because the historical samples are smaller and partly because both sources carry survivorship and selection biases that positioning data largely avoids. We label these honestly rather than pretending every input is equally rigorous.
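For reference, a cross-market alignment measure can be as simple as a weekly count of markets sitting in a tail. This sketch assumes that definition (the page does not spell one out) and uses a hypothetical name; per the finding above, the resulting series is context for the regime composite, not an alert.

```python
from typing import Mapping, Optional, Sequence


def extreme_breadth(indices: Mapping[str, Sequence[Optional[float]]], week: int,
                    low: float = 20.0, high: float = 80.0) -> int:
    """Number of markets whose COT Index sits in either tail in a given week."""
    count = 0
    for series in indices.values():
        value = series[week]
        if value is not None and (value <= low or value >= high):
            count += 1
    return count
```

Applied week by week across the basket, this gives a breadth series that can then be compared with forward changes in the regime composite.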

Where to See It Live

The descriptive per-market reads, the alert widget, and the historical-lean badges all run on the live Alphamancy dashboard. New CFTC data reaches the system every Saturday morning UTC (the CFTC publishes on Friday, reporting positions as of the preceding Tuesday), and the historical record re-scores any new entries automatically. The market profile layer refreshes monthly. The methodology is open: every descriptive claim on the system can be traced back to a specific subset of the 20-year sample with its own N and base-rate comparison.
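The weekly timing can be made explicit, which also shows the three-day gap between the positions' as-of date and the day they became public. A minimal sketch assuming an ordinary week with no holiday delay (federal holidays can push the CFTC release back); the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def publication_dates(as_of: date) -> Tuple[date, date]:
    """Friday release and Saturday ingestion for a Tuesday as-of date.

    Assumes a normal week; holidays can delay the CFTC release.
    """
    if as_of.weekday() != 1:  # Monday == 0, so Tuesday == 1
        raise ValueError("expected a Tuesday as-of date")
    release = as_of + timedelta(days=3)  # Friday publication
    ingest = as_of + timedelta(days=4)   # Saturday (morning UTC, per the schedule above)
    return release, ingest
```

For positions as of Tuesday 2 January 2024, this gives a Friday 5 January release and a Saturday 6 January ingestion.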

Frequently Asked Questions

Does the textbook 'extremes are contrarian' rule hold up?

Only for some markets. The 20-year historical record shows the strongest entry-week lean on 10Y Treasury at extreme shorts (+29pp vs base rate). Nasdaq buy-side looks strong in raw terms, but most of the apparent 'signal' is just equity drift: its lift vs base rate is only +3pp at the entry week. Gold fade-side has historically LOST money, and the losses have grown the longer the extreme has persisted. Bitcoin, EUR, Crude, and VIX have shown only weak entry-week leans against base rate. The rule holds in pieces, not universally.

Why do you keep saying 'historically' and 'descriptive past'?

Because the dashboard describes what happened — not what will happen. Every read on every card is the historical record of similar past setups, not a forecast. Past behaviour does not guarantee or reliably predict future behaviour. The framing is deliberate: it keeps the reader in the driver's seat for forming their own forward view from the descriptive past we surface.

What is 'lift vs base rate' and why do you cite it everywhere?

Base rate is the no-positioning-info prior — P(underlying higher) at the market's primary horizon across ALL weeks of history regardless of positioning. For Nasdaq it is 75% (equity drift); for EUR/USD it is 50% (true coin flip). Lift is the bucket-conditional P(higher) minus this base rate. Citing lift kills over-claiming: NDX 'higher 78% of the time at extreme shorts' sounds strong but is only +3pp above the 75% drift. The +Xpp / -Xpp annotations on every card are lift, not raw level.

What's the cleanest historical reading in the basket?

Nasdaq dual confirmation: specs at cotIndex ≤ 20, open interest contracting over the prior 13 weeks, AND commercials at their own 3-year extreme long. Across 15 historical occurrences the underlying was higher 100% of the time at the 12-week horizon. It fires roughly once per year. We surface this as the 'Dual Confirmation' alert tier.

What's a market profile and how does it change the narration?

Each market has a slow-moving character profile refreshed monthly from full history: signal shape (persistent / decays / reverses / weak), episode-length character, and regime sensitivity. The big find: 10Y Treasury's signal shape is DECAYS. The +29pp lift at the entry week drops to roughly 0pp by week 4. The dashboard now narrates this, so a reader landing mid-episode on a UST10Y extreme is told the entry-week effect has historically faded. The profile NEVER overrides this week's specific signals; it is standing context for character, not a verdict.

Why don't you use COT Index level on its own?

Because for most markets, level alone produces weak lift over base rate. NDX buy at level alone shows only +3pp lift; adding the open-interest direction filter (the 'Confirmed Washout' subset) sharpens that materially. The level is the entry point; the conditioning variables — OI, commercials, alt actor — are what separate strong historical records from drift-level noise.

Why is gold different from equities?

Gold positioning is dominated by mining hedgers on the commercial side and trend-followers on the speculator side. There is no significant discretionary 'smart money' standing against the speculative crowd, so commercials at extremes do not carry information the way they do for index futures. The contrarian read also fails outright on gold — extreme spec longs have historically kept rallying, not reversed. Gold's historical record rewards reading trend continuation, not crowd-fading.

Track These Indicators Live

The Alphameter synthesizes six macro indicators into a single regime score — updated daily. See the current reading and full indicator breakdown on the dashboard.
