The CoT Backtest: 20 Years of Historical Positioning Behavior
A rigorous 20-year audit of how seven markets behaved historically when CFTC Commitment of Traders positioning sat at similar conditions. Base-rate framed, per-market horizons, market character profiles — descriptive past, not future predictions.
Before You Read Any Further
This page is educational research. It describes what the underlying markets did historically when CFTC positioning sat at similar conditions — it does not predict what they will do next time. The dashboard surfaces the same descriptive past in plain language; we never use predictive verbs like 'expected higher' or 'will continue.' Past behaviour does not guarantee or reliably predict future behaviour. Every backtest carries residual survivorship, look-ahead, and selection risk even after controls. Nothing on this page is a recommendation to buy, sell, or hold any security. If you act on positioning data, do so at your own risk and consult a licensed financial advisor first.
Why We Ran This Backtest
The textbook story on CoT positioning is simple: when speculators are at a 3-year extreme, they're wrong. Crowded long means a top is in; crowded short means a bottom is in. It has been repeated in macro commentary for decades. We wanted to see whether it held up against twenty years of historical data. The answer turned out to be 'sometimes, and rarely as cleanly as the textbook claims': the contrarian read describes what happened in some markets, the opposite happened in others, and several conditions that almost no commentator mentions dramatically changed how the historical record reads. This page is the full audit: methodology, descriptive findings, and the conceptual frame we use when narrating CoT cards on the live dashboard.

Methodology
We pull the full CFTC archive for seven actively traded futures contracts from 2006 to 2026 (Bitcoin from 2018, the year CFTC began publishing it). For each weekly release we compute the speculator group's net position ranked 0–100 against its own trailing 3-year range — the standard COT Index. A signal entry is recorded when the index enters a tail (≤ 20 = extreme short, ≥ 80 = extreme long), but only on the first week of entry — sitting at the extreme for eight weeks counts as one entry, not eight, eliminating auto-correlation that inflates naive backtests. We then look up the underlying asset's price on the report date and at the market's pre-registered primary horizon (4w for Bitcoin and Crude, 26w for 10Y Treasury, 12w for the rest), plus secondary horizons. The descriptive read is P(underlying higher) and average underlying move — not 'signal hit rate.' Statistical context comes from a 1,000-sample bootstrap confidence interval that is computed for audit but de-emphasized on the user-facing cards. Everything is split by macro regime (risk-on vs risk-off) using the Alphameter's episode-aware classification. Critically, every conditional P(higher) is reported against the market's BASE RATE — P(higher) across all weeks regardless of positioning — so the reader sees lift, not raw level.
| Step | What we control for |
|---|---|
| Entry-only filter | Eliminates auto-correlation from consecutive weeks at extreme |
| Per-market horizons | 4w / 12w / 26w chosen against each asset's natural cycle |
| Base-rate framing | Every read measured vs the no-positioning-info prior |
| Regime conditioning | Tests whether the historical record changes in risk-on vs risk-off |
| Bootstrap CIs | Computed for audit, de-emphasized on the user-facing cards |
| 7-market basket | Narrowed from 10 after persistent noise on JPY / Silver / Copper / NatGas |
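The index and entry-only steps can be sketched in a few lines of Python. This is a minimal illustration, not the production pipeline: the 156-week window (three years of weekly reports), the min-max form of the ranking, the inclusion of the current week in the trailing range, and the names `cot_index` and `first_week_entries` are all assumptions made for the example. The demo series is synthetic.

```python
from typing import List, Optional, Tuple

WINDOW_WEEKS = 156  # assumption: 3 years of weekly CFTC reports


def cot_index(net_positions: List[float], window: int = WINDOW_WEEKS) -> List[Optional[float]]:
    """Rank each week's net position 0-100 within its trailing min-max range."""
    out: List[Optional[float]] = []
    for i in range(len(net_positions)):
        if i + 1 < window:
            out.append(None)  # not enough history to form a full window
            continue
        trailing = net_positions[i + 1 - window : i + 1]  # includes the current week
        lo, hi = min(trailing), max(trailing)
        if hi == lo:
            out.append(50.0)  # flat window: treat as mid-range
        else:
            out.append(100.0 * (net_positions[i] - lo) / (hi - lo))
    return out


def first_week_entries(
    index: List[Optional[float]], low: float = 20.0, high: float = 80.0
) -> List[Tuple[int, str]]:
    """Record an entry only on the first week the index is inside a tail."""
    entries: List[Tuple[int, str]] = []
    prev_side: Optional[str] = None
    for week, value in enumerate(index):
        if value is None:
            side = None
        elif value <= low:
            side = "extreme_short"
        elif value >= high:
            side = "extreme_long"
        else:
            side = None
        # Consecutive weeks in the same tail count as one entry, not many.
        if side is not None and side != prev_side:
            entries.append((week, side))
        prev_side = side
    return entries


# Synthetic demo with a 4-week window so the behaviour is easy to follow.
net = [0, 10, 5, -5, -6, -7, 2, 12, 13, 3]
idx = cot_index(net, window=4)
entries = first_week_entries(idx)
print(entries)
```

In the demo, weeks 3 to 5 all sit in the short tail but only week 3 is recorded, which is the de-duplication the entry-only filter is meant to provide.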
The Framing Lesson: Base Rate Is the Most Important Number
The cleanest framing lesson came from looking at the basket through a base-rate lens. Markets with strong secular drift have inflated P(higher) at every positioning bucket, not because positioning is informative, but because the underlying went up most of the time anyway. Nasdaq's 12-week base rate is 75%: buying QQQ on any random Tuesday and holding 12 weeks closed higher 75% of the time across two decades. So when speculators are at extreme shorts and we observe P(higher) 78%, that is not a +28pp signal vs a coin flip; it is a +3pp lift vs the base rate, basically noise. The same lens reframes Bitcoin (4w base rate 54%), Gold (12w 62%), 10Y Treasury (26w 56%), EUR/USD (12w 50%), and VIX (12w 45%). Every read on the dashboard is now reported as lift vs base rate. A bucket P(higher) within ±5pp of base rate is labelled 'similar to base rate — positioning isn't adding much.' This change alone moved several markets we had been narrating as 'signal' into an honest 'weak' classification.
| Market | Horizon | Base rate P(↑) | Why it matters |
|---|---|---|---|
| Nasdaq | 12w | 75% | Strong equity drift; raw P(↑) needs +10pp lift to be informative |
| 10Y Treasury | 26w | 56% | Near flat — most of any bucket read is genuine positioning info |
| Bitcoin | 4w | 54% | Near flat at 4w; secular crypto run averages out at the shorter horizon |
| Gold | 12w | 62% | Moderate drift; bucket reads must clear it to count |
| Crude | 4w | 54% | Near flat at 4w |
| EUR/USD | 12w | 50% | True coin-flip base; any deviation is signal |
| VIX | 12w | 45% | Slight drift LOWER — long-term VIX is mean-reverting downward |
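The lift arithmetic is simple enough to show directly. The sketch below works in whole percentage points to avoid floating-point noise; the function names and the 'above' / 'below' labels are hypothetical, and only the 'similar to base rate' wording and the ±5pp band come from the text above.

```python
def lift_pp(bucket_p_higher_pct: int, base_rate_pct: int) -> int:
    """Lift in percentage points: conditional P(higher) minus the base rate."""
    return bucket_p_higher_pct - base_rate_pct


def describe(lift: int, band_pp: int = 5) -> str:
    """Label a lift; readings inside the band are treated as base-rate noise."""
    if abs(lift) <= band_pp:
        return "similar to base rate"
    # These two labels are placeholders, not the dashboard's wording.
    return "above base rate" if lift > 0 else "below base rate"


# Figures quoted on this page: Nasdaq 78% vs 75%, 10Y Treasury 85% vs 56%.
nasdaq = lift_pp(78, 75)
treasury = lift_pp(85, 56)
print(nasdaq, describe(nasdaq))
print(treasury, describe(treasury))
```

The same raw level can land in either bucket depending on the market's drift, which is why the base rate has to be subtracted before any reading is narrated.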
Finding 1: The Textbook Read Holds in Some Markets, Not Others
Re-running the original textbook claim ('extremes are contrarian') under the new base-rate-framed methodology: 10Y Treasury buy-side at extreme shorts shows the largest lift in the basket — +29pp at the entry week (P(higher) 85% vs 56% base rate, across 34 historical entries). Nasdaq buy-side looks strong in raw terms (78%) but earns only +3pp of lift over its 75% drift — most of what reads as signal is just QQQ going up. Gold's fade-the-longs read remains the clearest anti-signal in the basket — historically, going against the crowd on gold has lost money, and the longer the extreme has persisted, the worse the historical record. Bitcoin, Crude, EUR, and VIX all show lift within ±10pp of base rate at the entry week — describing past behaviour, the entry-week effect on these markets has been mostly noise. This is less flattering than the original backtest implied; it is also more honest.
| Market | Direction | Entry-week lift vs base rate | Sample | Descriptive read |
|---|---|---|---|---|
| 10Y Treasury | Buy shorts (≤20) | +29pp | N=34 | Strong entry-week lean — cleanest in the basket |
| Nasdaq | Buy shorts (≤20) | +3pp | N=49 | Weak — raw 78% but mostly equity drift, not positioning info |
| Gold | Fade longs (≥80) | +5pp WRONG direction | N=30 | Anti-signal — going against the crowd has historically lost |
| Bitcoin | Buy shorts (≤20) | +1pp | N=29 | Near base rate at 4w — no clean entry-week lean |
| Crude / EUR / VIX | Either | within ±10pp | varies | Weak — entry-week effect mostly noise vs base rate |
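With samples in the range of N=29 to N=49, the uncertainty around each P(higher) is wide, which is what the bootstrap interval mentioned in the methodology is for. The following is a minimal percentile-bootstrap sketch using only the standard library. The 95% level, the fixed seed, and the function name `bootstrap_ci` are assumptions, and the outcome list is a reconstruction (85% of 34 entries is roughly 29 higher outcomes), not the underlying dataset.

```python
import random
from typing import List, Tuple


def bootstrap_ci(
    outcomes: List[int], n_samples: int = 1000, level: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the share of 1s in `outcomes`."""
    rng = random.Random(seed)  # fixed seed so the audit is reproducible
    n = len(outcomes)
    # Resample with replacement and record the share of 'higher' outcomes.
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_samples))
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_samples)
    hi_idx = min(n_samples - 1, int((1.0 - tail) * n_samples))
    return means[lo_idx], means[hi_idx]


# Assumed reconstruction: 29 'higher' outcomes out of 34 entries (about 85%).
outcomes = [1] * 29 + [0] * 5
point = sum(outcomes) / len(outcomes)
low, high = bootstrap_ci(outcomes)
print(f"P(higher) = {point:.2f}, interval = [{low:.2f}, {high:.2f}]")
```

Comparing the lower bound of the interval with the market's base rate is one way to judge whether a lift is distinguishable from drift at the available sample size.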
Finding 2: Open Interest Direction Sharpens the Historical Record
The single most overlooked variable in CoT analysis is what open interest is doing. When NDX cotIndex hits ≤ 20 with OI contracting over the prior 13 weeks — meaning specs are unwinding without new positions arriving — the historical 12-week record shows underlying higher 100% of the time across 26 past entries. When OI is expanding instead (specs piling into new shorts despite price), the same threshold shows essentially base-rate behaviour. The same logic sharpens the anti-signals: gold's fade-the-longs anti-signal gets WORSE with contracting OI (historical hit rate drops further against the contrarian read). Extreme positioning + confirming open-interest direction is the supercharged subset — what the dashboard surfaces as 'Confirmed Washout' on the buy side and 'Confirmed Crowding' on the fade side. None of these are predictions; they describe what the underlying did across the matched subset of past episodes.
| Setup | All entries | OI contracting | OI expanding |
|---|---|---|---|
| NDX buy shorts (12w) | P(↑) 78% (N=49) | P(↑) 100% (N=26) | P(↑) near base rate |
| 10Y buy shorts (26w) | P(↑) 85% (N=34) | P(↑) 80% (N=20) | P(↑) materially lower |
| Gold fade longs (12w) | P(↑) 67% (N=30), WRONG direction | P(↑) 90% (N=10), WORSE | Still anti |
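The open-interest split in the table above amounts to partitioning the entry list by the sign of the 13-week OI change and recomputing P(↑) on each part. A minimal sketch follows, assuming each entry carries open interest at the entry week and 13 weeks earlier plus prices at entry and at the horizon; the `Entry` fields and function names are hypothetical and the four demo entries are synthetic.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    oi_now: float         # open interest on the entry week
    oi_13w_ago: float     # open interest 13 weeks earlier
    price_entry: float    # underlying price on the report date
    price_horizon: float  # underlying price at the primary horizon


def p_higher(entries: List[Entry]) -> Tuple[Optional[float], int]:
    """Return (percent of entries with a higher horizon price, sample size)."""
    n = len(entries)
    if n == 0:
        return None, 0
    ups = sum(e.price_horizon > e.price_entry for e in entries)
    return 100.0 * ups / n, n


def split_by_oi(entries: List[Entry]) -> Dict[str, Tuple[Optional[float], int]]:
    """Condition the same entry set on OI contracting vs expanding."""
    contracting = [e for e in entries if e.oi_now < e.oi_13w_ago]
    expanding = [e for e in entries if e.oi_now >= e.oi_13w_ago]
    return {
        "all": p_higher(entries),
        "oi_contracting": p_higher(contracting),
        "oi_expanding": p_higher(expanding),
    }


# Synthetic demo entries, not historical data.
demo = [
    Entry(90, 100, 100.0, 110.0),
    Entry(80, 100, 50.0, 55.0),
    Entry(120, 100, 100.0, 95.0),
    Entry(130, 100, 100.0, 104.0),
]
result = split_by_oi(demo)
print(result)
```

Returning the sample size alongside each percentage keeps the N visible, which matters because the conditioned subsets are always smaller than the full entry set.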
Finding 3: Smart-Money Counter-Positioning
CFTC reports both sides of every futures market. When speculators are crowded short, somebody has to be net long against them. That somebody is the commercial group — dealers in financial futures, producers and merchants in commodities. They are the genuine smart money, with information edges and balance sheets that specs do not have. When NDX specs are at extreme shorts AND dealers are simultaneously at their own 3-year extreme long, the descriptive past says: underlying higher 100% of the time across 15 historical occurrences over 12 weeks. That is the cleanest dual-actor agreement in the basket. Without commercial confirmation, NDX shorts still show some lift but materially weaker. 10Y Treasury shows the same pattern at smaller N. Gold does not benefit — its commercials are mostly hedging miners, not taking discretionary positions — confirming that gold positioning is structurally different from financials.
Finding 4: Alternative Trader Categories Reveal Hidden Patterns
Every CFTC report breaks positioning down into multiple trader categories, but the standard backtest only looks at one: Leveraged Funds in financials, Managed Money in commodities. We tested whether the slower-moving institutional categories carry different patterns: Asset Managers in financials, Other Reportables in commodities. The result was bigger than expected and reshaped how the dashboard reads several markets. Gold via Other Reportables runs a buy-shorts pattern with a 74% 12-week historical hit rate, while the same threshold via Managed Money is statistical noise. On the equity/crypto side, Asset Managers add anti-signals the Leveraged Funds read misses entirely: NDX Asset Managers at extreme longs have historically been a TREND-FOLLOWING category, not contrarian, so 'going against' them at extremes has lost money. Bitcoin Asset Managers at extreme longs have historically preceded sharp drops. The takeaway: the textbook 'speculator' category is not always the right one. On the live dashboard, each market's primary spec is accompanied by an alt-actor block when its alt-actor read tells a different story.
| Market | Alt actor | Direction | Historical record | Sample |
|---|---|---|---|---|
| Gold | Other Reportables | Buy shorts | P(↑) 74% over 12w | N=34 |
| Bitcoin | Asset Managers | Anti at extreme longs | Avg move -18% over 12w | N=20 |
| NDX | Asset Managers | Anti at extreme longs | P(↑) 16% over 12w | N=37 |
The Market Profile Layer
Each market also carries a slow-moving 'character' profile, refreshed monthly from full history. Character is what stays constant across weeks: how persistent the entry-week lean is with depth, how long typical episodes last, whether the regime matters. The signal-shape field is the most important: a PERSISTENT market's historical lean has held even when positioning has sat at extreme for weeks; a DECAYS market's lean fades fast; a REVERSES market historically went the OTHER way as the extreme persisted. The dashboard's per-market narrative now knows this character. UST10Y is the basket's clearest example: signal shape DECAYS — the +29pp entry-week lift drops to roughly 0pp by week 4. A reader landing on UST10Y mid-episode is told the entry-week effect has historically faded. Episode-length character frames whether 'we are 4 weeks in' is early or late for that market.
The Highest-Conviction Tier: Dual Confirmation
The cleanest historical reading stacks both filters on top of the extreme itself: speculators at extreme shorts, open interest contracting (washout), AND commercials at extreme longs (smart money on the other side). When all three conditions matched on NDX simultaneously in the past, the underlying was higher 100% of the time over 12 weeks across 15 historical occurrences. This combination fires roughly once a year on average, rare enough to never be a primary trading strategy on its own, but it is the most aligned cross-actor agreement the system surfaces. The dashboard surfaces this as the 'Dual Confirmation' alert tier above the regular CoT panel, and it triggers an explicit email to subscribers when it activates.
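The tiering logic described above reduces to three boolean conditions. The sketch below is one possible reading of it, not the dashboard's implementation: it assumes the commercial group's index is computed the same way as the speculator index, that 'extreme long' for commercials means ≥ 80, and that OI 'contracting' means lower than 13 weeks earlier. The function name `buy_side_tier` is hypothetical; the tier names are the ones used on this page.

```python
from typing import Optional


def buy_side_tier(
    spec_index: float,
    commercial_index: float,
    oi_now: float,
    oi_13w_ago: float,
    spec_low: float = 20.0,
    commercial_high: float = 80.0,  # assumption: same tail threshold as specs
) -> Optional[str]:
    """Classify a week into the buy-side tiers, or None if no tier applies."""
    specs_extreme_short = spec_index <= spec_low
    oi_contracting = oi_now < oi_13w_ago
    commercials_extreme_long = commercial_index >= commercial_high

    if not (specs_extreme_short and oi_contracting):
        return None
    if commercials_extreme_long:
        return "Dual Confirmation"  # all three conditions matched
    return "Confirmed Washout"      # extreme shorts plus contracting OI only


tier_all_three = buy_side_tier(15.0, 85.0, 90.0, 100.0)
tier_washout = buy_side_tier(15.0, 60.0, 90.0, 100.0)
tier_none = buy_side_tier(15.0, 85.0, 110.0, 100.0)
print(tier_all_three, tier_washout, tier_none)
```

Ordering the checks this way makes 'Dual Confirmation' a strict subset of 'Confirmed Washout', which matches the way the text stacks the commercial filter on top of the washout condition.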
What Doesn't Work
Three negative findings worth surfacing. First, cross-market alignment — when many markets are simultaneously at extreme positioning — has a directionally correct but statistically weak relationship with forward changes in the macro regime composite. The correlation is real but the standard errors are too wide for it to be a standalone signal; it is context, not an alert. Second, week-over-week change in speculator net positioning produces some patterns on Bitcoin and natural gas that the level-based test misses, but it is mostly redundant with the OI and commercials filters above. Third, applying the same backtest infrastructure to news sentiment and Polymarket-implied probabilities produces materially weaker historical records than CoT, partly because the historical samples are smaller and partly because both sources carry survivorship and selection biases that positioning data does not. We label these honestly rather than pretending every input is equally rigorous.
Where to See It Live
The descriptive per-market reads, the alert widget, and the historical-lean badges all run on the live Alphamancy dashboard. New CFTC data hits the system every Saturday morning UTC (CFTC publishes Friday for Tuesday's snapshot), and the historical record re-scores any new entries automatically. The market profile layer refreshes monthly. The methodology is open: every descriptive claim on the system can be traced back to a specific subset of the 20-year sample with its own N and base-rate comparison.

Frequently Asked Questions
Does the textbook 'extremes are contrarian' rule hold up?
Only for some markets. The 20-year historical record shows the strongest entry-week lean on 10Y Treasury at extreme shorts (+29pp vs base rate). Nasdaq buy-side looks strong in raw terms but most of the apparent 'signal' is just equity drift — its lift vs base rate is only +3pp at the entry week. Gold fade-side has historically LOST money — going against the crowd here has lost, getting worse the longer the extreme has persisted. Bitcoin, EUR, Crude, and VIX have shown only weak entry-week leans against base rate. The rule holds in pieces, not universally.
Why do you keep saying 'historically' and 'descriptive past'?
Because the dashboard describes what happened — not what will happen. Every read on every card is the historical record of similar past setups, not a forecast. Past behaviour does not guarantee or reliably predict future behaviour. The framing is deliberate: it keeps the reader in the driver's seat for forming their own forward view from the descriptive past we surface.
What is 'lift vs base rate' and why do you cite it everywhere?
Base rate is the no-positioning-info prior — P(underlying higher) at the market's primary horizon across ALL weeks of history regardless of positioning. For Nasdaq it is 75% (equity drift); for EUR/USD it is 50% (true coin flip). Lift is the bucket-conditional P(higher) minus this base rate. Citing lift kills over-claiming: NDX 'higher 78% of the time at extreme shorts' sounds strong but is only +3pp above the 75% drift. The +Xpp / -Xpp annotations on every card are lift, not raw level.
What's the cleanest historical reading in the basket?
Nasdaq dual confirmation: specs at cotIndex ≤ 20, open interest contracting over the prior 13 weeks, AND commercials at their own 3-year extreme long. Across 15 historical occurrences the underlying was higher 100% of the time at the 12-week horizon. It fires roughly once per year. We surface this as the 'Dual Confirmation' alert tier.
What's a market profile and how does it change the narration?
Each market has a slow-moving character profile refreshed monthly from full history: signal shape (persistent / decays / reverses / weak), episode-length character, regime sensitivity. The big find: 10Y Treasury's signal shape is DECAYS. The +29pp lift at the entry week drops to roughly 0pp by week 4. The dashboard now narrates this: a reader landing mid-episode on a UST10Y extreme is told the entry-week effect has historically faded. The profile NEVER overrides this week's specific signals; it is standing context for character, not a verdict.
Why don't you use COT Index level on its own?
Because for most markets, level alone produces weak lift over base rate. NDX buy at level alone shows only +3pp lift; adding the open-interest direction filter (the 'Confirmed Washout' subset) sharpens that materially. The level is the entry point; the conditioning variables — OI, commercials, alt actor — are what separate strong historical records from drift-level noise.
Why is gold different from equities?
Gold positioning is dominated by mining hedgers on the commercial side and trend-followers on the speculator side. There is no significant discretionary 'smart money' standing against the speculative crowd, so commercials at extremes do not carry information the way they do for index futures. The contrarian read also fails outright on gold — extreme spec longs have historically kept rallying, not reversed. Gold's historical record rewards reading trend continuation, not crowd-fading.