Factor Research Workflow

The end-to-end process for researching a new alpha factor in Quant101.

The Research Loop

1. Hypothesis    → Why should this alpha exist?
2. Construct     → Compute the factor signal
3. Evaluate      → IC, IR, decay, turnover
4. Preprocess    → Winsorize, normalize, neutralize
5. Backtest      → Portfolio returns with sizing & costs
6. Validate      → Walk-forward, bootstrap, multiple testing
7. Reflect       → What did we learn? Next iteration?

Research Discipline

Never accept "it works" as a conclusion. Always ask:

What economic mechanism supports this alpha?
Under what regime will it fail?
Is it statistically significant after multiple-testing correction?

Step 1: Hypothesis

Before writing any code, articulate:

The signal: What information does the factor capture?
The mechanism: Why should this predict future returns?
The expected sign: Should high values predict high or low returns?
The decay profile: How quickly should the signal lose power?

Example for BBIBOLL:

Stocks with BBI deviation far below the lower Bollinger Band are temporarily oversold and likely to mean-revert. Expected IC sign: negative (low deviation → high future return). Should decay within 5–10 days.

Step 2: Construct the Factor

All factors follow the (date, ticker, value) convention:

import polars as pl
from portfolio.factors import register_factor

@register_factor("my_factor")
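def compute_my_factor(ohlcv: pl.LazyFrame, **params) -> pl.DataFrame:
    """Compute my custom factor signal."""
    return (
        ohlcv.sort(["ticker", "date"])
        # Your signal logic: a per-ticker expression aliased to "signal",
        # e.g. <expr>.over("ticker"). Unlike group_by("ticker").agg(),
        # with_columns keeps one row per (date, ticker).
        .with_columns(...)
        .select(["date", "ticker", pl.col("signal").alias("value")])
        .collect()
    )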

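Once registered, the factor can be referenced by name in AlphaConfig:

from portfolio.alpha_config import AlphaConfig, FactorConfig

config = AlphaConfig(
    factor_configs={"my_factor": FactorConfig(direction=1)},
    # ... remaining settings (sizing, rebalancing; see Step 5)
)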

Step 3: Evaluate with IC/IR

from alpha.factor_analyzer import FactorAnalyzer
from alpha.forward_returns import compute_forward_returns

fwd = compute_forward_returns(returns_df, horizons=[1, 5, 10, 20])
fa = FactorAnalyzer(factor_df, fwd)

print(f"IC:  {fa.ic_series(horizon=1).mean():.4f}")
print(f"IR:  {fa.information_ratio(horizon=1):.4f}")

Interpretation guidelines:

Metric   Weak     Decent      Strong      Suspicious
|IC|     < 0.02   0.02–0.05   0.05–0.10   > 0.15
|IR|     < 0.05   0.05–0.15   0.15–0.30   > 0.50

Weak signals are normal

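Single-factor IRs rarely exceed 0.3. The power comes from combining many weak but mutually uncorrelated signals. By the same logic as the Fundamental Law of Active Management (IR ≈ IC × √breadth), N uncorrelated signals of equal strength combine to roughly: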

\[\text{IR}_{\text{portfolio}} \approx \text{IR}_{\text{single}} \times \sqrt{N}\]
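The scaling above can be checked with a few lines of arithmetic. The sketch below also adds an average pairwise correlation `rho` between the signals' return streams, which is an extension not present in the source: for equal-strength, equal-volatility signals blended with equal weights, the combined IR is IR_single × √N / √(1 + (N − 1)ρ), which reduces to the √N rule at ρ = 0. The function name is hypothetical.

```python
import math


def combined_ir(single_ir: float, n_signals: int, rho: float = 0.0) -> float:
    """IR of an equal-weight blend of n equal-strength signals.

    rho is the average pairwise correlation between the signals' returns;
    rho = 0 reproduces the sqrt(N) rule.
    """
    return single_ir * math.sqrt(n_signals) / math.sqrt(1 + (n_signals - 1) * rho)


for n in (1, 4, 16, 64):
    print(
        f"N={n:>2}  uncorrelated: {combined_ir(0.10, n):.2f}  "
        f"rho=0.2: {combined_ir(0.10, n, rho=0.2):.2f}"
    )
```

Even modest correlation flattens the curve quickly, which is why the emphasis is on signals that are orthogonal rather than merely numerous.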

Step 4: Preprocess

from alpha.preprocessing import preprocess_factor

clean = preprocess_factor(
    factor_df,
    winsorize_pct=0.025,
    normalize_method="zscore",
    neutralize=None,  # or "sector"
)

Why each step matters:

Winsorize: Clips extreme values so a few outliers do not dominate the cross-sectional mean and standard deviation used in the z-score
Z-score: Puts signals on a common scale so factors can be combined
Sector neutralize: Removes sector exposure, leaving pure stock-selection alpha
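The three steps can be sketched on a single date's cross-section using only the standard library. This is not the implementation of preprocess_factor, which the source does not show. It assumes that winsorize_pct means clipping at the pct and 1 − pct percentiles and that sector neutralization means subtracting each sector's mean. All function names and data are hypothetical.

```python
from statistics import mean, pstdev


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 1]) of pre-sorted data."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def winsorize(values, pct):
    """Clip values to the [pct, 1 - pct] percentile range."""
    s = sorted(values)
    lower, upper = percentile(s, pct), percentile(s, 1 - pct)
    return [min(max(v, lower), upper) for v in values]


def zscore(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def sector_neutralize(values, sectors):
    """Subtract each sector's mean so every sector averages to zero."""
    by_sector = {}
    for v, s in zip(values, sectors):
        by_sector.setdefault(s, []).append(v)
    sector_mean = {s: mean(vs) for s, vs in by_sector.items()}
    return [v - sector_mean[s] for v, s in zip(values, sectors)]


# One date, six tickers, one extreme outlier (toy numbers).
raw = [0.1, 0.2, 0.3, 0.4, 0.5, 50.0]
sectors = ["tech", "tech", "tech", "bank", "bank", "bank"]

# pct=0.2 only because the toy cross-section is tiny; the call above uses 0.025.
clipped = winsorize(raw, pct=0.2)
scores = zscore(clipped)
neutral = sector_neutralize(scores, sectors)

print("clipped:", clipped)
print("z-scores:", [round(z, 2) for z in scores])
print("sector-neutral:", [round(z, 2) for z in neutral])
```

Without the clipping step, the single value of 50.0 would set both the mean and the standard deviation, and every other ticker would receive nearly the same z-score.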

Step 5: Backtest

from portfolio.pipeline import run_alpha_pipeline
from portfolio.alpha_config import AlphaConfig, FactorConfig

config = AlphaConfig(
    factor_configs={"my_factor": FactorConfig(direction=1)},
    sizing_method="Equal-Weight",
    rebal_every_n=5,
    n_long=10,
    n_short=10,
)
results = run_alpha_pipeline(ohlcv, config=config)
print(f"Sharpe: {results['sharpe']:.3f}")

Compare multiple configs:

from backtest.weight_backtester import WeightBacktester

configs = {
    "EW_daily": AlphaConfig(..., sizing_method="Equal-Weight", rebal_every_n=1),
    "EW_weekly": AlphaConfig(..., sizing_method="Equal-Weight", rebal_every_n=5),
    "SW_daily": AlphaConfig(..., sizing_method="Signal-Weighted", rebal_every_n=1),
}
for name, cfg in configs.items():
    r = run_alpha_pipeline(ohlcv, config=cfg)
    print(f"{name}: Sharpe={r['sharpe']:.3f}")
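As a rough illustration of how the two sizing methods compared above might differ, the sketch below turns one date's scores into long/short weights. It assumes a dollar-neutral book with 50% gross exposure on each side and that "Signal-Weighted" means weights proportional to the absolute score. The pipeline's actual sizing rules are not shown in the source, so treat the function and its conventions as hypothetical.

```python
def long_short_weights(scores, n_long, n_short, method="Equal-Weight"):
    """Map {ticker: score} to dollar-neutral weights with gross exposure 1.

    Assumes n_long and n_short are positive and n_long + n_short <= len(scores).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    longs, shorts = ranked[:n_long], ranked[-n_short:]

    if method == "Equal-Weight":
        raw_long = {t: 1.0 for t in longs}
        raw_short = {t: 1.0 for t in shorts}
    elif method == "Signal-Weighted":
        raw_long = {t: abs(scores[t]) for t in longs}
        raw_short = {t: abs(scores[t]) for t in shorts}
    else:
        raise ValueError(f"unknown sizing method: {method}")

    # Normalize each side separately: longs sum to +0.5, shorts to -0.5.
    long_total, short_total = sum(raw_long.values()), sum(raw_short.values())
    weights = {t: 0.5 * w / long_total for t, w in raw_long.items()}
    weights.update({t: -0.5 * w / short_total for t, w in raw_short.items()})
    return weights


scores = {"AAA": 2.0, "BBB": 1.0, "CCC": 0.1, "DDD": -0.5, "EEE": -1.5}
ew = long_short_weights(scores, n_long=2, n_short=2)
sw = long_short_weights(scores, n_long=2, n_short=2, method="Signal-Weighted")
print("Equal-Weight:   ", ew)
print("Signal-Weighted:", {t: round(w, 3) for t, w in sw.items()})
```

Signal weighting concentrates the book in the most extreme scores, which tends to raise both turnover and sensitivity to outliers; that is one reason to compare it against equal weighting at several rebalance frequencies.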

Step 6: Validate

from portfolio.walk_forward_runner import run_walk_forward

wf_results = run_walk_forward(ohlcv, config=config)
print(f"Mean OOS Sharpe: {wf_results['mean_oos_sharpe']:.3f}")
print(f"Sharpe decay:    {wf_results['sharpe_decay']:.3f}")
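The core idea behind walk-forward validation is that every out-of-sample window lies strictly after the data used to fit or select on. A minimal sketch of rolling split generation follows; the window sizes and the function name are illustrative assumptions, not the behavior of run_walk_forward.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_range, test_range) index pairs that roll forward in time."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        # Advance by one test window so out-of-sample periods never overlap.
        start += test_size


splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Each out-of-sample window contributes one Sharpe estimate, and averaging those estimates is one plausible way to arrive at a figure like the mean OOS Sharpe printed above.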

Then run the statistical gauntlet:

from validation.statistical_tests import (
    bootstrap_sharpe_ci,
    deflated_sharpe_ratio,
)

ci = bootstrap_sharpe_ci(portfolio_returns)
dsr = deflated_sharpe_ratio(portfolio_returns, n_trials=16)
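For intuition about what a bootstrap confidence interval on the Sharpe ratio involves, here is a minimal percentile-bootstrap sketch using only the standard library. It is not the code behind bootstrap_sharpe_ci. The function name, defaults, and synthetic returns are assumptions, and the i.i.d. resampling used here ignores autocorrelation, which a block bootstrap would preserve.

```python
import random
from statistics import mean, stdev


def sharpe(returns):
    """Per-period Sharpe ratio (zero risk-free rate, not annualized)."""
    return mean(returns) / stdev(returns)


def bootstrap_sharpe_interval(returns, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute Sharpe."""
    rng = random.Random(seed)
    n = len(returns)
    stats = sorted(sharpe(rng.choices(returns, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


# Synthetic daily returns, for illustration only.
data_rng = random.Random(42)
toy_returns = [data_rng.gauss(0.0005, 0.01) for _ in range(500)]

low, high = bootstrap_sharpe_interval(toy_returns)
print(f"Sharpe {sharpe(toy_returns):.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

If the interval comfortably includes zero, the backtest Sharpe on its own is weak evidence, whatever its point estimate.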

Multiple Testing

If you test 16 configs, run apply_all_corrections() on the resulting p-values before drawing conclusions. In our experience, 0 of 16 configs survived Benjamini-Hochberg correction for the BBIBOLL factor alone. This is expected for a weak single signal; the goal is multi-factor combination.
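The Benjamini-Hochberg step-up procedure itself is short enough to sketch directly. The implementation below is a standalone illustration, not the code behind apply_all_corrections(), and the 16 p-values are made up to show how several nominally significant configs can all fail after correction.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values for 16 configs (illustration only).
p_values = [0.04, 0.20, 0.03, 0.50, 0.08, 0.61, 0.02, 0.33,
            0.75, 0.12, 0.90, 0.45, 0.27, 0.06, 0.81, 0.15]

print(sum(p < 0.05 for p in p_values), "of 16 below 0.05 before correction")
print(sum(benjamini_hochberg(p_values)), "of 16 survive Benjamini-Hochberg")
```

With 16 tests, the smallest p-value has to fall below 0.05 / 16 ≈ 0.003 to be rejected on its own, which is a much higher bar than the uncorrected 0.05.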

Step 7: Reflect & Document

Write up findings in the LaTeX journal (docs/latex/quant_lab.tex):

What was the hypothesis?
What did IC/IR look like?
Did it survive walk-forward?
What regime does it work in?
What's the next iteration?

Factor research is an iterative process. Most factors will fail. The discipline is in the process, not the outcome.