Analysis Driver API
Hamilton Driver wrapper for composable analysis calculations.
Overview
The chronos_lab.analysis.driver module provides the AnalysisDriver class, a wrapper around Apache Hamilton's Driver that simplifies running analysis calculations with shared configuration, caching, and execution management.
Key Features:
- Zero-config defaults - `AnalysisDriver()` works out of the box
- Flexible execution - Multithreading or multiprocessing for symbol-level parallelization
- Persistent caching - Hamilton's cache for expensive computations
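The symbol-level parallelization listed above can be pictured with the standard library alone. The following is a minimal sketch, not chronos_lab's implementation: `analyze_symbol` and `run_per_symbol` are hypothetical names, and the worker cap simply mirrors the documented `max_parallel_tasks` default of 5.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def analyze_symbol(symbol: str) -> int:
    """Hypothetical stand-in for a per-symbol calculation."""
    return len(symbol)


def run_per_symbol(
    symbols: List[str],
    task: Callable[[str], int],
    max_parallel_tasks: int = 5,
) -> Dict[str, int]:
    """Run `task` once per symbol, using at most `max_parallel_tasks` threads."""
    with ThreadPoolExecutor(max_workers=max_parallel_tasks) as pool:
        # pool.map preserves input order, so results line up with symbols.
        results = list(pool.map(task, symbols))
    return dict(zip(symbols, results))


per_symbol = run_per_symbol(["AAPL", "MSFT", "NVDA"], analyze_symbol)
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` gives the multiprocessing variant of the same pattern. Threads generally suit I/O-bound work such as data downloads, while processes suit CPU-bound calculations.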
API Reference
chronos_lab.analysis.driver.AnalysisDriver
AnalysisDriver(*, enable_cache: bool = False, cache_path: str = None, local_executor_type: Optional[str] = 'synchronous', remote_executor_type: str = 'multithreading', max_parallel_tasks: int = 5, enable_telemetry: bool = False)
Hamilton Driver wrapper for composable analysis calculations.
Manages Hamilton Driver instances for different calculation types with shared caching and execution configuration. Each calculation type gets its own Driver (built once, reused on subsequent calls). All calculations share the same cache directory for maximum efficiency.
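The "built once, reused on subsequent calls" behaviour described above is a lazy per-key cache. The sketch below illustrates that pattern only; `DriverRegistry` and `fake_factory` are hypothetical names, and the real class may organise its drivers differently.

```python
from typing import Callable, Dict, List


class DriverRegistry:
    """Hypothetical illustration: build one object per calculation type, then reuse it."""

    def __init__(self, factory: Callable[[str], object]) -> None:
        self._factory = factory
        self._drivers: Dict[str, object] = {}

    def get(self, calculation_type: str) -> object:
        # Build on the first request only; later calls return the stored instance.
        if calculation_type not in self._drivers:
            self._drivers[calculation_type] = self._factory(calculation_type)
        return self._drivers[calculation_type]


build_calls: List[str] = []


def fake_factory(calculation_type: str) -> object:
    """Record each build so reuse can be observed."""
    build_calls.append(calculation_type)
    return object()


registry = DriverRegistry(fake_factory)
first = registry.get("detect_anomalies")
second = registry.get("detect_anomalies")
```

Here `first` and `second` are the same object, and the factory runs only once for the `"detect_anomalies"` key.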
Initialize AnalysisDriver with shared configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `enable_cache` | `bool` | Enable Hamilton caching for expensive computations. Defaults to False. | `False` |
| `cache_path` | `str` | Directory path for cache storage. If None, uses HAMILTON_CACHE_PATH from settings. If the setting is not set, raises ValueError. | `None` |
| `local_executor_type` | `Optional[str]` | Local executor type. | `'synchronous'` |
| `remote_executor_type` | `str` | Remote executor type for parallel processing. Options: 'multithreading' or 'multiprocessing'. Defaults to 'multithreading'. | `'multithreading'` |
| `max_parallel_tasks` | `int` | Maximum number of parallel tasks for symbol-level processing. Defaults to 5. | `5` |
| `enable_telemetry` | `bool` | Enable Hamilton telemetry data collection. Defaults to False. | `False` |
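The `cache_path` fallback described in the parameters can be sketched as a small resolver. This is an illustration under assumptions: `resolve_cache_path` is a hypothetical helper, and the settings object is modelled as a plain mapping, which may not match how chronos_lab stores its settings.

```python
from typing import Mapping, Optional


def resolve_cache_path(cache_path: Optional[str], settings: Mapping[str, str]) -> str:
    """Return the explicit path, else the HAMILTON_CACHE_PATH setting, else raise."""
    if cache_path is not None:
        return cache_path
    configured = settings.get("HAMILTON_CACHE_PATH")
    if not configured:
        # Mirrors the documented ValueError when neither source provides a path.
        raise ValueError("cache_path is None and HAMILTON_CACHE_PATH is not set")
    return configured


explicit = resolve_cache_path("/tmp/hamilton", {})
from_settings = resolve_cache_path(None, {"HAMILTON_CACHE_PATH": "/var/cache/hamilton"})
```

An explicit argument takes precedence over the setting, so a single run can redirect its cache without touching shared configuration.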
detect_anomalies
detect_anomalies(ohlcv: Optional[DataFrame] = None, ohlcv_from_source: str = 'disabled', ohlcv_from_config: Dict[str, Any] = None, ohlcv_features_list: List[str] = None, use_adjusted: bool = True, isolation_forest_config: Dict[str, Any] = None, to_dataset: str = 'disabled', to_dataset_config: Dict[str, Any] = None, to_arcticdb: str = 'disabled', to_arcticdb_config: Dict[str, Any] = None, driver_config: Dict[str, Any] = None) -> Dict[str, Any]
Detect anomalies in OHLCV time series data using Isolation Forest.
Executes a Hamilton DAG that standardizes OHLCV data, computes features, applies Isolation Forest anomaly detection, and optionally persists results to datasets or ArcticDB. Supports multiple data sources (Yahoo Finance, Intrinio, ArcticDB) or direct DataFrame input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ohlcv` | `Optional[DataFrame]` | Pre-loaded OHLCV DataFrame with MultiIndex (date, symbol). Required if ohlcv_from_source is 'disabled'. Defaults to None. | `None` |
| `ohlcv_from_source` | `str` | Data source for OHLCV retrieval. Options: 'disabled' (use ohlcv parameter), 'yfinance', 'intrinio', or 'arcticdb'. Defaults to 'disabled'. | `'disabled'` |
| `ohlcv_from_config` | `Dict[str, Any]` | Configuration dictionary passed to the data source function. Required when ohlcv_from_source is not 'disabled'. Defaults to None. | `None` |
| `ohlcv_features_list` | `List[str]` | List of feature names to compute from OHLCV data. Options: 'returns', 'volume_change', 'high_low_range', 'volatility'. Defaults to ['returns', 'volume_change', 'high_low_range']. | `None` |
| `use_adjusted` | `bool` | Whether to use adjusted OHLCV columns (adj_close, etc.) if available. Defaults to True. | `True` |
| `isolation_forest_config` | `Dict[str, Any]` | Configuration dictionary for sklearn's IsolationForest. Defaults to {'contamination': 0.02, 'random_state': 42, 'n_estimators': 200, 'max_samples': 250}. | `None` |
| `to_dataset` | `str` | Whether to save anomaly results to a dataset. Options: 'disabled' or 'enabled'. Defaults to 'disabled'. | `'disabled'` |
| `to_dataset_config` | `Dict[str, Any]` | Configuration for dataset output. Defaults to {'dataset_name': 'ohlcv_anomalies', 'ddb_dataset_ttl': 7}. | `None` |
| `to_arcticdb` | `str` | Whether to save results to ArcticDB. Options: 'disabled' or 'enabled'. Defaults to 'disabled'. | `'disabled'` |
| `to_arcticdb_config` | `Dict[str, Any]` | Configuration for ArcticDB output. Defaults to {'backend': 'LMDB', 'library_name': 'analysis', 'symbol_prefix': '', 'symbol_suffix': '_ohlcv_anomaly'}. | `None` |
| `driver_config` | `Dict[str, Any]` | Additional configuration passed to the Hamilton Driver builder. Defaults to {}. | `None` |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Dictionary containing execution results with keys 'analysis_result' (DataFrame with anomaly scores and flags), 'analysis_to_dataset' (dataset save status), and 'analysis_to_arcticdb' (ArcticDB save status). |
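A call might be assembled as in the sketch below. The keyword names and the `'analysis_result'` key come from the signature and return description above; `build_request` and `run_detection` are hypothetical helpers, and the keys inside `ohlcv_from_config` (`symbols`, `period`) are assumptions, since this page does not list what the data source function accepts. `run_detection` is defined but not invoked here because it needs chronos_lab installed and access to the data source.

```python
from typing import Any, Dict, List


def build_request(symbols: List[str]) -> Dict[str, Any]:
    """Assemble keyword arguments for detect_anomalies from documented options."""
    return {
        "ohlcv_from_source": "yfinance",
        # Hypothetical keys: the accepted contents of ohlcv_from_config are not documented here.
        "ohlcv_from_config": {"symbols": symbols, "period": "1y"},
        "ohlcv_features_list": ["returns", "volume_change", "high_low_range"],
        # Same values as the documented defaults, spelled out for visibility.
        "isolation_forest_config": {
            "contamination": 0.02,
            "random_state": 42,
            "n_estimators": 200,
            "max_samples": 250,
        },
        "to_dataset": "disabled",
        "to_arcticdb": "disabled",
    }


def run_detection(request: Dict[str, Any]) -> Any:
    """Run the analysis; requires chronos_lab and a reachable data source."""
    from chronos_lab.analysis.driver import AnalysisDriver  # imported lazily

    driver = AnalysisDriver(enable_cache=False, max_parallel_tasks=5)
    result = driver.detect_anomalies(**request)
    # 'analysis_result' is the documented key for the scored DataFrame.
    return result["analysis_result"]


request = build_request(["AAPL", "MSFT"])
```

Keeping the request as a plain dictionary makes it easy to log or version the exact configuration used for a run.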
Raises:
| Type | Description |
|---|---|
| `ValueError` | If neither ohlcv nor ohlcv_from_source is provided, or if ohlcv_from_source is unsupported, or if ohlcv_from_config is missing when required. |
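The three error conditions above amount to a short pre-flight check. The sketch below is one way to express them; `validate_inputs` is a hypothetical function, and the order in which the library actually performs these checks is an assumption.

```python
from typing import Any, Dict, Optional

# Source options as listed for ohlcv_from_source.
SUPPORTED_SOURCES = ("disabled", "yfinance", "intrinio", "arcticdb")


def validate_inputs(
    ohlcv: Optional[object] = None,
    ohlcv_from_source: str = "disabled",
    ohlcv_from_config: Optional[Dict[str, Any]] = None,
) -> None:
    """Raise ValueError for the documented invalid input combinations."""
    if ohlcv_from_source not in SUPPORTED_SOURCES:
        raise ValueError(f"Unsupported ohlcv_from_source: {ohlcv_from_source!r}")
    if ohlcv_from_source == "disabled":
        # With no source configured, the caller must supply the data directly.
        if ohlcv is None:
            raise ValueError("Provide ohlcv or set ohlcv_from_source")
    elif ohlcv_from_config is None:
        raise ValueError("ohlcv_from_config is required when a source is selected")


def is_rejected(**kwargs: Any) -> bool:
    """Return True if validate_inputs raises ValueError for these arguments."""
    try:
        validate_inputs(**kwargs)
    except ValueError:
        return True
    return False
```

For example, `is_rejected(ohlcv_from_source="yfinance")` is true because no `ohlcv_from_config` accompanies the selected source.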