# Dataset Management API

Low-level dataset storage and retrieval for structured data (portfolios, watchlists, security metadata).
## Overview

The `chronos_lab.dataset` module manages structured datasets with dual storage backend support (local JSON files and DynamoDB).
**Low-Level API**

Most users should use the high-level functions instead:

- Use `from_dataset()` in `chronos_lab.sources` for reading datasets
- Use `to_dataset()` in `chronos_lab.storage` for writing datasets

Only use the `Dataset` class directly when building custom dataset management workflows.
**Dataset Naming Convention:**

- Local datasets: use any name (stored as `{name}.json`)
- DynamoDB datasets: prefix with `ddb_` (e.g., `ddb_securities`)
**Storage Backends:**

- Local: JSON files in `~/.chronos_lab/datasets` (configurable via `DATASET_LOCAL_PATH`)
- DynamoDB: AWS DynamoDB table (requires `DATASET_DDB_TABLE_NAME` configuration)
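The naming convention above implies a simple routing rule: the `ddb_` prefix selects DynamoDB, and anything else maps to a JSON file under the local path. A minimal sketch of that rule, assuming the default local path from this document (the helper name `resolve_backend` is hypothetical, not part of the library):

```python
import os

def resolve_backend(dataset_name, local_path="~/.chronos_lab/datasets"):
    """Hypothetical helper: route a dataset name to its storage backend.

    Names prefixed with 'ddb_' go to DynamoDB; everything else resolves
    to {local_path}/{name}.json, per the naming convention above.
    """
    if dataset_name.startswith("ddb_"):
        return ("dynamodb", dataset_name)
    filename = os.path.join(os.path.expanduser(local_path), dataset_name + ".json")
    return ("local", filename)
```

For example, `resolve_backend('ddb_securities')` routes to DynamoDB, while `resolve_backend('watchlist')` resolves to a local `watchlist.json` path.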
## Classes

### chronos_lab.dataset.Dataset

Manager for structured datasets stored locally or in DynamoDB.

Handles reading and writing datasets with support for both local JSON files and AWS DynamoDB tables. Automatically manages dataset locations based on naming conventions and configuration.
Attributes:

| Name | Type | Description |
|---|---|---|
| `_table_name` | | DynamoDB table name (if configured) |
| `_local_path` | | Local filesystem path for JSON datasets |
| `_dataset_map` | | Mapping of dataset names to DynamoDB keys (pk/sk) |
| `_database` | | `DynamoDBDatabase` instance (if DynamoDB configured) |
Examples:

Work with local datasets:

    >>> ds = Dataset()
    >>> # Get as dictionary
    >>> data_dict = ds.get_dataset(dataset_name='example')
    >>> # Get as DataFrame
    >>> df = ds.get_datasetDF(dataset_name='example')

Work with DynamoDB datasets:

    >>> ds = Dataset(ddb_table_name='my-datasets')
    >>> data = ds.get_dataset(dataset_name='ddb_securities')
    >>> df = ds.get_datasetDF(dataset_name='ddb_securities')
Note

- Local datasets: names without the `ddb_` prefix
- DynamoDB datasets: names with the `ddb_` prefix
- DynamoDB requires `DATASET_DDB_TABLE_NAME` and `DATASET_DDB_MAP` in settings
Initialize Dataset manager with local and/or DynamoDB configuration.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `ddb_table_name` | | DynamoDB table name. If `None`, uses `DATASET_DDB_TABLE_NAME` from configuration. | `None` |
| `local_path` | | Local filesystem path for JSON datasets. If `None`, uses `DATASET_LOCAL_PATH` from configuration. | `None` |
#### get_dataset

Retrieve a dataset as a dictionary.

Fetches the dataset from a local JSON file or DynamoDB table based on the naming convention.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_name` | | Dataset identifier. Use the `ddb_` prefix for DynamoDB datasets, no prefix for local JSON files. | *required* |

Returns:

| Type | Description |
|---|---|
| | Dictionary with keys: `'statusCode'` (0 on success, -1 on failure) and `'payload'` (dictionary of dataset items, mapping keys to attribute dicts) |
Examples:

Get a local dataset:

    >>> ds = Dataset()
    >>> result = ds.get_dataset(dataset_name='example')
    >>> if result['statusCode'] == 0:
    ...     data = result['payload']
    ...     print(data.keys())

Get a DynamoDB dataset:

    >>> ds = Dataset()
    >>> result = ds.get_dataset(dataset_name='ddb_securities')
    >>> data = result['payload']
Note

- Local datasets are loaded from `{DATASET_LOCAL_PATH}/{name}.json`
- DynamoDB datasets require configuration in `DATASET_DDB_MAP`
- DynamoDB items are keyed by their `sk` (sort key) value
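Because callers are expected to branch on `statusCode` before touching `payload`, a small guard helper can reduce boilerplate. A sketch based on the return shape documented above (the helper name `payload_or_empty` and the sample data are illustrative, not part of the library):

```python
def payload_or_empty(result):
    """Return the payload dict on success, an empty dict on failure.

    Assumes the documented return shape: {'statusCode': 0|-1, 'payload': {...}}.
    """
    if result.get("statusCode") == 0:
        return result.get("payload", {})
    return {}

# Illustrative result in the documented shape:
example_result = {
    "statusCode": 0,
    "payload": {"AAPL": {"name": "Apple Inc."}, "MSFT": {"name": "Microsoft"}},
}
```

With this guard, `payload_or_empty(example_result)` yields the item dictionary, and a failed call (`statusCode == -1`) safely yields `{}` instead of raising a `KeyError`.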
#### get_datasetDF

Retrieve a dataset as a pandas DataFrame with automatic type inference.

Fetches the dataset and converts it to a DataFrame with automatic detection and conversion of datetime and numeric columns.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | | Arguments passed to `get_dataset()`, including `dataset_name` | `{}` |

Returns:

| Type | Description |
|---|---|
| | pandas DataFrame with inferred types, or `None` on error |
Examples:

Get a local dataset as a DataFrame:

    >>> ds = Dataset()
    >>> df = ds.get_datasetDF(dataset_name='example')
    >>> print(df.head())
    >>> print(df.dtypes)

Get a DynamoDB dataset as a DataFrame:

    >>> ds = Dataset()
    >>> df = ds.get_datasetDF(dataset_name='ddb_securities')
    >>> # DataFrame index is the sort key (sk) from DynamoDB
Note

- Automatically converts ISO datetime strings to pandas datetime
- Automatically converts numeric strings to numeric types
- Index is the dataset keys (filename for local, `sk` for DynamoDB)
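The notes above describe item keys becoming the index and string columns being coerced to datetime or numeric types. A standalone sketch of that kind of inference with pandas, assuming the payload shape documented for `get_dataset()` (the function `dataset_to_df` and the sample data are illustrative; the library's actual implementation may differ):

```python
import pandas as pd

def dataset_to_df(payload):
    """Sketch: dict-of-dicts payload -> DataFrame with inferred column types.

    Item keys become the index; object columns are tried as numeric first,
    then as ISO datetime strings, and left as-is if neither fully applies.
    """
    df = pd.DataFrame.from_dict(payload, orient="index")
    for col in df.columns:
        if df[col].dtype == object:
            converted = pd.to_numeric(df[col], errors="coerce")
            if converted.notna().all():
                df[col] = converted
                continue
            converted = pd.to_datetime(df[col], errors="coerce")
            if converted.notna().all():
                df[col] = converted
    return df
```

Trying numeric conversion before datetime avoids misreading plain number strings; a column like `'189.5'` becomes float while `'2024-01-02'` becomes a pandas datetime.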
#### save_dataset

Save a dataset to a local JSON file or DynamoDB table.

Stores the dataset dictionary based on naming convention. Creates parent directories if needed for local storage.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_name` | | Dataset identifier. Use the `ddb_` prefix for DynamoDB, no prefix for local JSON. | *required* |
| `dataset` | | Dictionary of items to save (keys to attribute dicts) | *required* |

Returns:

| Type | Description |
|---|---|
| | Dictionary with `'statusCode'`: 0 on success, -1 on failure |
Examples:

Save to local JSON:

    >>> ds = Dataset()
    >>> data = {
    ...     'item1': {'name': 'Product A', 'price': 9.99},
    ...     'item2': {'name': 'Product B', 'price': 19.99}
    ... }
    >>> result = ds.save_dataset('products', data)

Save to DynamoDB:

    >>> ds = Dataset()
    >>> data = {
    ...     'AAPL': {'name': 'Apple Inc.', 'sector': 'Technology'},
    ...     'MSFT': {'name': 'Microsoft', 'sector': 'Technology'}
    ... }
    >>> result = ds.save_dataset('ddb_securities', data)
Note

- Local datasets are saved to `{DATASET_LOCAL_PATH}/{name}.json`
- DynamoDB datasets require configuration in `DATASET_DDB_MAP`
- DynamoDB items get `pk` and `sk` added automatically from the map
- JSON dates are serialized as strings using `default=str`
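The notes above pin down the local on-disk format: one `{name}.json` file per dataset, with dates serialized to strings via `default=str`. A round-trip sketch of that format using only the standard library (the helper `save_local` is illustrative, not the library's method):

```python
import json
import os
import tempfile
from datetime import date

def save_local(name, dataset, root):
    """Sketch: write a dataset dict to {root}/{name}.json, per the notes above.

    Creates the parent directory if needed; dates serialize via default=str.
    """
    os.makedirs(root, exist_ok=True)
    path = os.path.join(root, name + ".json")
    with open(path, "w") as fh:
        json.dump(dataset, fh, default=str)
    return path

# Round-trip: a date value comes back as its ISO string form.
root = tempfile.mkdtemp()
data = {"item1": {"name": "Product A", "price": 9.99, "added": date(2024, 1, 2)}}
path = save_local("products", data, root)
with open(path) as fh:
    loaded = json.load(fh)
```

Note the asymmetry this implies: a `date` written with `default=str` is read back as a plain string, which is why `get_datasetDF()` needs datetime inference on the way back in.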
#### delete_dataset_items

Delete specific items from a DynamoDB dataset.

Removes items from the DynamoDB table using batch delete. Not supported for local JSON datasets.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_name` | | DynamoDB dataset name (must start with `ddb_`) | *required* |
| `items` | | List of sort key (`sk`) values identifying items to delete | *required* |

Returns:

| Type | Description |
|---|---|
| | Dictionary with `'statusCode'`: 0 on success, -1 on failure |
Examples:

Delete items from a DynamoDB dataset:

    >>> ds = Dataset()
    >>> items_to_delete = ['AAPL', 'MSFT', 'GOOGL']
    >>> result = ds.delete_dataset_items(
    ...     dataset_name='ddb_securities',
    ...     items=items_to_delete
    ... )
    >>> if result['statusCode'] == 0:
    ...     print("Items deleted successfully")
Note

- Only works with DynamoDB datasets (names starting with `ddb_`)
- Not supported for local JSON datasets
- Requires `DATASET_DDB_TABLE_NAME` configuration
- Items are identified by their sort key (`sk`) values
- Uses batch delete for efficiency
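Batch deletes in DynamoDB are capped: `BatchWriteItem` accepts at most 25 requests per call, so any batch-delete implementation must chunk the sort keys. A sketch of just the chunking step (the function `chunk_keys` is illustrative; the real method would also need the partition key from `DATASET_DDB_MAP` and a table handle to issue the deletes):

```python
def chunk_keys(sort_keys, batch_size=25):
    """Split sort-key values into DynamoDB-sized delete batches.

    BatchWriteItem allows at most 25 put/delete requests per call,
    hence the default batch_size of 25.
    """
    return [sort_keys[i:i + batch_size] for i in range(0, len(sort_keys), batch_size)]
```

For example, 60 sort keys would be issued as three calls of 25, 25, and 10 requests.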