Dataset Management API

Low-level dataset storage and retrieval for structured data (portfolios, watchlists, security metadata).

Overview

The chronos_lab.dataset module manages structured datasets with dual storage backend support (local JSON files and DynamoDB).

Low-Level API

Most users should use the high-level functions instead:

  • Use from_dataset() in chronos_lab.sources for reading datasets
  • Use to_dataset() in chronos_lab.storage for writing datasets

Only use the Dataset class directly when building custom dataset management workflows.

Dataset Naming Convention:

  • Local datasets: Use any name (stored as {name}.json)
  • DynamoDB datasets: Prefix with ddb_ (e.g., ddb_securities)

Storage Backends:

  • Local: JSON files in ~/.chronos_lab/datasets (configurable via DATASET_LOCAL_PATH)
  • DynamoDB: AWS DynamoDB table (requires DATASET_DDB_TABLE_NAME configuration)
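
The prefix-driven routing above can be sketched in a few lines. The helper names here are illustrative, not the library's internals:

```python
from pathlib import Path

DDB_PREFIX = "ddb_"

def is_ddb_dataset(name: str) -> bool:
    # A dataset routes to DynamoDB iff its name carries the 'ddb_' prefix.
    return name.startswith(DDB_PREFIX)

def local_dataset_path(name: str, base: str = "~/.chronos_lab/datasets") -> Path:
    # Local datasets live at {base}/{name}.json; in the real library the
    # base directory is overridable via DATASET_LOCAL_PATH.
    return Path(base).expanduser() / f"{name}.json"
```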

Classes

chronos_lab.dataset.Dataset

Dataset(*, ddb_table_name=None, local_path=None)

Manager for structured datasets stored locally or in DynamoDB.

Handles reading and writing datasets with support for both local JSON files and AWS DynamoDB tables. Automatically manages dataset locations based on naming conventions and configuration.

Attributes:

  • _table_name: DynamoDB table name (if configured)
  • _local_path: Local filesystem path for JSON datasets
  • _dataset_map: Mapping of dataset names to DynamoDB keys (pk/sk)
  • _database: DynamoDBDatabase instance (if DynamoDB configured)

Examples:

Work with local datasets:

  >>> ds = Dataset()
  >>> # Get as dictionary
  >>> data_dict = ds.get_dataset(dataset_name='example')
  >>> # Get as DataFrame
  >>> df = ds.get_datasetDF(dataset_name='example')

Work with DynamoDB datasets:

  >>> ds = Dataset(ddb_table_name='my-datasets')
  >>> data = ds.get_dataset(dataset_name='ddb_securities')
  >>> df = ds.get_datasetDF(dataset_name='ddb_securities')

Note
  • Local datasets: Names without 'ddb_' prefix
  • DynamoDB datasets: Names with 'ddb_' prefix
  • DynamoDB requires DATASET_DDB_TABLE_NAME and DATASET_DDB_MAP in settings

Initialize Dataset manager with local and/or DynamoDB configuration.

Parameters:

  • ddb_table_name (default None): DynamoDB table name. If None, uses DATASET_DDB_TABLE_NAME from configuration.
  • local_path (default None): Local filesystem path for JSON datasets. If None, uses DATASET_LOCAL_PATH from configuration.

get_dataset

get_dataset(*, dataset_name)

Retrieve a dataset as a dictionary.

Fetches the dataset from a local JSON file or a DynamoDB table, based on the naming convention.

Parameters:

  • dataset_name (required): Dataset identifier. Use the 'ddb_' prefix for DynamoDB datasets, no prefix for local JSON files.

Returns:

  Dictionary with keys:
  • 'statusCode': 0 on success, -1 on failure
  • 'payload': Dictionary of dataset items (mapping keys to attribute dicts)

Examples:

Get a local dataset:

  >>> ds = Dataset()
  >>> result = ds.get_dataset(dataset_name='example')
  >>> if result['statusCode'] == 0:
  ...     data = result['payload']
  ...     print(data.keys())

Get a DynamoDB dataset:

  >>> ds = Dataset()
  >>> result = ds.get_dataset(dataset_name='ddb_securities')
  >>> data = result['payload']

Note
  • Local datasets loaded from {DATASET_LOCAL_PATH}/{name}.json
  • DynamoDB datasets require configuration in DATASET_DDB_MAP
  • DynamoDB items are keyed by their 'sk' (sort key) value
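
The sk-keying described in the notes can be illustrated with a small helper (hypothetical; the library's actual payload assembly may differ):

```python
def items_by_sort_key(items):
    # Re-key a list of DynamoDB items by their 'sk' value, dropping the
    # table keys so only the item attributes remain in the payload.
    return {
        item["sk"]: {k: v for k, v in item.items() if k not in ("pk", "sk")}
        for item in items
    }
```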

get_datasetDF

get_datasetDF(**kwargs)

Retrieve a dataset as a pandas DataFrame with automatic type inference.

Fetches the dataset and converts it to a DataFrame, automatically detecting and converting datetime and numeric columns.

Parameters:

  • **kwargs: Arguments passed to get_dataset(), including dataset_name

Returns:

  pandas DataFrame with inferred types, or None on error

Examples:

Get a local dataset as a DataFrame:

  >>> ds = Dataset()
  >>> df = ds.get_datasetDF(dataset_name='example')
  >>> print(df.head())
  >>> print(df.dtypes)

Get a DynamoDB dataset as a DataFrame:

  >>> ds = Dataset()
  >>> df = ds.get_datasetDF(dataset_name='ddb_securities')
  >>> # DataFrame index is the sort key (sk) from DynamoDB

Note
  • Automatically converts ISO datetime strings to pandas datetime
  • Automatically converts numeric strings to numeric types
  • Index is the dataset keys (filename for local, 'sk' for DynamoDB)
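
The inference described above can be approximated directly with pandas. This is a sketch, not the library's implementation: try numeric first, then datetime, and leave a column untouched if neither conversion covers every value:

```python
import pandas as pd

def infer_column_types(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in out.columns:
        # Numeric strings -> numeric dtype.
        numeric = pd.to_numeric(out[col], errors="coerce")
        if numeric.notna().all():
            out[col] = numeric
            continue
        # ISO datetime strings -> pandas datetime dtype.
        parsed = pd.to_datetime(out[col], errors="coerce")
        if parsed.notna().all():
            out[col] = parsed
    return out
```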

save_dataset

save_dataset(dataset_name, dataset)

Save a dataset to local JSON file or DynamoDB table.

Stores the dataset dictionary based on the naming convention, creating parent directories for local storage if needed.

Parameters:

  • dataset_name (required): Dataset identifier. Use the 'ddb_' prefix for DynamoDB, no prefix for local JSON.
  • dataset (required): Dictionary of items to save (mapping keys to attribute dicts)

Returns:

  Dictionary with 'statusCode': 0 on success, -1 on failure

Examples:

Save to a local JSON file:

  >>> ds = Dataset()
  >>> data = {
  ...     'item1': {'name': 'Product A', 'price': 9.99},
  ...     'item2': {'name': 'Product B', 'price': 19.99}
  ... }
  >>> result = ds.save_dataset('products', data)

Save to DynamoDB:

  >>> ds = Dataset()
  >>> data = {
  ...     'AAPL': {'name': 'Apple Inc.', 'sector': 'Technology'},
  ...     'MSFT': {'name': 'Microsoft', 'sector': 'Technology'}
  ... }
  >>> result = ds.save_dataset('ddb_securities', data)

Note
  • Local datasets saved to {DATASET_LOCAL_PATH}/{name}.json
  • DynamoDB datasets require configuration in DATASET_DDB_MAP
  • DynamoDB items get 'pk' and 'sk' added automatically from map
  • JSON dates are serialized as strings using default=str
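
The default=str behavior noted above is plain json.dumps usage; a minimal sketch of the local serialization path (the sample data is illustrative):

```python
import json
from datetime import date

# Dates have no native JSON encoding, so default=str renders them as
# ISO strings, matching the note above.
data = {"item1": {"name": "Product A", "price": 9.99, "as_of": date(2024, 1, 2)}}
payload = json.dumps(data, default=str, indent=2)
```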

delete_dataset_items

delete_dataset_items(dataset_name, items)

Delete specific items from a DynamoDB dataset.

Removes items from the DynamoDB table using a batch delete. Not supported for local JSON datasets.

Parameters:

  • dataset_name (required): DynamoDB dataset name (must start with 'ddb_')
  • items (required): List of sort key (sk) values identifying the items to delete

Returns:

  Dictionary with 'statusCode': 0 on success, -1 on failure

Examples:

Delete items from a DynamoDB dataset:

  >>> ds = Dataset()
  >>> items_to_delete = ['AAPL', 'MSFT', 'GOOGL']
  >>> result = ds.delete_dataset_items(
  ...     dataset_name='ddb_securities',
  ...     items=items_to_delete
  ... )
  >>> if result['statusCode'] == 0:
  ...     print("Items deleted successfully")

Note
  • Only works with DynamoDB datasets (names starting with 'ddb_')
  • Not supported for local JSON datasets
  • Requires DATASET_DDB_TABLE_NAME configuration
  • Items identified by their sort key (sk) values
  • Uses batch delete for efficiency
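
DynamoDB's BatchWriteItem call accepts at most 25 requests, so a batch delete over many sort keys must be chunked. A sketch of the request shaping (the pk/sk key names follow the DATASET_DDB_MAP convention above; the helper itself is hypothetical, not the library's internal code):

```python
def build_delete_batches(pk, sort_keys, batch_size=25):
    # Shape DeleteRequest entries the way BatchWriteItem expects,
    # chunked to DynamoDB's 25-request-per-call limit.
    requests = [
        {"DeleteRequest": {"Key": {"pk": pk, "sk": sk}}} for sk in sort_keys
    ]
    return [
        requests[i:i + batch_size] for i in range(0, len(requests), batch_size)
    ]
```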