Datasets

Format and Storage

Introduction

LEAN strives to use an open, human-readable format, so all data is stored in flat files (formatted as CSV or JSON). The data is compressed on disk using zip.
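Because the files are plain CSV or JSON inside zip archives, they can be inspected with standard tooling. The following is a minimal Python sketch of reading one zipped CSV entry. The helper name `read_zipped_csv`, the entry name `sample.csv`, and its two columns are hypothetical and only serve the illustration. They do not reflect the actual LEAN file layout, which is documented in the readme.md files of the data directory.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_source, member=None):
    """Return the rows of one CSV entry inside a zip archive as lists of strings.

    zip_source may be a path or a file-like object. This is a hypothetical
    helper for illustration, not part of LEAN or the CLI.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Fall back to the first entry when no member name is given.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Build a tiny in-memory archive so the example runs without any LEAN data.
# The entry name and contents are made up for this sketch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.csv", "a,1\nb,2\n")
buffer.seek(0)

rows = read_zipped_csv(buffer)
```

To read a real data file, pass the path of a zip file in your data directory instead of the in-memory buffer.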

Default Location

When you create an organization workspace in an empty directory, the CLI downloads the latest data directory from the LEAN repository. This directory contains a standard directory structure from which the LEAN engine reads. Once downloaded, the data directory tree looks like this:

data
├── alternative/
├── cfd/
├── crypto/
├── equity/
├── forex/
├── future/
├── futureoption/
├── index/
├── indexoption/
├── market-hours/
├── option/
├── symbol-properties/
└── readme.md

By default, the data directory contains a small amount of sample data for all asset types to demonstrate how data files must be formatted. Additionally, the data directory itself and most of its subdirectories include readme.md files with more documentation on the data file format of each asset type.

Change Location

To change the data directory, set the data-folder property in your Lean configuration file. All commands that run the LEAN engine locally use the path in this property as the data directory. By default, this property points to the data directory inside your organization workspace. If this property is set to a relative path, it is resolved relative to the Lean configuration file's parent directory.
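The resolution rule for relative paths can be sketched in Python as follows. This is an illustration of the rule described above, not the CLI's actual implementation. The function name `resolve_data_folder` is hypothetical, the file name `lean.json` is an assumption, and the sketch assumes the configuration file is plain JSON without comments.

```python
import json
import tempfile
from pathlib import Path


def resolve_data_folder(config_path):
    """Return the absolute data directory named by a configuration file.

    Hypothetical helper: a relative data-folder value is resolved against
    the directory that contains the configuration file.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    data_folder = Path(config["data-folder"])
    if data_folder.is_absolute():
        return data_folder
    return (config_path.parent / data_folder).resolve()


# Demonstrate with a throwaway workspace; "lean.json" is an assumed file name.
with tempfile.TemporaryDirectory() as workspace:
    config_file = Path(workspace) / "lean.json"
    config_file.write_text(json.dumps({"data-folder": "data"}), encoding="utf-8")
    resolved = resolve_data_folder(config_file)
    expected = (Path(workspace) / "data").resolve()
```

An absolute data-folder value is returned unchanged, so the location of the configuration file only matters for relative paths.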

The data directory is the only local directory that is mounted into all Docker containers run by the CLI, so it must contain all the local files you want to read from your algorithms. You can get the path to this directory in your algorithm using the Globals.DataFolder variable.

Other Data Sources

If you already have data of your own, you can convert it to a LEAN-compatible format yourself. In that case, we recommend reading the readme.md files that the lean init command generates in the data directory, as these files contain up-to-date documentation on the expected format of the data files.

For development purposes, it is also possible to generate data using the CLI. This generator uses a Brownian motion model to generate realistic market data, which might be helpful when you're testing strategies locally but don't have access to real market data.
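To give a rough idea of what a Brownian motion price model looks like, here is a minimal Python sketch of a geometric Brownian motion path. It is not the CLI's generator. The function name `simulate_prices` and every parameter and default value (drift, volatility, step size) are assumptions chosen for illustration.

```python
import math
import random


def simulate_prices(start_price, steps, drift=0.0, volatility=0.2, dt=1 / 252, seed=None):
    """Simulate a geometric Brownian motion price path.

    Illustrative only: the parameter names and defaults are assumptions,
    not the settings used by the CLI's data generator.
    """
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(steps):
        # One standard normal shock per time step.
        shock = rng.gauss(0.0, 1.0)
        log_return = (drift - 0.5 * volatility ** 2) * dt + volatility * math.sqrt(dt) * shock
        # Exponentiating the log return keeps every price strictly positive.
        prices.append(prices[-1] * math.exp(log_return))
    return prices


# A fixed seed makes the path reproducible between runs.
prices = simulate_prices(100.0, steps=5, seed=42)
```

The returned list contains the starting price followed by one simulated price per step.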
