
Concepts

Dagster provides a variety of abstractions for building and orchestrating data pipelines. These concepts enable a modular, declarative approach to data engineering, making it easier to manage dependencies, monitor execution, and ensure data quality.

Asset

An asset represents a logical unit of data such as a table, dataset, or machine learning model. Assets can have dependencies on other assets, forming the data lineage for your pipelines. As the core abstraction in Dagster, assets can interact with many other Dagster entities to facilitate certain tasks. When you define an asset, either with the @dg.asset decorator or via a component, the definition is automatically added to a top-level Definitions object.

| Concept | Relationship |
| --- | --- |
| asset check | asset may use an asset check |
| asset spec | asset is described by an asset spec |
| component | asset may be programmatically built by a component |
| config | asset may use a config |
| definitions | asset is added to a top-level Definitions object to be deployed |
| io manager | asset may use an io manager |
| partition | asset may use a partition |
| resource | asset may use a resource |
| job | asset may be used in a job |
| schedule | asset may be used in a schedule |
| sensor | asset may be used in a sensor |

Asset check

An asset_check is associated with an asset to ensure it meets certain expectations around data quality, freshness, or completeness. Asset checks run when the asset is executed and store metadata about the related run and whether all the conditions of the check were met.

| Concept | Relationship |
| --- | --- |
| asset | asset check may be used by an asset |
| definitions | asset check is added to a top-level Definitions object to be deployed |

Asset spec

Specs are standalone objects that describe the identity and metadata of Dagster entities without defining their behavior. For example, an AssetSpec contains essential information like the asset's key (its unique identifier) and tags (labels for organizing and annotating the asset), but it doesn't include the logic for materializing that asset.

| Concept | Relationship |
| --- | --- |
| asset | asset spec may describe the identity and metadata of an asset |

Code location

A code location is a collection of Dagster entity definitions deployed in a specific environment. A code location determines the Python environment (including the version of Dagster being used as well as any other Python dependencies). A Dagster project can have multiple code locations, helping isolate dependencies.

| Concept | Relationship |
| --- | --- |
| definitions | code location must contain at least one top-level Definitions object |

Component

Components are objects that programmatically build assets and other Dagster entity definitions, such as asset_checks, schedules, resources, and sensors. They accept schematized configuration parameters (which are specified using YAML or lightweight Python) and use them to build the actual definitions you need. Components are designed to help you quickly bootstrap parts of your Dagster project and serve as templates for repeatable patterns.

| Concept | Relationship |
| --- | --- |
| asset | component builds assets and other definitions |
| asset check | component builds asset_checks and other definitions |
| definitions | component builds assets and other definitions |
| job | component builds jobs and other definitions |
| schedule | component builds schedules and other definitions |
| sensor | component builds sensors and other definitions |
| resource | component builds resources and other definitions |

Config

A config is used to specify config schema for assets, jobs, schedules, and sensors. A RunConfig is a container for all the configuration that can be passed to a run. This allows for parameterization and the reuse of pipelines to serve multiple purposes.

| Concept | Relationship |
| --- | --- |
| asset | config may be used by an asset |
| job | config may be used by a job |
| schedule | config may be used by a schedule |
| sensor | config may be used by a sensor |

Definitions

In Dagster, "definitions" means two things:

  • The objects that combine metadata about Dagster entities with Python functions that define how they behave, for example, asset, ScheduleDefinition, and resource definitions.
  • The top-level Definitions object that contains references to all the definitions in a Dagster project. Entities included in the Definitions object will be deployed and visible within the Dagster UI.

| Concept | Relationship |
| --- | --- |
| asset | Top-level Definitions object may contain one or more asset definitions |
| asset check | Top-level Definitions object may contain one or more asset check definitions |
| io manager | Top-level Definitions object may contain one or more io manager definitions |
| job | Top-level Definitions object may contain one or more job definitions |
| resource | Top-level Definitions object may contain one or more resource definitions |
| schedule | Top-level Definitions object may contain one or more schedule definitions |
| sensor | Top-level Definitions object may contain one or more sensor definitions |
| component | definition may be the output of a component |
| code location | definitions must be deployed in a code location |

Graph

A GraphDefinition connects multiple ops together to form a DAG. If you are using assets, you will not need to use graphs directly.

| Concept | Relationship |
| --- | --- |
| config | graph may use a config |
| op | graph must include one or more ops |
| job | graph must be part of job to execute |

IO manager

An IOManager defines how data is stored and retrieved between the execution of assets and ops. This lets you customize the storage location and format at each step of a pipeline.

| Concept | Relationship |
| --- | --- |
| asset | io manager may be used by an asset |
| definitions | io manager is added to a top-level Definitions object to be deployed |

Job

A job is a subset of assets or the GraphDefinition of ops. Jobs are the main form of execution in Dagster.

| Concept | Relationship |
| --- | --- |
| asset | job may contain a selection of assets |
| config | job may use a config |
| graph | job may contain a graph |
| schedule | job may be used by a schedule |
| sensor | job may be used by a sensor |
| definitions | job is added to a top-level Definitions object to be deployed |

Op

An op is a computational unit of work. Ops are arranged into a GraphDefinition to dictate their order. Ops have largely been replaced by assets.

| Concept | Relationship |
| --- | --- |
| type | op may use a type |
| graph | op must be contained in graph to execute |

Partition

A PartitionsDefinition represents a logical slice of a dataset or computation mapped to certain segments (such as increments of time). Partitions enable incremental processing, making workflows more efficient by only running on relevant subsets of data.

| Concept | Relationship |
| --- | --- |
| asset | partition may be used by an asset |

Resource

A ResourceDefinition is a way to make external resources (like database or API connections) available to Dagster entities (like assets, schedules, or sensors) during job execution, and to clean up after execution resolves. A ConfigurableResource is a resource that uses structured configuration. For more information, see Configuring resources.

| Concept | Relationship |
| --- | --- |
| asset | resource may be used by an asset |
| schedule | resource may be used by a schedule |
| sensor | resource may be used by a sensor |
| definitions | resource is added to a top-level Definitions object to be deployed |

Type

A type is a way to define and validate the data passed between ops.

| Concept | Relationship |
| --- | --- |
| op | type may be used by an op |

Schedule

A ScheduleDefinition is a way to automate jobs or assets so that they run at a specified interval. When a job or asset is parameterized, the schedule can also be given a matching run configuration (RunConfig).

| Concept | Relationship |
| --- | --- |
| asset | schedule may include a job or selection of assets |
| config | schedule may include a config if the job or assets include a config |
| job | schedule may include a job or selection of assets |
| definitions | schedule is added to a top-level Definitions object to be deployed |

Sensor

A sensor is a way to trigger jobs or assets when an event occurs, such as a file being uploaded or a push notification. When a job or asset is parameterized, the sensor can also be given a matching run configuration (RunConfig).

| Concept | Relationship |
| --- | --- |
| asset | sensor may include a job or selection of assets |
| config | sensor may include a config if the job or assets include a config |
| job | sensor may include a job or selection of assets |
| definitions | sensor is added to a top-level Definitions object to be deployed |