Fixed issue where pipelines with a dependency would run on a time schedule even if the upstream pipeline didn’t run (and vice versa).
Fixed output of next scheduled pipelines to better reflect DAG structures.
Directed acyclic graph (DAG) pipelines - where the output of one
pipeline can feed into another - are now available using the
maestroOutputs
and maestroInputs
tags.
Pipelines that input into a downstream pipeline should use the
maestroOutputs
tag. Pipelines that receive input from an
upstream pipeline should use the maestroInputs
tag
(#98).
New function show_network
for visualizing the
connections between pipelines that are connected in a DAG.
MaestroSchedule
gains new methods
get_network()
(returns a data.frame) and
show_network()
(returns a visualization using
{DiagrammeR}).
Added catch-all maestro
tag to identify a function
as a pipeline without specifying other configurations.
create_pipeline
to allow for
interactive creation of pipelines that default to skip.Fixed issue with suggest_orch_frequency
when using
different styles of frequency (e.g., 1 day vs. daily) in a single
schedule.
Fixed issue where pipeline sourcing failures were appearing as successful runs in status outputs.
This version refactors much of the code base to rely on R6 classes for pipelines and schedules. Pay careful attention to the breaking changes to see how existing code may be impacted.
Schedules are now represented as an R6 object of class
<MaestroSchedule>
. build_schedule()
returns a MaestroSchedule object that can be passed to
run_schedule()
as normal. To access the schedule table run
get_schedule()
.
run_schedule()
no longer returns a list of
$status
and $artifacts
but now
returns/modifies the MaestroSchedule object. Status can be accessed
using get_status(schedule)
and artifacts via
get_artifacts(schedule)
suggest_orch_frequency()
now takes a
<MaestroSchedule>
object.
Data example_schedule
removed from the
package.
Skipped pipelines are no longer shown in the CLI output of
run_schedule()
.
It is now required that all pipeline names are unique. The names
of each maestro pipeline function must be unique across the project to
support the implementation of DAGs. build_schedule()
will
abort if any non-unique names are detected.
Added functions get_schedule()
,
get_status()
, and get_artifacts()
for
interacting with <MaestroSchedule>
objects.
Added function invoke()
to instantly run a pipeline
in a schedule.
New tags maestroHours
, maestroDays
, and
maestroMonths
allows running of pipelines on specific hours
of day, days of week, days of month, or months of year (#100).
maestroFrequency
tag now accepts the values hourly,
daily, weekly, biweekly, monthly, quarterly, and yearly. Argument
orch_frequency
to run_schedule()
also accepts
these values.
Changed from example_schedule
data the pipeline with
a schedule of 1 minute to 30 minutes in keeping with best practices for
minimum pipeline frequency.
suggest_orch_frequency
now uses the smallest
interval between any two pipelines (#99).
Error messages on unintentional overwrites from
create_*()
functions correctly reference name of path or
directory that was to be overwritten.
Fixed cli output of run_schedule()
to not show
skipped pipelines in the next run portion.
Fixed cli output to correctly handle counting of successful runs when pipelines are skipped.
Performance improvements to build_schedule()
(#101).
Creater functions create_pipeline()
and
create_maestro
no longer have default arguments for the
path to where the scripts are created. Users must explicitly define
these paths.
Argument log_file
in run_schedule()
no
longer defaults to ./maestro.log
but instead defaults to
NULL
.
create_*
now take a boolean
overwrite
argument to make the overwriting of existing
pipelines, projects, and orchestrators more explicit.run_schedule()
now returns a list with status and
artifacts instead of just a data.frame of the status. Artifacts are any
values returned from pipelines. Pipelines that return nothing will have
no artifacts.suggest_orch_frequency()
to provide
a suggestion of what frequency to use for the orchestrator.run_schedule()
now correctly outputs
the total number of pipelines (#81) and correctly outputs number of
errors.maestroFrequency tag now adheres to a more human-readable format like “1 day”, “2 hours”, “4 weeks”, etc.
orch_frequency
argument in
run_schedule()
also takes more human-readable format
identical to maestroFrequency tag.
maestroInterval tag removed
orch_interval
argument to
run_schedule()
removed.
create_maestro()
and
create_orchestrator()
now use the argument
type
instead of extension
for defining what
script type to use for the orchestrator.
Changed last_parsing_errors()
to
last_build_errors()
; changed functions of the form
last_runtime_*()
to last_run_*()
.
Additional columns added to the output of
run_schedule()
: pipeline_started
and
pipeline_ended
to indicate the start and end times of a
pipeline execution; next_run
to indicate when the next run
should be based on the frequency of the pipeline and
orchestrator.
Pipelines now show as skipped if they are not scheduled.
Added hex logo
Backend improvements to schedule checking
Timestamps are formatted to specified time zone.
run_schedule()
cli output suggests to use
last_run_errors()
or last_run_warnings()
if
any errors or warnings were found.