| Type: | Package |
| Title: | Seamless AWS Cloud Bursting for Parallel R Workloads |
| Version: | 0.3.8 |
| Description: | A 'future' backend that enables seamless execution of parallel R workloads on 'Amazon Web Services' ('AWS', https://aws.amazon.com), including 'EC2' and 'Fargate'. 'staRburst' handles environment synchronization, data transfer, quota management, and worker orchestration automatically, allowing users to scale from local execution to 100+ cloud workers with a single line of code change. |
| License: | Apache License 2.0 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Depends: | R (≥ 4.0.0) |
| Imports: | future (≥ 1.33.0), globals, paws.compute, paws.storage, paws.management, paws.security.identity, qs2, uuid, renv, jsonlite, crayon, digest, base64enc, processx |
| Suggests: | future.apply, testthat (≥ 3.0.0), knitr, rmarkdown, withr, mockery |
| VignetteBuilder: | knitr |
| URL: | https://starburst.ing, https://github.com/scttfrdmn/starburst |
| BugReports: | https://github.com/scttfrdmn/starburst/issues |
| NeedsCompilation: | no |
| Packaged: | 2026-03-16 00:17:30 UTC; scttfrdmn |
| Author: | Scott Friedman [aut, cre] |
| Maintainer: | Scott Friedman <help@starburst.ing> |
| Repository: | CRAN |
| Date/Publication: | 2026-03-19 14:50:02 UTC |
starburst: Seamless AWS Cloud Bursting for Parallel R Workloads
Description
A 'future' backend that enables seamless execution of parallel R workloads on 'Amazon Web Services' ('AWS', https://aws.amazon.com), including 'EC2' and 'Fargate'. 'staRburst' handles environment synchronization, data transfer, quota management, and worker orchestration automatically, allowing users to scale from local execution to 100+ cloud workers with a single line of code change.
Author(s)
Maintainer: Scott Friedman help@starburst.ing
See Also
Useful links:
Report bugs at https://github.com/scttfrdmn/starburst/issues
Starburst Future Backend
Description
A future backend for running parallel R workloads on AWS ECS
Usage
StarburstBackend(
workers = 10,
cpu = 4,
memory = "8GB",
region = NULL,
timeout = 3600,
launch_type = "EC2",
instance_type = "c6a.large",
use_spot = FALSE,
warm_pool_timeout = 3600,
...
)
Arguments
workers |
Number of parallel workers |
cpu |
vCPUs per worker (1, 2, 4, 8, or 16) |
memory |
Memory per worker (supports GB notation, e.g., "8GB") |
region |
AWS region (default: from config or "us-east-1") |
timeout |
Maximum runtime in seconds (default: 3600) |
launch_type |
"EC2" or "FARGATE" (default: "EC2") |
instance_type |
EC2 instance type (e.g., "c6a.large") |
use_spot |
Use spot instances (default: FALSE) |
warm_pool_timeout |
Pool timeout in seconds (default: 3600) |
... |
Additional arguments |
Value
A StarburstBackend object
StarburstFuture Constructor
Description
Creates a Future object for evaluation on AWS (EC2 or Fargate)
Usage
StarburstFuture(
expr,
envir = parent.frame(),
substitute = TRUE,
globals = TRUE,
packages = NULL,
lazy = FALSE,
seed = FALSE,
stdout = TRUE,
conditions = "condition",
label = NULL,
...
)
Arguments
expr |
Expression to evaluate |
envir |
Environment for evaluation |
substitute |
Whether to substitute the expression |
globals |
Globals to export (TRUE for auto-detection, list for manual) |
packages |
Packages to load |
lazy |
Whether to lazily evaluate (always FALSE for remote) |
seed |
Random seed |
stdout |
Whether to capture stdout (TRUE, FALSE, or NA) |
conditions |
Character vector of condition classes to capture |
label |
Optional label for the future |
... |
Additional arguments |
Value
A StarburstFuture object
Atomically claim a pending task
Description
This is a helper that combines get + conditional update in one operation
Usage
atomic_claim_task(session_id, task_id, worker_id, region, bucket)
Arguments
session_id |
Session identifier |
task_id |
Task identifier |
worker_id |
Worker identifier claiming the task |
region |
AWS region |
bucket |
S3 bucket |
Value
TRUE if claimed successfully, FALSE if already claimed
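The get + conditional update pattern this helper describes can be illustrated with an in-memory stand-in. This sketch is not the package's S3 implementation: the real helper compares object version tokens (ETags, see get_task_status's include_etag argument), while here a plain revision counter plays that role.

```r
# Illustrative sketch only: an in-memory stand-in for the get + conditional
# update that atomic_claim_task() performs against S3.
store <- new.env(parent = emptyenv())
store$status <- list(state = "pending", worker = NA)
store$rev <- 1L

claim_task <- function(worker_id) {
  snapshot <- store$status   # "get": read current status
  rev_seen <- store$rev      # remember the version token we read
  if (snapshot$state != "pending") return(FALSE)
  # "conditional update": write only if nothing changed since our read
  if (store$rev != rev_seen) return(FALSE)
  store$status <- list(state = "claimed", worker = worker_id)
  store$rev <- store$rev + 1L
  TRUE
}

claim_task("worker-1")  # TRUE: first claim succeeds
claim_task("worker-2")  # FALSE: task already claimed
```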
AWS Retry Logic
Description
Centralized retry logic for AWS operations with exponential backoff
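The retry-with-exponential-backoff pattern centralized here can be sketched as follows. The delays and attempt count are illustrative, not the package's actual settings.

```r
# Minimal sketch of retry with exponential backoff (illustrative defaults).
with_retry <- function(fn, max_attempts = 5, base_delay = 0.5) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(list(ok = TRUE, value = fn()),
                       error = function(e) list(ok = FALSE, error = e))
    if (result$ok) return(result$value)
    if (attempt == max_attempts) stop(result$error)
    Sys.sleep(base_delay * 2^(attempt - 1))  # 0.5s, 1s, 2s, ...
  }
}

# A flaky operation that fails twice, then succeeds:
n_calls <- 0L
flaky <- function() {
  n_calls <<- n_calls + 1L
  if (n_calls < 3L) stop("ThrottlingException (simulated)")
  "ok"
}
with_retry(flaky, base_delay = 0.01)  # returns "ok" on the third attempt
```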
Build base Docker image with common dependencies
Description
Build base Docker image with common dependencies
Usage
build_base_image(region)
Build environment image
Description
Build environment image
Usage
build_environment_image(tag, region, use_public = NULL)
Arguments
tag |
Image tag |
region |
AWS region |
use_public |
Logical, use public ECR base image (default NULL = from config) |
Build initial environment
Description
Build initial environment
Usage
build_initial_environment(region)
Calculate Cloud Performance Predictions
Description
Calculate Cloud Performance Predictions
Usage
calculate_predictions(
n_total,
avg_time_per_task,
local_specs,
workers,
cpu,
memory,
platform
)
Calculate task cost
Description
Calculate task cost
Usage
calculate_task_cost(future)
Calculate total cost
Description
Calculate total cost
Usage
calculate_total_cost(plan)
Check and Submit Wave if Ready
Description
Checks wave queue and submits next wave if current wave is complete
Usage
check_and_submit_wave(backend)
Arguments
backend |
Backend environment |
Check AWS credentials
Description
Check AWS credentials
Usage
check_aws_credentials()
Value
Logical indicating if credentials are valid
Check ECR image age and suggest/force rebuild
Description
Check ECR image age and suggest/force rebuild
Usage
check_ecr_image_age(region, image_tag, ttl_days = NULL, force_rebuild = FALSE)
Arguments
region |
AWS region |
image_tag |
Image tag to check |
ttl_days |
TTL setting (NULL = no check) |
force_rebuild |
Force rebuild if past TTL |
Value
TRUE if image is fresh or doesn't exist, FALSE if stale
Check if ECR image exists
Description
Check if ECR image exists
Usage
check_ecr_image_exists(tag, region)
Check Fargate vCPU quota
Description
Check Fargate vCPU quota
Usage
check_fargate_quota(region)
Arguments
region |
AWS region |
Value
List with quota information
Check if quota is sufficient for plan
Description
Check if quota is sufficient for plan
Usage
check_quota_sufficient(workers, cpu, region)
Clean up cluster resources
Description
Clean up cluster resources
Usage
cleanup_cluster(backend)
Cleanup S3 files
Description
Cleanup S3 files
Usage
cleanup_s3_files(plan)
Cleanup session
Description
Cleanup session
Usage
cleanup_session(session, stop_workers = TRUE, force = FALSE)
Arguments
session |
Session object |
stop_workers |
Stop running ECS tasks (default TRUE) |
force |
Delete S3 files (default FALSE) |
Collect results from session
Description
Collect results from session
Usage
collect_session_results(session, wait, timeout)
Compute environment image hash
Description
Computes the hash used to tag environment Docker images, combining the renv.lock file contents with the starburst package version. This ensures new images are built when either the R package environment or the starburst worker script changes.
Usage
compute_env_hash(lock_file)
Arguments
lock_file |
Path to renv.lock file |
Value
MD5 hash string
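The hashing scheme described above can be sketched in base R. This is an assumption about the shape of the computation, not the package's exact code; the package imports 'digest', which offers digest::digest(..., algo = "md5") for the same purpose, while this sketch uses base R's tools::md5sum().

```r
# Sketch: hash the lockfile contents together with the package version so
# that a change to either produces a new Docker image tag.
env_hash <- function(lock_file, pkg_version = "0.3.8") {
  combined <- paste(c(readLines(lock_file), pkg_version), collapse = "\n")
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(combined, tmp)
  unname(tools::md5sum(tmp))
}

lock <- tempfile(fileext = ".lock")
writeLines('{"R": {"Version": "4.4.0"}}', lock)
h1 <- env_hash(lock)
h2 <- env_hash(lock, pkg_version = "0.3.9")  # version bump changes the hash
```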
Get configuration directory
Description
Get configuration directory
Usage
config_dir()
Get configuration file path
Description
Get configuration file path
Usage
config_path()
Create ECR lifecycle policy to auto-delete old images
Description
Create ECR lifecycle policy to auto-delete old images
Usage
create_ecr_lifecycle_policy(region, repository_name, ttl_days = NULL)
Arguments
region |
AWS region |
repository_name |
ECR repository name |
ttl_days |
Number of days to keep images (NULL = no auto-delete) |
Create ECR repository
Description
Create ECR repository
Usage
create_ecr_repository(repo_name, region)
Create ECS cluster
Description
Create ECS cluster
Usage
create_ecs_cluster(cluster_name, region)
Create session manifest in S3
Description
Create session manifest in S3
Usage
create_session_manifest(session_id, backend)
Arguments
session_id |
Unique session identifier |
backend |
Backend configuration |
Value
Invisibly returns NULL
Create session object with methods
Description
Create session object with methods
Usage
create_session_object(backend)
Arguments
backend |
Backend environment |
Value
Session object (environment)
Create S3 bucket for staRburst
Description
Create S3 bucket for staRburst
Usage
create_starburst_bucket(bucket_name, region)
Create task object
Description
Create task object
Usage
create_task(expr, globals, packages, plan)
Create task status in S3
Description
Create task status in S3
Usage
create_task_status(session_id, task_id, state = "pending", region, bucket)
Arguments
session_id |
Session identifier |
task_id |
Task identifier |
state |
Initial state (default: "pending") |
region |
AWS region |
bucket |
S3 bucket |
Value
Invisibly returns NULL
EC2 Pool Management for staRburst
Description
Functions for managing Auto-Scaling Groups and ECS Capacity Providers to maintain warm pools of EC2 instances for fast task execution.
Ensure base image exists
Description
Ensure base image exists
Usage
ensure_base_image(region, use_public = NULL)
Arguments
region |
AWS region |
use_public |
Logical, use public ECR base image (default NULL = from config) |
Ensure ECS instance IAM profile exists
Description
Ensure ECS instance IAM profile exists
Usage
ensure_ecs_instance_profile(region)
Arguments
region |
AWS region |
Value
Instance profile ARN
Ensure ECS security group exists
Description
Ensure ECS security group exists
Usage
ensure_ecs_security_group(region)
Arguments
region |
AWS region |
Value
Security group ID
Ensure environment is ready
Description
Ensure environment is ready
Usage
ensure_environment(region)
Ensure CloudWatch log group exists
Description
Ensure CloudWatch log group exists
Usage
ensure_log_group(log_group_name, region)
Error Handling for staRburst
Description
Improved error messages with context and solutions
Estimate Cost
Description
Estimate Cost
Usage
estimate_cost(
workers,
cpu,
memory,
estimated_runtime_hours = 1,
launch_type = "FARGATE",
instance_type = NULL,
use_spot = FALSE
)
Extend session timeout
Description
Extend session timeout
Usage
extend_session_timeout(session, seconds)
Extract region from S3 key
Description
Extract region from S3 key
Usage
extract_region_from_key(key)
Create a Future using Starburst Backend
Description
This is the entry point called by the Future package when a plan(starburst) is active
Usage
## S3 method for class 'starburst'
future(
expr,
envir = parent.frame(),
substitute = TRUE,
lazy = FALSE,
seed = FALSE,
globals = TRUE,
packages = NULL,
stdout = TRUE,
conditions = "condition",
label = NULL,
...
)
Arguments
expr |
Expression to evaluate |
envir |
Environment for evaluation |
substitute |
Whether to substitute the expression |
lazy |
Whether to lazily evaluate (always FALSE for remote) |
seed |
Random seed |
globals |
Globals to export (TRUE for auto-detection, list for manual) |
packages |
Packages to load |
stdout |
Whether to capture stdout (TRUE, FALSE, or NA) |
conditions |
Character vector of condition classes to capture |
label |
Optional label for the future |
... |
Additional arguments |
Value
A StarburstFuture object
Get CPU architecture from instance type
Description
Get CPU architecture from instance type
Usage
get_architecture_from_instance_type(instance_type)
Arguments
instance_type |
EC2 instance type (e.g., "c7g.xlarge", "c7i.xlarge", "c7a.xlarge") |
Value
CPU architecture ("ARM64" or "X86_64")
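The family-suffix heuristic such a lookup can use is sketched below. This is an assumption about the mapping, not the package's code: AWS Graviton (ARM64) families carry a "g" right after the generation digit (c7g, m6g, and variants like m6gd), while "a" (AMD) and "i" (Intel) families are x86-64; a real implementation might instead consult the EC2 DescribeInstanceTypes API.

```r
# Heuristic sketch: map an EC2 instance type to a CPU architecture.
arch_from_instance_type <- function(instance_type) {
  family <- sub("\\..*$", "", instance_type)      # "c7g.xlarge" -> "c7g"
  if (grepl("^[a-z]+[0-9]+g", family)) "ARM64" else "X86_64"
}

arch_from_instance_type("c7g.xlarge")   # "ARM64" (Graviton)
arch_from_instance_type("c7a.xlarge")   # "X86_64" (AMD)
arch_from_instance_type("m6gd.large")   # "ARM64" (Graviton, local NVMe)
```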
Get Auto Scaling client
Description
Get Auto Scaling client
Usage
get_autoscaling_client(region)
Arguments
region |
AWS region |
Value
Auto Scaling client
Get AWS account ID
Description
Get AWS account ID
Usage
get_aws_account_id()
Value
AWS account ID
Get base image source URI
Description
Get base image source URI
Usage
get_base_image_source(use_public = TRUE)
Arguments
use_public |
Logical, use public ECR base image (default TRUE) |
Get base image URI
Description
Get base image URI
Usage
get_base_image_uri(region)
Get Cloud Performance Ratio
Description
Returns estimated per-core performance of cloud vs local hardware. Based on empirical benchmarking data.
Usage
get_cloud_performance_ratio(local_specs, platform)
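How such a per-core ratio might feed a runtime prediction like calculate_predictions() can be shown with illustrative arithmetic. All numbers are hypothetical, not the package's benchmark data.

```r
# Illustrative arithmetic: tasks run in waves of `workers`, and each task
# takes local time scaled by the cloud/local per-core performance ratio.
predict_cloud_time <- function(n_total, avg_time_per_task, workers, ratio) {
  per_task_cloud <- avg_time_per_task / ratio   # ratio < 1 means slower cloud cores
  waves <- ceiling(n_total / workers)
  waves * per_task_cloud
}

# 1000 tasks at 2s each locally, 50 workers, cloud cores at 0.8x local speed:
predict_cloud_time(1000, 2, workers = 50, ratio = 0.8)  # 50 seconds
```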
Get current vCPU usage
Description
Get current vCPU usage
Usage
get_current_vcpu_usage(region)
Get EC2 client
Description
Get EC2 client
Usage
get_ec2_client(region)
Arguments
region |
AWS region |
Value
EC2 client
Get EC2 instance pricing
Description
Get EC2 instance pricing
Usage
get_ec2_instance_price(instance_type, use_spot = FALSE)
Arguments
instance_type |
EC2 instance type (e.g., "c7g.xlarge") |
use_spot |
Whether to use spot pricing |
Value
Price per hour in USD
Get ECR client
Description
Get ECR client
Usage
get_ecr_client(region)
Arguments
region |
AWS region |
Value
ECR client
Get ECS client
Description
Get ECS client
Usage
get_ecs_client(region)
Arguments
region |
AWS region |
Value
ECS client
Get ECS-optimized AMI ID for region and architecture
Description
Get ECS-optimized AMI ID for region and architecture
Usage
get_ecs_optimized_ami(region, architecture = "X86_64")
Arguments
region |
AWS region |
architecture |
CPU architecture ("X86_64" or "ARM64") |
Value
AMI ID
Get IAM execution role ARN
Description
Returns the ARN for the ECS execution role (should be created during setup)
Usage
get_execution_role_arn(region)
Get instance specifications (vCPUs, memory)
Description
Get instance specifications (vCPUs, memory)
Usage
get_instance_specs(instance_type)
Get vCPU count for instance type
Description
Get vCPU count for instance type
Usage
get_instance_vcpus(instance_type)
Arguments
instance_type |
EC2 instance type |
Value
Number of vCPUs
Get Local Hardware Specifications
Description
Detects local CPU model and core count for performance estimation.
Usage
get_local_hardware_specs()
Value
List with cpu_name, cores, and architecture
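A rough base-R sketch of this detection is shown below. The package's implementation may differ (for instance, reading the CPU model name via platform-specific commands such as sysctl or /proc/cpuinfo); the field names mirror the documented return value.

```r
# Sketch: detect core count and architecture with base R only.
local_specs <- function() {
  cores <- parallel::detectCores(logical = TRUE)
  if (is.na(cores)) cores <- 1L   # detectCores() can return NA
  machine <- Sys.info()[["machine"]]
  list(
    cpu_name = machine,  # stand-in; a real version would read the model name
    cores = cores,
    architecture = if (grepl("aarch64|arm", machine)) "ARM64" else "X86_64"
  )
}

specs <- local_specs()
specs$cores  # e.g., 8 on an 8-thread machine
```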
Get or create security group
Description
Get or create security group
Usage
get_or_create_security_group(vpc_id, region)
Get or create subnets
Description
Get or create subnets
Usage
get_or_create_subnets(vpc_id, region)
Get or create task definition
Description
Get or create task definition
Usage
get_or_create_task_definition(plan)
Get pool status
Description
Query current state of the EC2 pool
Usage
get_pool_status(backend)
Arguments
backend |
Backend configuration object |
Value
List with pool status information
Get S3 client
Description
Get S3 client
Usage
get_s3_client(region)
Arguments
region |
AWS region |
Value
S3 client
Get Service Quotas client
Description
Get Service Quotas client
Usage
get_service_quotas_client(region)
Arguments
region |
AWS region |
Value
Service Quotas client
Get session manifest from S3
Description
Get session manifest from S3
Usage
get_session_manifest(session_id, region, bucket)
Arguments
session_id |
Session identifier |
region |
AWS region |
bucket |
S3 bucket |
Value
Session manifest list
Get session status
Description
Get session status
Usage
get_session_status(session)
Get starburst bucket name
Description
Get starburst bucket name
Usage
get_starburst_bucket()
Value
S3 bucket name
Get staRburst configuration
Description
Get staRburst configuration
Usage
get_starburst_config()
Value
List of configuration values
Get starburst security groups
Description
Get starburst security groups
Usage
get_starburst_security_groups(region)
Arguments
region |
AWS region |
Value
Vector of security group IDs
Get starburst subnets
Description
Get starburst subnets
Usage
get_starburst_subnets(region)
Arguments
region |
AWS region |
Value
Vector of subnet IDs
Get task ARN
Description
Get task ARN
Usage
get_task_arn(task_id)
Get task registry environment
Description
Get task registry environment
Usage
get_task_registry()
Get IAM task role ARN
Description
Returns the ARN for the ECS task role (should be created during setup)
Usage
get_task_role_arn(region)
Get task status from S3
Description
Get task status from S3
Usage
get_task_status(session_id, task_id, region, bucket, include_etag = FALSE)
Arguments
session_id |
Session identifier |
task_id |
Task identifier |
region |
AWS region |
bucket |
S3 bucket |
include_etag |
Include ETag in result (for atomic operations) |
Value
Task status list (with optional $etag field)
Get VPC configuration for ECS tasks
Description
Get VPC configuration for ECS tasks
Usage
get_vpc_config(region)
Get wave queue status
Description
Get wave queue status
Usage
get_wave_status(backend)
Arguments
backend |
Backend environment |
Initialize backend for detached mode
Description
Creates a backend for detached sessions without modifying the future plan
Usage
initialize_detached_backend(
session_id,
workers = 10,
cpu = 4,
memory = "8GB",
region = NULL,
timeout = 3600,
absolute_timeout = 86400,
launch_type = "EC2",
instance_type = "c7g.xlarge",
use_spot = TRUE,
warm_pool_timeout = 3600
)
Arguments
session_id |
Unique session identifier |
workers |
Number of workers |
cpu |
vCPUs per worker |
memory |
Memory per worker (GB notation like "8GB") |
region |
AWS region |
timeout |
Task timeout in seconds |
absolute_timeout |
Maximum session lifetime in seconds |
launch_type |
"FARGATE" or "EC2" |
instance_type |
EC2 instance type (for EC2 launch type) |
use_spot |
Use spot instances |
warm_pool_timeout |
Warm pool timeout for EC2 |
Value
Backend environment
Check if setup is complete
Description
Check if setup is complete
Usage
is_setup_complete()
Launch a future on the Starburst backend
Description
Launch a future on the Starburst backend
Usage
## S3 method for class 'StarburstBackend'
launchFuture(backend, future, ...)
Arguments
backend |
A StarburstBackend object |
future |
The future object to launch |
... |
Additional arguments |
Value
The future object (invisibly)
Launch workers for detached session
Description
Launches workers with bootstrap tasks that tell them the session ID
Usage
launch_detached_workers(backend)
Arguments
backend |
Backend environment |
Value
Invisibly returns NULL
List futures for StarburstBackend
Description
List futures for StarburstBackend
Usage
## S3 method for class 'StarburstBackend'
listFutures(backend, ...)
Arguments
backend |
A StarburstBackend object |
... |
Additional arguments |
Value
List of futures (empty for this backend)
List active clusters
Description
List active clusters
Usage
list_active_clusters(region)
List pending tasks in session
Description
List pending tasks in session
Usage
list_pending_tasks(session_id, region, bucket)
Arguments
session_id |
Session identifier |
region |
AWS region |
bucket |
S3 bucket |
Value
Character vector of pending task IDs
List all stored task ARNs
Description
List all stored task ARNs
Usage
list_task_arns()
List all task statuses in session
Description
List all task statuses in session
Usage
list_task_statuses(session_id, region, bucket)
Arguments
session_id |
Session identifier |
region |
AWS region |
bucket |
S3 bucket |
Value
Named list of task statuses (task_id -> status)
Load quota request history
Description
Load quota request history
Usage
load_quota_history()
Number of workers for StarburstBackend
Description
Number of workers for StarburstBackend
Usage
## S3 method for class 'StarburstBackend'
nbrOfWorkers(evaluator)
Arguments
evaluator |
A StarburstBackend object |
Value
Number of workers
Package installation error
Description
Package installation error
Usage
package_installation_error(package, error_message = NULL)
Parse memory specification
Description
Parse memory specification
Usage
parse_memory(memory)
Arguments
memory |
Memory specification (numeric GB or string like "8GB") |
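A plausible sketch of this parser is shown below. The normalization to megabytes (the unit ECS task definitions use) is an assumption about the real helper's behavior, not confirmed by the source.

```r
# Sketch: accept numeric GB or a string like "8GB"/"512MB", return MB.
parse_memory_sketch <- function(memory) {
  if (is.numeric(memory)) return(as.integer(memory * 1024))  # GB -> MB
  m <- toupper(trimws(memory))
  if (grepl("GB$", m)) return(as.integer(as.numeric(sub("GB$", "", m)) * 1024))
  if (grepl("MB$", m)) return(as.integer(as.numeric(sub("MB$", "", m))))
  stop("Unrecognized memory specification: ", memory)
}

parse_memory_sketch("8GB")    # 8192
parse_memory_sketch(4)        # 4096
parse_memory_sketch("512MB")  # 512
```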
Permission denied error
Description
Permission denied error
Usage
permission_error(service, operation, resource = NULL, iam_role = NULL)
staRburst Future Backend
Description
A future backend for running parallel R workloads on AWS (EC2 or Fargate)
Usage
## S3 method for class 'starburst'
plan(
strategy,
workers = 10,
cpu = 4,
memory = "8GB",
region = NULL,
timeout = 3600,
auto_quota_request = interactive(),
launch_type = "EC2",
instance_type = "c7g.xlarge",
use_spot = TRUE,
warm_pool_timeout = 3600,
detached = FALSE,
...
)
Arguments
strategy |
The starburst strategy marker (ignored, for S3 dispatch) |
workers |
Number of parallel workers |
cpu |
vCPUs per worker (1, 2, 4, 8, or 16) |
memory |
Memory per worker (supports GB notation, e.g., "8GB") |
region |
AWS region (default: from config or "us-east-1") |
timeout |
Maximum runtime in seconds (default: 3600) |
auto_quota_request |
Automatically request quota increases (default: interactive()) |
launch_type |
Launch type: EC2 or FARGATE (default: EC2) |
instance_type |
EC2 instance type when using EC2 launch type (default: c7g.xlarge) |
use_spot |
Use EC2 Spot instances for cost savings (default: TRUE) |
warm_pool_timeout |
Timeout for warm pool in seconds (default: 3600) |
detached |
Use detached session mode (deprecated, use starburst_session instead) |
... |
Additional arguments passed to future backend |
Value
A future plan object
Examples
if (starburst_is_configured()) {
future::plan(starburst, workers = 50)
results <- future.apply::future_lapply(1:100, function(i) i^2)
}
Poll for result
Description
Poll for result
Usage
poll_for_result(future, timeout = 3600)
Print method for session status
Description
Print method for session status
Usage
## S3 method for class 'StarburstSessionStatus'
print(x, ...)
Arguments
x |
A StarburstSessionStatus object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x.
Print Estimate Summary
Description
Print Estimate Summary
Usage
print_estimate_summary(pred, local_specs)
Quota exceeded error
Description
Quota exceeded error
Usage
quota_error(resource, requested, available, region, workers = NULL, cpu = NULL)
Reconstruct backend from session manifest
Description
Used when reattaching to an existing session
Usage
reconstruct_backend_from_manifest(manifest)
Arguments
manifest |
Session manifest from S3 |
Value
Backend environment
Register cleanup handler
Description
Register cleanup handler
Usage
register_cleanup(evaluator)
Request quota increase
Description
Request quota increase
Usage
request_quota_increase(service, quota_code, desired_value, region, reason = "")
Arguments
service |
AWS service (e.g., "fargate") |
quota_code |
Service quota code |
desired_value |
Desired quota value |
region |
AWS region |
reason |
Justification for increase |
Value
Case ID if successful, NULL if failed
Check if StarburstFuture is Resolved
Description
Checks whether the future task has completed execution
Usage
## S3 method for class 'StarburstFuture'
resolved(x, ...)
Arguments
x |
A StarburstFuture object |
... |
Additional arguments |
Value
Logical indicating if the future is resolved
Get Result from StarburstFuture
Description
Retrieves the result from a resolved future
Usage
## S3 method for class 'StarburstFuture'
result(future, ...)
Arguments
future |
A StarburstFuture object |
... |
Additional arguments |
Value
A FutureResult object
Check if result exists
Description
Check if result exists
Usage
result_exists(task_id, region)
Run a StarburstFuture
Description
Submits the future task to AWS (EC2 or Fargate) for execution
Usage
## S3 method for class 'StarburstFuture'
run(future, ...)
Arguments
future |
A StarburstFuture object |
... |
Additional arguments |
Value
The future object (invisibly)
Execute system command safely (no shell injection)
Description
Execute system command safely (no shell injection)
Usage
safe_system(
command,
args = character(),
allowed_commands = c("docker", "aws", "uname", "sysctl", "cat", "nproc"),
stdin = NULL,
stdout = "|",
stderr = "|",
...
)
Arguments
command |
Command to execute (must be in whitelist) |
args |
Character vector of arguments |
allowed_commands |
Commands allowed to be executed |
stdin |
Optional input to pass to stdin |
stdout |
How to handle stdout ("|" to capture; passed to processx::run()) |
stderr |
How to handle stderr ("|" to capture; passed to processx::run()) |
... |
Additional arguments passed to processx::run() |
Value
Result from processx::run()
Save staRburst configuration
Description
Save staRburst configuration
Usage
save_config(config)
Save quota request to local history
Description
Save quota request to local history
Usage
save_quota_request(case_id, service, quota_code, desired_value, region)
Serialize and upload to S3
Description
Serialize and upload to S3
Usage
serialize_and_upload(obj, bucket, key)
Detached Session API
Description
User-facing API for creating and managing detached sessions
Session Backend Initialization
Description
Backend initialization for detached session mode
S3 Session State Management
Description
Core S3 operations for managing detached session state
Setup EC2 capacity provider for ECS cluster
Description
Creates Launch Template, Auto-Scaling Group, and ECS Capacity Provider
Usage
setup_ec2_capacity_provider(backend)
Arguments
backend |
Backend configuration object |
Value
List with capacity provider details
Setup VPC resources
Description
Setup VPC resources
Usage
setup_vpc_resources(region)
Starburst strategy marker
Description
This function should never be called directly. Use plan(starburst, ...) instead.
Usage
starburst(...)
Arguments
... |
Arguments passed to StarburstBackend() |
Value
Does not return a value; always signals an error if called directly.
This object exists as a strategy marker for plan.
Monitor quota increase request
Description
Monitor quota increase request
Usage
starburst_check_quota_request(case_id, region = NULL)
Arguments
case_id |
Case ID from quota increase request |
region |
AWS region |
Value
Invisibly returns the quota request details, or NULL on error.
Examples
if (starburst_is_configured()) {
starburst_check_quota_request("case-12345")
}
Clean up staRburst ECR images
Description
Manually delete Docker images from ECR to save storage costs. Images will be rebuilt on next use (adds 3-5 min delay).
Usage
starburst_cleanup_ecr(force = FALSE, region = NULL)
Arguments
force |
Delete all images immediately, ignoring TTL |
region |
AWS region (default: from config) |
Value
Invisibly returns TRUE on success or FALSE if not
configured.
Examples
if (starburst_is_configured()) {
# Delete images past TTL
starburst_cleanup_ecr()
# Delete all images immediately (save $0.50/month)
starburst_cleanup_ecr(force = TRUE)
}
Create a Starburst Cluster
Description
Creates a cluster object for managing AWS Fargate workers using Future backend
Usage
starburst_cluster(
workers = 10,
cpu = 4,
memory = "8GB",
platform = "X86_64",
region = NULL,
timeout = 3600
)
Arguments
workers |
Number of parallel workers |
cpu |
CPU units per worker |
memory |
Memory per worker |
platform |
CPU architecture (X86_64 or ARM64) |
region |
AWS region |
timeout |
Maximum runtime in seconds |
Value
A starburst_cluster object
Examples
if (starburst_is_configured()) {
cluster <- starburst_cluster(workers = 20)
results <- cluster$map(data, function(x) x * 2)
}
Execute Map on Starburst Cluster
Description
Internal function to execute parallel map by creating StarburstFuture objects directly
Usage
starburst_cluster_map(cluster, .x, .f, .progress = TRUE)
Configure staRburst options
Description
Configure staRburst options
Usage
starburst_config(
max_cost_per_job = NULL,
cost_alert_threshold = NULL,
auto_cleanup_s3 = NULL,
...
)
Arguments
max_cost_per_job |
Maximum cost per job in dollars |
cost_alert_threshold |
Cost threshold for alerts |
auto_cleanup_s3 |
Automatically clean up S3 files after completion |
... |
Additional configuration options |
Value
Invisibly returns the updated configuration list.
Examples
if (starburst_is_configured()) {
starburst_config(
max_cost_per_job = 10,
cost_alert_threshold = 5
)
}
Create informative staRburst error
Description
Creates error messages with context, solutions, and links to documentation
Usage
starburst_error(
message,
context = list(),
solution = NULL,
call = sys.call(-1)
)
Arguments
message |
Main error message |
context |
Named list of contextual information |
solution |
Suggested solution (optional) |
call |
Calling function (default: sys.call(-1)) |
Value
Error condition with class "starburst_error"
Estimate Cloud Performance and Cost
Description
Runs a small sample of tasks locally to estimate cloud execution time and cost. Provides informed prediction before spending money on cloud execution.
Usage
starburst_estimate(
.x,
.f,
workers = 10,
cpu = 2,
memory = "8GB",
platform = "X86_64",
sample_size = 10,
region = NULL,
...
)
Arguments
.x |
A vector or list to iterate over |
.f |
A function to apply to each element |
workers |
Number of parallel workers to estimate for |
cpu |
CPU units per worker (1, 2, 4, 8, or 16) |
memory |
Memory per worker (e.g., "8GB") |
platform |
CPU architecture: "X86_64" (default) or "ARM64" (Graviton3) |
sample_size |
Number of items to run locally for estimation (default: 10) |
region |
AWS region |
... |
Additional arguments passed to .f |
Value
Invisibly returns a list with estimates; prints a summary to the console
Examples
if (starburst_is_configured()) {
# Estimate before running
starburst_estimate(1:1000, expensive_function, workers = 50)
# Then decide whether to proceed
results <- starburst_map(1:1000, expensive_function, workers = 50)
}
Check if staRburst is configured
Description
Returns TRUE if starburst_setup() has been run, the
configuration file exists, and AWS credentials are available.
Useful for guarding example code that requires AWS credentials.
Usage
starburst_is_configured()
Value
TRUE if configured and credentials are available, FALSE otherwise.
Examples
starburst_is_configured()
List All Sessions
Description
List all detached sessions in S3
Usage
starburst_list_sessions(region = NULL)
Arguments
region |
AWS region (default: from config) |
Value
Data frame with session information
Examples
if (starburst_is_configured()) {
sessions <- starburst_list_sessions()
print(sessions)
}
View worker logs
Description
View worker logs
Usage
starburst_logs(task_id = NULL, cluster_id = NULL, last_n = 50, region = NULL)
Arguments
task_id |
Optional task ID to view logs for specific task |
cluster_id |
Optional cluster ID to view logs for specific cluster |
last_n |
Number of last log lines to show (default: 50) |
region |
AWS region (default: from config) |
Value
Invisibly returns the list of log events, or NULL if no
events were found.
Examples
if (starburst_is_configured()) {
# View recent logs
starburst_logs()
# View logs for specific task
starburst_logs(task_id = "abc-123")
# View last 100 lines
starburst_logs(last_n = 100)
}
Map Function Over Data Using AWS Fargate
Description
Parallel map function that executes on AWS Fargate using the Future backend
Usage
starburst_map(
.x,
.f,
workers = 10,
cpu = 4,
memory = "8GB",
platform = "X86_64",
region = NULL,
timeout = 3600,
.progress = TRUE,
...
)
Arguments
.x |
A vector or list to iterate over |
.f |
A function to apply to each element |
workers |
Number of parallel workers (default: 10) |
cpu |
CPU units per worker (1, 2, 4, 8, or 16) |
memory |
Memory per worker (e.g., "8GB") |
platform |
CPU architecture (X86_64 or ARM64) |
region |
AWS region |
timeout |
Maximum runtime in seconds per task |
.progress |
Show progress bar (default: TRUE) |
... |
Additional arguments passed to .f |
Value
A list of results, one per element of .x
Examples
if (starburst_is_configured()) {
# Simple parallel computation
results <- starburst_map(1:100, function(x) x^2, workers = 10)
# With custom configuration
results <- starburst_map(
data_list,
expensive_function,
workers = 50,
cpu = 4,
memory = "8GB"
)
}
Show quota status
Description
Show quota status
Usage
starburst_quota_status(region = NULL)
Arguments
region |
AWS region (default: from config) |
Value
Invisibly returns a list with quota information including current limit, usage, and any pending requests.
Examples
if (starburst_is_configured()) {
starburst_quota_status()
}
Rebuild environment image
Description
Rebuild environment image
Usage
starburst_rebuild_environment(region = NULL, force = FALSE)
Arguments
region |
AWS region (default: from config) |
force |
Force rebuild even if current environment hasn't changed |
Value
Invisibly returns NULL. Called for its side effect of
rebuilding and pushing the Docker environment image.
Examples
if (starburst_is_configured()) {
starburst_rebuild_environment()
}
Request quota increase (user-facing)
Description
Request quota increase (user-facing)
Usage
starburst_request_quota_increase(vcpus = 500, region = NULL)
Arguments
vcpus |
Desired vCPU quota |
region |
AWS region (default: from config) |
Value
Invisibly returns TRUE if the increase was requested,
FALSE if already sufficient or cancelled.
Examples
if (starburst_is_configured()) {
starburst_request_quota_increase(vcpus = 500)
}
Create a Detached Starburst Session
Description
Creates a new detached session that can run computations independently of your R session. You can close R and reattach later to collect results.
Usage
starburst_session(
workers = 10,
cpu = 4,
memory = "8GB",
region = NULL,
timeout = 3600,
session_timeout = 3600,
absolute_timeout = 86400,
launch_type = "EC2",
instance_type = "c7g.xlarge",
use_spot = TRUE,
warm_pool_timeout = 3600
)
Arguments
workers |
Number of parallel workers (default: 10) |
cpu |
vCPUs per worker (default: 4) |
memory |
Memory per worker, e.g., "8GB" (default: "8GB") |
region |
AWS region (default: from config or "us-east-1") |
timeout |
Task timeout in seconds (default: 3600) |
session_timeout |
Active timeout in seconds (default: 3600) |
absolute_timeout |
Maximum session lifetime in seconds (default: 86400) |
launch_type |
"FARGATE" or "EC2" (default: "EC2") |
instance_type |
EC2 instance type for the EC2 launch type (default: "c7g.xlarge") |
use_spot |
Use spot instances for EC2 (default: TRUE) |
warm_pool_timeout |
EC2 warm pool timeout in seconds (default: 3600) |
Value
A StarburstSession object with methods:
- submit(expr, ...) - Submit a task to the session
- status() - Get a progress summary
- collect(wait = FALSE) - Collect completed results
- extend(seconds = 3600) - Extend the session timeout
- cleanup() - Terminate the session and clean up resources
Examples
if (starburst_is_configured()) {
# Create detached session
session <- starburst_session(workers = 10)
# Submit tasks
task_ids <- lapply(1:100, function(i) {
session$submit(quote(expensive_computation(i)))
})
# Close R and come back later...
session_id <- session$session_id
# Reattach
session <- starburst_session_attach(session_id)
# Collect results
results <- session$collect(wait = TRUE)
}
Reattach to Existing Session
Description
Reattach to a previously created detached session
Usage
starburst_session_attach(session_id, region = NULL)
Arguments
session_id |
Session identifier |
region |
AWS region (default: from config) |
Value
A StarburstSession object
Examples
if (starburst_is_configured()) {
session <- starburst_session_attach("session-abc123")
status <- session$status()
results <- session$collect()
}
Setup staRburst
Description
One-time configuration to set up AWS resources for staRburst
Usage
starburst_setup(
region = "us-east-1",
force = FALSE,
use_public_base = TRUE,
ecr_image_ttl_days = NULL
)
Arguments
region |
AWS region (default: "us-east-1") |
force |
Force re-setup even if already configured |
use_public_base |
Use public base Docker images (default: TRUE). Set to FALSE to build private base images in your ECR. |
ecr_image_ttl_days |
Number of days to keep Docker images in ECR (default: NULL = never delete). AWS automatically deletes images older than this many days, which prevents surprise costs if you stop using staRburst. Recommended: 30 days for regular users, 7 days for occasional users. Deleted images are rebuilt on next use (adds 3-5 minutes). |
Value
Invisibly returns the configuration list.
Examples
if (starburst_is_configured()) {
# Default: keep images forever (~$0.50/month idle cost)
starburst_setup()
# Auto-delete images after 30 days (saves money if you stop using it)
starburst_setup(ecr_image_ttl_days = 30)
# Use private base images with 7-day cleanup
starburst_setup(use_public_base = FALSE, ecr_image_ttl_days = 7)
}
Setup EC2 capacity providers for staRburst
Description
One-time setup for the EC2 launch type. Creates IAM roles, instance profiles, and capacity providers for the specified instance types.
Usage
starburst_setup_ec2(
region = "us-east-1",
instance_types = c("c7g.xlarge", "c7i.xlarge"),
force = FALSE
)
Arguments
region |
AWS region (default: "us-east-1") |
instance_types |
Character vector of instance types to setup (default: c("c7g.xlarge", "c7i.xlarge")) |
force |
Force re-setup even if already configured |
Value
Invisibly returns TRUE on success or FALSE on failure
or cancellation.
Examples
if (starburst_is_configured()) {
# Setup with default instance types (Graviton and Intel)
starburst_setup_ec2()
# Setup with custom instance types
starburst_setup_ec2(instance_types = c("c7g.2xlarge", "r7g.xlarge"))
}
Show staRburst status
Description
Show staRburst status
Usage
starburst_status()
Value
Invisibly returns a list with current configuration and quota information.
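Unlike the neighboring entries, starburst_status() has no Examples section; a usage sketch in the same guarded style as the rest of this manual (the structure of the returned list is an assumption) might be:

```r
# Hypothetical usage sketch; requires a completed starburst_setup()
# and valid AWS credentials.
if (starburst_is_configured()) {
  # Prints configuration and quota information to the console
  info <- starburst_status()
  # The same information is returned invisibly for programmatic use
  str(info)
}
```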
Start warm EC2 pool
Description
Scales the Auto Scaling group to the desired capacity and waits for instances to become available
Usage
start_warm_pool(backend, capacity, timeout_seconds = 180)
Arguments
backend |
Backend configuration object |
capacity |
Desired number of instances |
timeout_seconds |
Maximum time to wait for instances (default: 180) |
Value
Invisible NULL
Stop running tasks
Description
Stop running tasks
Usage
stop_running_tasks(plan)
Stop warm pool
Description
Scales the Auto Scaling group to zero
Usage
stop_warm_pool(backend)
Arguments
backend |
Backend configuration object |
Value
Invisible NULL
Store task ARN
Description
Store task ARN
Usage
store_task_arn(task_id, task_arn)
Submit detached worker to ECS
Description
Submit detached worker to ECS
Usage
submit_detached_worker(backend, task_id)
Arguments
backend |
Backend environment |
task_id |
Task ID for the worker to execute |
Value
Task ARN
Submit Task to ECS
Description
Internal function to submit a task to ECS (Fargate or EC2)
Usage
submit_task(future, backend)
Arguments
future |
StarburstFuture object |
backend |
Backend/plan object |
Submit task to session
Description
Submit task to session
Usage
submit_to_session(session, expr, envir, substitute, globals, packages)
Suggest appropriate quota based on needs
Description
Suggest appropriate quota based on needs
Usage
suggest_quota(vcpus_needed)
Arguments
vcpus_needed |
vCPUs needed |
Value
Suggested quota value
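The mapping from needed vCPUs to a suggested quota is not documented here; one plausible policy, purely illustrative and not the package's actual implementation, is to add headroom and round up to a standard tier:

```r
# Purely illustrative policy sketch; the real suggest_quota() may differ.
suggest_quota_sketch <- function(vcpus_needed, headroom = 1.2) {
  # Hypothetical standard quota tiers
  tiers <- c(16, 32, 64, 128, 256, 500, 1000, 2000)
  # Add headroom so the workload isn't running at 100% of quota
  target <- vcpus_needed * headroom
  # Round up to the first tier that covers the target
  tiers[which(tiers >= target)[1]]
}

suggest_quota_sketch(100)  # 100 * 1.2 = 120 -> 128
```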
Task failure error
Description
Task failure error
Usage
task_failure_error(task_id, reason = NULL, exit_code = NULL, log_stream = NULL)
Update session manifest atomically
Description
Uses S3 ETags for optimistic locking to prevent race conditions.
Usage
update_session_manifest(session_id, updates, region, bucket, max_retries = 3)
Arguments
session_id |
Session identifier |
updates |
Named list of fields to update |
region |
AWS region |
bucket |
S3 bucket |
max_retries |
Maximum number of retry attempts (default: 3) |
Value
Invisibly returns updated manifest
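The ETag-based optimistic locking described above follows a read-modify-conditional-write loop. A self-contained sketch of that pattern, using an in-memory stand-in for S3 (the real function performs the same loop against S3 object ETags), might look like:

```r
# Illustrative optimistic-locking loop; the in-memory "store" stands in
# for an S3 object, and the integer etag for its S3 ETag.
store <- new.env()
store$manifest <- list(tasks_done = 0)
store$etag <- 1L

read_manifest <- function() list(value = store$manifest, etag = store$etag)

# Conditional write: succeeds only if the ETag is unchanged since the read
conditional_put <- function(value, etag) {
  if (!identical(etag, store$etag)) return(FALSE)  # lost the race; retry
  store$manifest <- value
  store$etag <- store$etag + 1L
  TRUE
}

update_with_retry <- function(updates, max_retries = 3) {
  for (i in seq_len(max_retries)) {
    snap <- read_manifest()
    merged <- utils::modifyList(snap$value, updates)
    if (conditional_put(merged, snap$etag)) return(invisible(merged))
  }
  stop("manifest update failed after ", max_retries, " attempts")
}

update_with_retry(list(tasks_done = 5))
```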
Update task status with atomic write
Description
Update task status with atomic write
Usage
update_task_status(
session_id,
task_id,
state,
etag = NULL,
region,
bucket,
updates = list()
)
Arguments
session_id |
Session identifier |
task_id |
Task identifier |
state |
New state |
etag |
Optional ETag for conditional write (atomic claiming) |
region |
AWS region |
bucket |
S3 bucket |
updates |
Optional additional fields to update |
Value
TRUE if successful, FALSE if conditional write failed
Upload task for detached session
Description
Upload task for detached session
Usage
upload_detached_task(task_id, task_data, backend)
Arguments
task_id |
Task identifier |
task_data |
Task data (expr, globals, packages, session_id) |
backend |
Backend environment |
Value
Invisibly returns NULL
Utility functions for staRburst
Description
Utility functions for staRburst
Retry AWS operations with exponential backoff
Description
Wraps AWS API calls with automatic retry logic for transient failures. Uses exponential backoff with jitter to avoid the thundering herd problem.
Usage
with_aws_retry(
expr,
max_attempts = 3,
base_delay = 1,
max_delay = 60,
retryable_errors = c("Throttling", "ThrottlingException", "RequestTimeout",
"ServiceUnavailable", "InternalError", "InternalServerError", "TooManyRequests",
"RequestLimitExceeded", "5\\d{2}"),
operation_name = "AWS operation"
)
Arguments
expr |
Expression to evaluate (AWS API call) |
max_attempts |
Maximum retry attempts (default: 3) |
base_delay |
Initial delay in seconds (default: 1) |
max_delay |
Maximum delay in seconds (default: 60) |
retryable_errors |
Regex patterns for retryable error messages |
operation_name |
Optional name for logging (default: "AWS operation") |
Value
Result of expression
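The backoff-with-jitter schedule described above can be sketched in plain R. This is a minimal illustration matching the documented defaults (base_delay = 1, max_delay = 60), not the package's internal implementation; compute_delay is a hypothetical helper:

```r
# Minimal sketch of exponential backoff with full jitter.
compute_delay <- function(attempt, base_delay = 1, max_delay = 60) {
  # Exponential growth: base * 2^(attempt - 1), capped at max_delay
  capped <- min(base_delay * 2^(attempt - 1), max_delay)
  # Full jitter: draw uniformly from [0, capped] to desynchronize retries
  runif(1, min = 0, max = capped)
}

# Delays for attempts 1..5 are bounded by 1, 2, 4, 8, 16 seconds
delays <- vapply(1:5, compute_delay, numeric(1))
```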
Retry ECR operations
Description
Specialized wrapper for ECR operations with ECR-specific retry patterns
Usage
with_ecr_retry(expr, max_attempts = 3, operation_name = "ECR operation")
Arguments
expr |
Expression to evaluate (AWS API call) |
max_attempts |
Maximum retry attempts (default: 3) |
operation_name |
Optional name for logging (default: "ECR operation") |
Retry ECS operations
Description
Specialized wrapper for ECS operations with ECS-specific retry patterns
Usage
with_ecs_retry(expr, max_attempts = 3, operation_name = "ECS operation")
Arguments
expr |
Expression to evaluate (AWS API call) |
max_attempts |
Maximum retry attempts (default: 3) |
operation_name |
Optional name for logging (default: "ECS operation") |
Retry S3 operations
Description
Specialized wrapper for S3 operations with S3-specific retry patterns
Usage
with_s3_retry(expr, max_attempts = 3, operation_name = "S3 operation")
Arguments
expr |
Expression to evaluate (AWS API call) |
max_attempts |
Maximum retry attempts (default: 3) |
operation_name |
Optional name for logging (default: "S3 operation") |
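A usage sketch for these wrappers, with a paws.storage S3 call inside the retried expression, might look like the following. The bucket name and key are placeholders, and running this requires AWS credentials:

```r
# Hypothetical usage; "my-bucket" and the object key are placeholders.
library(paws.storage)

s3 <- s3()
with_s3_retry(
  s3$put_object(
    Bucket = "my-bucket",
    Key    = "results/run-1.rds",
    Body   = serialize(list(ok = TRUE), NULL)
  ),
  max_attempts   = 5,
  operation_name = "upload results"
)
```

Transient errors (throttling, 5xx responses) are retried with backoff; any other error propagates immediately.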
Worker validation error
Description
Worker validation error
Usage
worker_validation_error(workers, max_allowed = 500)