Aggregate columns at scale.
- kind: aggregatename: <string> # aggregate name (required)aggregator: <string> # the name of the aggregator to use (this or aggregator_path must be specified)aggregator_path: <string> # a path to an aggregator implementation file (this or aggregator must be specified)input: <input_value> # the input to the aggregator, which may contain references to columns and constants (e.g. @column1) (required)compute:executors: <int> # number of spark executors (default: 1)driver_cpu: <string> # CPU request for spark driver (default: 1)driver_mem: <string> # memory request for spark driver (default: 500Mi)driver_mem_overhead: <string> # off-heap (non-JVM) memory allocated to the driver (overrides mem_overhead_factor) (default: min[driver_mem * 0.4, 384Mi])executor_cpu: <string> # CPU request for each spark executor (default: 1)executor_mem: <string> # memory request for each spark executor (default: 500Mi)executor_mem_overhead: <string> # off-heap (non-JVM) memory allocated to each executor (overrides mem_overhead_factor) (default: min[executor_mem * 0.4, 384Mi])mem_overhead_factor: <float> # the proportion of driver_mem/executor_mem which will be additionally allocated for off-heap (non-JVM) memory (default: 0.4)
See Data Types for details about input values. Note: the input of the the aggregate must match the input type of the aggregator (if specified).
See aggregators.yaml for a list of built-in aggregators.
- kind: aggregatename: age_bucket_boundariesaggregator: cortex.bucket_boundariesinput:col: @age # "age" is the name of a numeric raw columnnum_buckets: 5​- kind: aggregatename: price_bucket_boundariesaggregator: cortex.bucket_boundariesinput:col: @price # "price" is the name of a numeric raw columnnum_buckets: @num_buckets # "num_buckets" is the name of an INT constant