Data Types

Data types are used in configuration files to help validate data and ensure your Cortex deployment is functioning as expected.
Raw Column Types
These are the valid types for raw columns:
INT_COLUMN
FLOAT_COLUMN
STRING_COLUMN
Transformed Column Types
These are the valid types for transformed columns (i.e. output types of transformers):
INT_COLUMN
FLOAT_COLUMN
STRING_COLUMN
INT_LIST_COLUMN
FLOAT_LIST_COLUMN
STRING_LIST_COLUMN
Output Types
Output types are used to define the output types of aggregators and constants. There are four base scalar types:
INT
FLOAT
STRING
BOOL
In addition, an output type may be a list of scalar types. This is denoted by a length-one list of any of the supported types. For example, [INT] represents a list of integers.
An output type may also be a map containing these types. There are two types of maps: generic maps and fixed maps. Generic maps represent maps which may have any number of items. The types of the keys and values must match the declared types. For example, {STRING: INT} supports {"San Francisco": -7, "Toronto": -4}. Fixed maps represent maps which must define values for each of the pre-defined keys. For example: {mean: INT, stddev: FLOAT} supports {mean: 17, stddev: 8.8}.
The values in lists, generic maps, and fixed maps may be arbitrarily nested.
Here are some valid output types:
INT
[STRING]
{STRING: INT}
[{STRING: INT}]
{value1: INT, value2: FLOAT}
{STRING: {value1: INT, value2: FLOAT}}
{value1: {STRING: BOOL}, value2: [FLOAT], value2: STRING}
Example
Output type:
output_type:
  value1: BOOL
  value2: INT|FLOAT
  value3: [STRING]
  value4: {INT: STRING}
Output value:
output_type:
  value1: True
  value2: 2.2
  value3: [test1, test2, test3]
  value4: {1: test1, 2: test2}
Input Types
Input types are used to define the inputs to aggregators, transformers, and estimators. Typically, input types can be any combination of the column or scalar types:
INT_COLUMN
FLOAT_COLUMN
STRING_COLUMN
INT_LIST_COLUMN
FLOAT_LIST_COLUMN
STRING_LIST_COLUMN
INT
FLOAT
STRING
BOOL
Like with output types, input types may occur within arbitrarily nested lists, generic maps, and fixed maps.
Ambiguous input types are also supported, and are represented by joining types with |. For example, INT_COLUMN|FLOAT_COLUMN indicates that either a column of type INT_COLUMN or a column of type FLOAT_COLUMN may be used as the input. Any two or more types may be combined in this way (e.g. INT|FLOAT|STRING is supported). All permutations of ambiguous types are valid (e.g. INT|FLOAT and FLOAT|INT are equivalent). Column types and scalar types may not be combined (e.g. INT|FLOAT_COLUMN is not valid).
Input Type Validations
By default, all declared inputs are required. For example, if the input type is {value1: INT, value2: FLOAT}, both value1 and value2 must be provided (and cannot be Null). With Cortex, it is possible to declare inputs as optional, set default values, allow values to be Null, and specify minimum and maximum map/list lengths.
To specify validation options, the "long form" input schema is used. In the long form, the input type is always a map, with the _type key specifying the type, and other keys (which all start with _) specifying the options. The available options are:
_optional: If set to True, allows the value to be missing from the input. This only applies to values in maps.
_default: Specifies a default value to use if the value is missing from the input. This only applies to values in maps. Setting _defaut implies _optional: True.
_allow_null: If set to True, allows the value to be explicitly set to Null.
_min_count: Specifies the minimum number of elements that must be in the list or map.
_max_count: Specifies the maximum number of elements that must be in the list or map.
Example
Short form input type:
input:
  value1: INT_COLUMN
  value2: INT|FLOAT
  value3: [STRING]
  value4: {INT: STRING}
Long form input type:
input:
  value1:
    _type: INT_COLUMN
    _optional: True
  value2:
    _type: INT|FLOAT
    _default: 2.2
    _allow_null: True
  value3:
    _type: [STRING]
    _min_count: 1
  value4:
    _type: {INT: STRING}
    _min_count: 1
    _max_count: 100
Input value (assuming column1 is an INT_COLUMN, constant1 is a [STRING], and aggregate1 is an STRING):
input:
  value1: @column1
  value2: 2.2
  value3: @constant1
  value4: {1: test1, 2: @aggregate1}
Pipeline Deployments - Previous
Constants
Next - Pipeline Deployments
Templates
Last updated -2