Data types are used in configuration files to help validate data and ensure your Cortex deployment is functioning as expected.
These are the valid types for raw columns:
INT_COLUMN
FLOAT_COLUMN
STRING_COLUMN
These are the valid types for transformed columns (i.e. output types of transformers):
INT_COLUMN
FLOAT_COLUMN
STRING_COLUMN
INT_LIST_COLUMN
FLOAT_LIST_COLUMN
STRING_LIST_COLUMN
Output types define the types of the values that aggregators produce and that constants hold. There are four base scalar types:
INT
FLOAT
STRING
BOOL
In addition, an output type may be a list, denoted by a length-one list containing the element type. For example, [INT] represents a list of integers.
An output type may also be a map containing these types. There are two kinds of maps: generic maps and fixed maps. Generic maps may contain any number of items, as long as every key and value matches the declared key and value types. For example, {STRING: INT} supports {"San Francisco": -7, "Toronto": -4}. Fixed maps must define a value for each of their pre-defined keys. For example, {mean: INT, stddev: FLOAT} supports {mean: 17, stddev: 8.8}.
The values in lists, generic maps, and fixed maps may be arbitrarily nested.
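To make the nesting rules concrete, here is a minimal Python sketch of how a value could be checked against an output type. It is not Cortex's implementation. It assumes types are written as plain Python objects (scalar type names as strings, a length-one list for lists, a dict for maps), treats a single-key dict whose key is a scalar type name as a generic map and any other dict as a fixed map, and, as further assumptions, accepts integers where FLOAT is declared and rejects extra keys in fixed maps.

```python
# Hypothetical sketch, not Cortex's implementation. Types are modeled as Python objects:
#   scalar      -> "INT", "FLOAT", "STRING" or "BOOL"
#   list        -> [element_type]            (a length-one list)
#   generic map -> {key_type: value_type}    (one key, and that key is a scalar type name)
#   fixed map   -> {"key": value_type, ...}  (any other dict)
SCALAR_TYPES = {"INT", "FLOAT", "STRING", "BOOL"}


def matches_scalar(value, type_name):
    if type_name == "BOOL":
        return isinstance(value, bool)
    if type_name == "INT":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "FLOAT":
        # assumption: an int is accepted where a FLOAT is declared
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "STRING":
        return isinstance(value, str)
    raise ValueError(f"unknown scalar type: {type_name}")


def matches(value, type_):
    """Recursively check value against an output type."""
    if isinstance(type_, str):
        return matches_scalar(value, type_)
    if isinstance(type_, list):
        (element_type,) = type_  # exactly one element type
        return isinstance(value, list) and all(matches(v, element_type) for v in value)
    if isinstance(type_, dict):
        if not isinstance(value, dict):
            return False
        if len(type_) == 1 and next(iter(type_)) in SCALAR_TYPES:
            # generic map: any number of items, all keys and values of the declared types
            ((key_type, value_type),) = type_.items()
            return all(matches_scalar(k, key_type) and matches(v, value_type)
                       for k, v in value.items())
        # fixed map: a value for every declared key (assumption: no extra keys allowed)
        return set(value) == set(type_) and all(matches(value[k], t) for k, t in type_.items())
    raise ValueError(f"unsupported type: {type_!r}")


print(matches({"San Francisco": -7, "Toronto": -4}, {"STRING": "INT"}))  # True
print(matches({"mean": 17, "stddev": 8.8}, {"mean": "INT", "stddev": "FLOAT"}))  # True
print(matches([{"a": 1}, {"b": "x"}], [{"STRING": "INT"}]))  # False
```

The single-key heuristic for telling generic maps from fixed maps is a simplification made for this sketch; a fixed map whose only key happened to be named like a scalar type would be misread.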
Here are some valid output types:
INT
[STRING]
{STRING: INT}
[{STRING: INT}]
{value1: INT, value2: FLOAT}
{STRING: {value1: INT, value2: FLOAT}}
{value1: {STRING: BOOL}, value2: [FLOAT], value3: STRING}
Output type:
output_type:
  value1: BOOL
  value2: INT|FLOAT
  value3: [STRING]
  value4: {INT: STRING}
Output value:
output_type:
  value1: True
  value2: 2.2
  value3: [test1, test2, test3]
  value4: {1: test1, 2: test2}
Input types are used to define the inputs to aggregators, transformers, and estimators. Typically, input types can be any combination of the column or scalar types:
INT_COLUMN
FLOAT_COLUMN
STRING_COLUMN
INT_LIST_COLUMN
FLOAT_LIST_COLUMN
STRING_LIST_COLUMN
INT
FLOAT
STRING
BOOL
As with output types, input types may occur within arbitrarily nested lists, generic maps, and fixed maps.
Ambiguous input types are also supported, and are represented by joining types with |. For example, INT_COLUMN|FLOAT_COLUMN indicates that either a column of type INT_COLUMN or a column of type FLOAT_COLUMN may be used as the input. Any two or more column types, or any two or more scalar types, may be combined in this way (e.g. INT|FLOAT|STRING is supported), and the order does not matter (e.g. INT|FLOAT and FLOAT|INT are equivalent). Column types and scalar types may not be mixed (e.g. INT|FLOAT_COLUMN is not valid).
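The ambiguous-type rules can be summarized in a short Python sketch. The function name parse_ambiguous is hypothetical and this is not how Cortex parses types internally; it returns a frozenset so that the order of the alternatives does not matter, and it rejects unknown names and any mix of column and scalar types.

```python
# Hypothetical sketch of the ambiguous-type rules; not Cortex's parser.
COLUMN_TYPES = {
    "INT_COLUMN", "FLOAT_COLUMN", "STRING_COLUMN",
    "INT_LIST_COLUMN", "FLOAT_LIST_COLUMN", "STRING_LIST_COLUMN",
}
SCALAR_TYPES = {"INT", "FLOAT", "STRING", "BOOL"}


def parse_ambiguous(type_str):
    """Split a type such as "INT|FLOAT" into an order-independent set of type names."""
    parts = [p.strip() for p in type_str.split("|")]
    unknown = [p for p in parts if p not in COLUMN_TYPES | SCALAR_TYPES]
    if unknown:
        raise ValueError(f"unknown type(s): {unknown}")
    # True for column types, False for scalar types; both present means a mix
    kinds = {p in COLUMN_TYPES for p in parts}
    if len(kinds) > 1:
        raise ValueError(f"cannot mix column and scalar types: {type_str}")
    # a frozenset ignores order, so INT|FLOAT and FLOAT|INT compare equal
    return frozenset(parts)


print(parse_ambiguous("INT|FLOAT") == parse_ambiguous("FLOAT|INT"))  # True
try:
    parse_ambiguous("INT|FLOAT_COLUMN")
except ValueError as err:
    print(err)  # cannot mix column and scalar types: INT|FLOAT_COLUMN
```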
By default, all declared inputs are required. For example, if the input type is {value1: INT, value2: FLOAT}, both value1 and value2 must be provided (and cannot be Null). With Cortex, it is possible to declare inputs as optional, set default values, allow values to be Null, and specify minimum and maximum map/list lengths.
To specify validation options, the "long form" input schema is used. In the long form, the input type is always a map, with the _type key specifying the type, and other keys (which all start with _) specifying the options. The available options are:
_optional: If set to True, allows the value to be missing from the input. This only applies to values in maps.
_default: Specifies a default value to use if the value is missing from the input. This only applies to values in maps. Setting _default implies _optional: True.
_allow_null: If set to True, allows the value to be explicitly set to Null.
_min_count: Specifies the minimum number of elements that must be in the list or map.
_max_count: Specifies the maximum number of elements that may be in the list or map.
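One way to read these options is as a sequence of checks applied to each key of an input map. The sketch below is a hypothetical illustration, not Cortex's validator: validate_inputs and its check_type callback are names introduced here, the actual type check is delegated to that callback, and keys in the input that the schema does not declare are simply ignored.

```python
# Hypothetical sketch, not Cortex's validator. Each schema entry is a long-form
# spec: a dict with "_type" plus optional "_"-prefixed options. check_type is a
# caller-supplied function that performs the actual type check.

def validate_inputs(schema, inputs, check_type):
    """Apply the long-form options to a map of inputs and return the resolved values."""
    resolved = {}
    for key, spec in schema.items():
        if key not in inputs:
            if "_default" in spec:
                # setting _default implies _optional: True
                resolved[key] = spec["_default"]
            elif not spec.get("_optional", False):
                raise ValueError(f"{key}: required input is missing")
            continue
        value = inputs[key]
        if value is None:
            if not spec.get("_allow_null", False):
                raise ValueError(f"{key}: Null is not allowed")
            resolved[key] = None
            continue
        min_count = spec.get("_min_count")
        max_count = spec.get("_max_count")
        if min_count is not None and len(value) < min_count:
            raise ValueError(f"{key}: expected at least {min_count} elements")
        if max_count is not None and len(value) > max_count:
            raise ValueError(f"{key}: expected at most {max_count} elements")
        if not check_type(value, spec["_type"]):
            raise ValueError(f"{key}: value does not match {spec['_type']}")
        resolved[key] = value
    # keys in inputs that are not declared in schema are ignored in this sketch
    return resolved


def accept_any(value, type_):
    # placeholder type check used only for this example
    return True


schema = {
    "value1": {"_type": "INT_COLUMN", "_optional": True},
    "value2": {"_type": "INT|FLOAT", "_default": 2.2, "_allow_null": True},
    "value3": {"_type": ["STRING"], "_min_count": 1},
}
print(validate_inputs(schema, {"value3": ["test1"]}, accept_any))
# {'value2': 2.2, 'value3': ['test1']}
```

Here the missing optional value1 is left out of the result, while the missing value2 is filled in from its default.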
Short form input type:
input:
  value1: INT_COLUMN
  value2: INT|FLOAT
  value3: [STRING]
  value4: {INT: STRING}
Long form input type:
input:
  value1:
    _type: INT_COLUMN
    _optional: True
  value2:
    _type: INT|FLOAT
    _default: 2.2
    _allow_null: True
  value3:
    _type: [STRING]
    _min_count: 1
  value4:
    _type: {INT: STRING}
    _min_count: 1
    _max_count: 100
Input value (assuming column1 is an INT_COLUMN, constant1 is a [STRING], and aggregate1 is a STRING):
input:
  value1: @column1
  value2: 2.2
  value3: @constant1
  value4: {1: test1, 2: @aggregate1}
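The example above mixes literal values with references (names prefixed with @). The following Python sketch shows one possible way to check such an input against the short-form type. The REGISTRY dictionary, which records the types assumed in the example, and the check_input function are hypothetical stand-ins for however Cortex actually resolves references. In this sketch, a reference matches if its declared type equals the expected type or is one of the alternatives of an ambiguous type, any string starting with @ is treated as a reference, and fixed maps must contain exactly their declared keys.

```python
# Hypothetical sketch; REGISTRY and check_input are illustrative, not Cortex APIs.
SCALAR_TYPES = {"INT", "FLOAT", "STRING", "BOOL"}

# Declared types of the named resources, as assumed in the example above.
REGISTRY = {"column1": "INT_COLUMN", "constant1": ["STRING"], "aggregate1": "STRING"}


def literal_matches(value, type_name):
    """Check a literal (non-reference) value against one scalar type name."""
    if type_name == "BOOL":
        return isinstance(value, bool)
    if type_name == "INT":
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "FLOAT":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "STRING":
        return isinstance(value, str)
    return False  # column types can only be satisfied by a reference


def check_input(value, type_):
    """Return True if value (a literal or an @reference) fits the short-form type_."""
    if isinstance(value, str) and value.startswith("@"):
        ref_type = REGISTRY[value[1:]]
        if isinstance(type_, str):
            # a reference fits if its type is one of the "|"-separated alternatives
            return isinstance(ref_type, str) and ref_type in type_.split("|")
        return ref_type == type_
    if isinstance(type_, str):
        return any(literal_matches(value, t) for t in type_.split("|"))
    if isinstance(type_, list):
        return isinstance(value, list) and all(check_input(v, type_[0]) for v in value)
    if isinstance(type_, dict):
        if not isinstance(value, dict):
            return False
        if len(type_) == 1 and next(iter(type_)) in SCALAR_TYPES:
            # generic map {key_type: value_type}
            ((key_type, val_type),) = type_.items()
            return all(check_input(k, key_type) and check_input(v, val_type)
                       for k, v in value.items())
        # fixed map: exactly the declared keys, each with a matching value
        return set(value) == set(type_) and all(check_input(value[k], t) for k, t in type_.items())
    raise ValueError(f"unsupported type: {type_!r}")


input_type = {
    "value1": "INT_COLUMN",
    "value2": "INT|FLOAT",
    "value3": ["STRING"],
    "value4": {"INT": "STRING"},
}
input_value = {
    "value1": "@column1",
    "value2": 2.2,
    "value3": "@constant1",
    "value4": {1: "test1", 2: "@aggregate1"},
}
print(check_input(input_value, input_type))  # True
```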