The Learning Agent provides the means for self-managed machine learning; in other words, it orchestrates the learning process. To be orchestrated, the process must be thoroughly described. The description defines all phases of the Complex-Event Machine Learning framework; therefore, the request is called a Complex-Event Machine Learning Request (CEMLR), and the execution of a CEMLR is known as a CEML Process (CEMLP). Summarizing the CEML framework, the CEMLR should describe the following parts: the Feature Space Description, the Pre-Processing and Feature Extraction Rules (CEML: Data Pre-Processing Phase), the Learning Description (CEML: Learning Phase), the Evaluation Description (CEML: Continuous Validation Phase), and the Actuation Rules (CEML: Deployment Phase).

The Pre-Processing Rules describe how the fragmented raw input data or data streams are processed and aggregated. The Feature Extraction Rules define how features are extracted from the pre-processed data. The Learning Description defines the selection of an Algorithm, its Parameters, and the Feature Space used to construct a model. The Evaluation Description is used to construct an Evaluator. The Evaluator is attached to the model and is responsible for providing real-time performance metrics about the model and for deciding whether it reaches the expected scores. Finally, the Actuation Rules describe the actuation of the system whenever the model reaches the expected performance scores.

All steps are performed in an Execution Pipeline Environment (EPE) or distributed over a set of interconnected EPEs across the network. The EPE is usually a CEP engine, most commonly the Esper CEP engine. Finally, the CEML: Data Collection Propagation phase can be split in two: incoming data and outgoing data. The incoming data is not managed in the CEMLR; this is done using additional APIs of the agent (SEE HERE). The outgoing data can be partially controlled using the same features of the Statement API (see statement-rest and Statement Native).
The CEML Request
The request is a JSON document containing the information the agent needs to orchestrate the learning process, or CEMLP. The OpenAPI documentation can be found in cem-rest. The JSON document contains the following sections:
Data Descriptors (Feature Space Description) (legacy)
Describes the data expected by the model, i.e., the feature space. The DataDescriptors consist of the input feature space and the target/label/ground-truth feature space. The features can be described as named features or as anonymous features. Each named feature is described by a name and a type. In an anonymous (or lambda) feature space, all the features are described as a vector of the same type. In both cases, the input space and the target space are described separately. (See also DataDescriptors)
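As an illustrative sketch, a named feature space with two inputs and one target might look like the following (the property names `dataDescriptors` and `target` are assumptions for illustration, not the exact schema; see the DataDescriptors documentation for the authoritative structure):

```json
{
  "dataDescriptors": [
    { "name": "temperature", "type": "NUMBER" },
    { "name": "humidity",    "type": "NUMBER" },
    { "name": "overheat",    "type": "BOOLEAN", "target": true }
  ]
}
```

In the anonymous (lambda) style, the same space would instead be declared as a typed vector of a given size, e.g. a `NUMBER` vector with two input dimensions and one target dimension.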
Data Schema (Feature Space Schema)
Describes the data expected by the model or feature space. See also Data Schema.
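A hedged sketch of what such a schema could look like, assuming it follows JSON-Schema-style conventions (the supported keywords and the wrapping property name are defined in the Data Schema documentation linked above, not here):

```json
{
  "dataSchema": {
    "type": "object",
    "properties": {
      "temperature": { "type": "number" },
      "humidity":    { "type": "number" },
      "overheat":    { "type": "boolean" }
    },
    "required": ["temperature", "humidity"]
  }
}
```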
Model (Model Description/Selection)
The JSON property model contains either the description of the selected algorithm (its Name, its configuration and parameters as Parameters, and the evaluation thresholds as Targets) or the serialization of an existing trained model. The agent will instantiate or connect to an existing implementation of the model, or a model implementation backend, using the given configuration and parameters. In case the model is a serialized version of a trained model, the agent will load it and make the necessary changes to the environment where the agent is deployed. (See also Model Interface and Model Class)
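A minimal sketch of the model section (the algorithm name and parameter keys are illustrative assumptions; consult the Model Class documentation for the supported values):

```json
{
  "model": {
    "name": "HoeffdingTree",
    "parameters": { "GracePeriod": 200 },
    "targets": [
      { "name": "Accuracy", "threshold": 0.9, "method": ">=" }
    ]
  }
}
```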
Target (Evaluation Thresholds)
The targets are the thresholds which the Model should reach to be considered "ready to be used". The metrics are the methods used to score the performance of the model. The Targets are defined using a Name, a Threshold, and a comparison Method. The name depends on whether it is a classification or regression Model. The Threshold is a numeric value whose range depends on the metric type. The comparison Method indicates whether the score of the model should be greater than, greater than or equal to, equal to, less than, or less than or equal to the threshold. With the metrics, the agent is able to validate the model in real time, using an Evaluator. The selection of the algorithm implicitly selects which type of Evaluator will do the evaluation, i.e., a Classification or Regression Evaluator. Depending on the evaluation type, the following metrics are implemented:
Classification: Accuracy, F1 Score (F1Score), Fallout (FallOut), False Discovery Rate (FDR), Informedness, Markedness, Matthews Correlation Coefficient (MCC), Miss Rate (MR), Negative Predictive Value (NPR), Precision, Sensitivity, Specificity (See also Classification Evaluator)
Regression: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Akaike Information Criterion (AICc) (See also Regression Evaluator)
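To make the Target mechanics concrete, the following is a plain sketch (not the agent's actual Evaluator implementation) of how a few of the classification metrics above can be computed from confusion-matrix counts, and how a Target's comparison Method decides whether a score passes its Threshold:

```python
import math

def classification_scores(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute a subset of the classification metrics listed above
    from confusion-matrix counts (true/false positives/negatives)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)            # also known as recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Matthews Correlation Coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"Accuracy": accuracy, "Precision": precision,
            "Sensitivity": sensitivity, "Specificity": specificity,
            "F1Score": f1, "MCC": mcc}

def meets_target(score: float, threshold: float, method: str) -> bool:
    """Apply a Target's comparison Method to a metric score."""
    comparisons = {
        ">":  score > threshold,
        ">=": score >= threshold,
        "==": score == threshold,
        "<":  score < threshold,
        "<=": score <= threshold,
    }
    return comparisons[method]
```

For example, `meets_target(classification_scores(50, 10, 30, 10)["Accuracy"], 0.75, ">=")` checks whether an accuracy Target of 0.75 is met for a model with 50 true positives, 10 false positives, 30 true negatives, and 10 false negatives.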
Learning Statements (Learning Rules)
Learning Statements define how the live data is processed to obtain a datapoint, according to the feature space described in the Data Descriptors. The processing is done through CEP queries (usually Esper EPL) and produces the Learning Streams.
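For illustration, a learning statement could pair a name with an EPL query that assembles a datapoint from the live streams (the wrapping property name, stream name, and event properties below are hypothetical):

```json
{
  "learningStatements": [
    {
      "name": "CollectTrainingTuples",
      "statement": "select temperature, humidity, overheat from SensorEvent.win:time(5 min)"
    }
  ]
}
```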
Deployment Statements (Actuation Rules)
The description of how to use the trained and validated Model is provided in the Deployment Statements, which produce the Deployment Streams, analogously to the Learning Streams. These streams are used while the continuous validation holds: when the Evaluator detects that the Model has reached the Target thresholds of the Metrics (with the Learning Streams feeding the model), the agent deploys the Deployment Statements alongside the Model.
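Putting the sections together, a complete CEMLR skeleton might look as follows. All property names and values here are illustrative assumptions; the authoritative structure is given by the OpenAPI documentation in cem-rest:

```json
{
  "dataDescriptors": [
    { "name": "temperature", "type": "NUMBER" },
    { "name": "overheat",    "type": "BOOLEAN", "target": true }
  ],
  "model": {
    "name": "HoeffdingTree",
    "targets": [
      { "name": "Accuracy", "threshold": 0.9, "method": ">=" }
    ]
  },
  "learningStatements": [
    { "statement": "select temperature, overheat from SensorEvent" }
  ],
  "deploymentStatements": [
    { "statement": "select temperature from SensorEvent where temperature > 30" }
  ]
}
```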