MetaFOX

Advanced Automated Machine Learning Component

MetaFOX is an advanced automated machine learning (AutoML) component that provides a comprehensive set of tools for feature selection, hyperparameter optimization, and model selection. It streamlines the process of developing machine learning models, making AI more accessible to experts and non-specialists alike.

Overview

MetaFOX streamlines the creation, optimization, and management of machine learning models. It integrates with other services through both synchronous (REST) and asynchronous (message queue) communication, so that data processing and model generation can run without blocking the caller.

As an open-source module, MetaFOX implements existing state-of-the-art methods, techniques, libraries, and standards, while introducing novel machine learning concepts to enable developers to easily implement machine learning models in their applications.

Key Features

Model Selection

MetaFOX evaluates numerous machine learning models to find the best architecture for a specific problem, efficiently searching through various model types (linear models, tree-based models, neural networks) to identify optimal solutions.

Feature Engineering

MetaFOX automatically detects the feature transformations and interactions that benefit the model. It identifies the features most relevant to the target task and creates new features to improve model performance.

Hyperparameter Optimization

MetaFOX tunes hyperparameters automatically to improve model performance, which reduces the need for manual experimentation and speeds up development.
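The Technology Stack section notes that TPOT searches with genetic programming. The toy loop below illustrates the general idea of an evolutionary search over a single hyperparameter. It is a minimal sketch, not TPOT's algorithm or MetaFOX's implementation: the function evolve, the objective validation_error, and the hyperparameter alpha are all hypothetical.

```python
import random


def evolve(loss, bounds, generations=20, population_size=12, seed=0):
    """Toy evolutionary search over one numeric hyperparameter.

    Returns (best_value, history), where history[i] is the lowest loss
    seen in generation i.
    """
    rng = random.Random(seed)
    low, high = bounds
    population = [rng.uniform(low, high) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        # Rank candidates by loss; lower is better.
        population.sort(key=loss)
        history.append(loss(population[0]))
        # Elitism: the better half survives unchanged, so the best loss
        # cannot get worse from one generation to the next.
        survivors = population[: population_size // 2]
        # Mutation: each child is a random survivor plus Gaussian noise,
        # clipped back into the allowed range.
        children = [
            min(high, max(low, rng.choice(survivors) + rng.gauss(0, 0.1 * (high - low))))
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    population.sort(key=loss)
    return population[0], history


# Hypothetical objective: pretend validation error is lowest at alpha = 0.3.
def validation_error(alpha):
    return (alpha - 0.3) ** 2


best_alpha, history = evolve(validation_error, bounds=(0.0, 1.0))
print(f"best alpha: {best_alpha:.3f}, final loss: {history[-1]:.6f}")
```

Because the best candidates are carried over unchanged, the per-generation best loss in history never increases, which is the same pattern the usage examples describe as improvement "over generations".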

Architecture Adjustments

In transfer learning scenarios, MetaFOX can determine the best architecture adjustments needed when adapting a pre-trained model to a new task, such as adding or removing layers, or adjusting the size of layers.

Architecture

MetaFOX utilizes a microservice-based architecture that supports cloud-native deployment. The main components include:

  • User Interfaces: MetaFOX exposes its API through a documented Swagger UI (OpenAPI).
  • API Layer: Provides REST APIs for synchronous operations and task management.
  • Worker Service: Performs the actual AutoML operations and model training.
  • Message Broker: Facilitates communication between components using RabbitMQ.
  • Storage Solutions: Uses MongoDB and Redis for data persistence and caching.

The system is designed to serve multiple users with multiple AutoML jobs concurrently, thanks to its cloud-native design and Kubernetes auto-scaling capabilities.
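The division of labor between the API layer, the message broker, and the worker service can be sketched with the Python standard library alone. This is an illustrative model under stated assumptions, not MetaFOX code: an in-process queue stands in for RabbitMQ, a dictionary stands in for the result backend, and the names submit_job, worker, and the status strings are hypothetical.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()  # stand-in for the RabbitMQ message broker
results = {}                # stand-in for the result backend (MongoDB/Redis)


def submit_job(payload):
    """API layer: register the job, enqueue it, and return an ID at once."""
    job_id = str(uuid.uuid4())
    results[job_id] = {"status": "PENDING"}
    task_queue.put((job_id, payload))
    return job_id


def worker():
    """Worker service: consume jobs until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            break
        job_id, payload = item
        # A real worker would run the AutoML search here.
        results[job_id] = {"status": "SUCCESS", "job_name": payload["job_name"]}


thread = threading.Thread(target=worker)
thread.start()
job_id = submit_job({"job_name": "demo"})
task_queue.put(None)  # ask the worker to stop once the queue is drained
thread.join()
print(results[job_id]["status"])
```

The caller gets a job ID back immediately and reads the outcome later from the shared store. Decoupling submission from execution in this way is what allows several workers to process jobs from many users concurrently.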

Technology Stack

FastAPI (v0.115.7)

High-performance web framework for building APIs with Python 3.8+

Celery (v5.4.0)

Distributed task queue for asynchronous task execution

MongoDB

NoSQL document database used as result backend for Celery

Redis

Data store used as result backend for Celery

RabbitMQ

Message-broker software used for component communication

TPOT (v0.12.2)

Python library for automated ML pipeline creation using genetic programming

Docker

Platform for developing and running applications in containers

Kubernetes

Container-orchestration system for application deployment and management

Keycloak

Identity and access management solution for authentication

Usage Examples

Classification Example

curl -X 'POST' \
  'http://[server-address]/metafox/tpot/automl/job/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
      "job_name": "Classification Example - Iris",
      "data_source": "https://raw.githubusercontent.com/Drashko73/datasets/master/iris/iris.csv",
      "target_variable": "species",
      "problem_type": "classification"
  }'

This example demonstrates creating a classification model using the famous Iris dataset, where the machine learning task is to classify flowers into different species based on their measurements.

Result: Accuracy improves over successive generations of the optimization process, and the best-performing model found is used for flower species prediction.
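The same request can be prepared from Python. The sketch below builds the POST request from the curl example with the standard library and does not send it. The endpoint path and JSON fields come from the example above; the helper name build_create_job_request and the server address localhost:8000 are assumptions to replace with your own values.

```python
import json
import urllib.request


def build_create_job_request(server_address, job_name, data_source,
                             target_variable, problem_type):
    """Build (but do not send) the job-creation POST request."""
    body = {
        "job_name": job_name,
        "data_source": data_source,
        "target_variable": target_variable,
        "problem_type": problem_type,
    }
    return urllib.request.Request(
        url=f"http://{server_address}/metafox/tpot/automl/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_job_request(
    server_address="localhost:8000",  # assumption: your MetaFOX server address
    job_name="Classification Example - Iris",
    data_source="https://raw.githubusercontent.com/Drashko73/datasets/master/iris/iris.csv",
    target_variable="species",
    problem_type="classification",
)
print(request.get_method(), request.full_url)
```

To submit the job, pass the request to urllib.request.urlopen and decode the JSON response. The response schema is not shown in this document, so inspect it in the Swagger UI before relying on specific field names.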

Regression Example

curl -X 'POST' \
  'http://[server-address]/metafox/tpot/automl/job/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
      "job_name": "Regression Example - Boston Housing",
      "data_source": "https://raw.githubusercontent.com/Drashko73/datasets/master/boston_housing/housing.csv",
      "target_variable": "medv",
      "problem_type": "regression"
  }'

This example shows how to create a regression model using the Boston Housing dataset. The target variable, medv, is the median value of owner-occupied homes, which the model predicts from neighborhood and house features.

Result: The mean squared error (MSE) decreases over generations as the optimization process finds increasingly accurate models for predicting housing prices.

Typical Workflow

  1. Create an AutoML Job: Initialize a job with dataset information, target variable, and problem type.
  2. Start the AutoML Process: Trigger the model optimization process using the job ID returned in step 1.
  3. Monitor Progress: Check the status and optimization progress of the job as it evolves.
  4. Export the Final Model: Once optimization is complete, export the best model in BentoML format.
  5. Deploy the Model: Use the optimized model in your application or further processing.
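Step 3 of the workflow above amounts to polling until the job reaches a final state. The helper below is a minimal sketch of that loop. It deliberately takes a get_status callable instead of a URL, because the status endpoint and the status values are not specified in this document. The state names used here ("PENDING", "STARTED", "SUCCESS", "FAILURE") are assumptions borrowed from Celery's task states.

```python
import time


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=3600.0,
                 done_states=("SUCCESS", "FAILURE"),
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a final state or the timeout expires."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job still in state {status!r} after {timeout_seconds} s"
            )
        sleep(poll_seconds)


# Demo with a fake status source, so the sketch runs without a server.
fake_states = iter(["PENDING", "STARTED", "STARTED", "SUCCESS"])
final_state = wait_for_job(lambda: next(fake_states), poll_seconds=0.0)
print(final_state)
```

In a real client, get_status would call the job status endpoint listed in the Swagger UI and return the state field from its response.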

Current Limitations

  • MetaFOX currently accepts input data only as CSV files fetched from an online resource (e.g., GitHub raw files).
  • The underlying TPOT library does not support columns with non-numeric data types. Users must preprocess and encode non-numeric data before passing it to MetaFOX.
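One way to satisfy the numeric-only requirement is to encode text columns as integer codes before publishing the CSV. The sketch below uses only the standard library. The helper names and the three-row sample are made up for illustration, and the code assumes a well-formed CSV with a header row and no missing cells.

```python
import csv
import io


def is_number(text):
    """Return True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def encode_non_numeric(csv_text):
    """Replace every non-numeric column with integer codes.

    Returns (encoded_csv_text, mappings), where mappings[column] maps each
    original value to its integer code.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = list(reader)
    mappings = {}
    for column in columns:
        values = [row[column] for row in rows]
        if all(is_number(value) for value in values):
            continue  # already numeric, leave untouched
        # Ordinal encoding: codes follow the sorted order of distinct values.
        codes = {value: code for code, value in enumerate(sorted(set(values)))}
        mappings[column] = codes
        for row in rows:
            row[column] = str(codes[row[column]])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), mappings


# Made-up sample data in the style of the Iris dataset.
sample = "sepal_length,species\n5.1,setosa\n7.0,versicolor\n6.3,virginica\n"
encoded, mappings = encode_non_numeric(sample)
print(encoded)
print(mappings)
```

Integer codes impose an artificial order on the categories. That is usually acceptable for a classification target, but for nominal input features one-hot encoding is often the safer choice.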
