MetaFOX
Advanced Automated Machine Learning Component
MetaFOX is an advanced automated machine learning (AutoML) component that provides a comprehensive set of tools for feature selection, hyperparameter optimization, and model selection. It streamlines the process of developing machine learning models, making AI more accessible to experts and non-specialists alike.

Overview
MetaFOX streamlines the creation, optimization, and management of machine learning models. It integrates with other services over both synchronous (REST) and asynchronous (message-queue) communication to support data processing and model generation.
As an open-source module, MetaFOX builds on existing state-of-the-art methods, techniques, libraries, and standards while introducing novel machine learning concepts, so that developers can easily add machine learning models to their applications.
Key Features
Model Selection
MetaFOX evaluates numerous machine learning models to find the best architecture for a specific problem, efficiently searching through various model types (linear models, tree-based models, neural networks) to identify optimal solutions.
Feature Engineering
MetaFOX automatically detects the feature transformations and interactions that benefit the model. It identifies the features most relevant to the target task and creates new features to improve the model's performance.
Hyperparameter Optimization
MetaFOX automatically tunes hyperparameters to improve model performance. This replaces slow manual trial and error and speeds up development.
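As a rough illustration of automated tuning, the sketch below runs a tiny evolutionary search over two hyperparameters. It is not MetaFOX's implementation: the objective `toy_score`, the parameter names (`max_depth`, `learning_rate`), and the search ranges are hypothetical stand-ins for a real cross-validated model score.

```python
import random

# Hypothetical search space; names and ranges are illustrative assumptions.
SEARCH_SPACE = {
    "max_depth": list(range(1, 11)),
    "learning_rate": [0.001, 0.01, 0.05, 0.1, 0.3, 1.0],
}


def toy_score(params):
    """Stand-in for a cross-validated score; peaks at max_depth=6, learning_rate=0.1."""
    return -abs(params["max_depth"] - 6) - 10 * abs(params["learning_rate"] - 0.1)


def random_candidate(rng):
    """Draw one value per hyperparameter uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mutate(params, rng):
    """Copy a candidate and resample one randomly chosen hyperparameter."""
    child = dict(params)
    name = rng.choice(sorted(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def evolve(score, generations=10, population_size=8, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: population_size // 2]
        history.append(score(survivors[0]))
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    best = max(population, key=score)
    return best, history
```

Because the better half of each population is carried over unchanged, the best score recorded in `history` never gets worse from one generation to the next.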
Architecture Adjustments
In transfer learning scenarios, MetaFOX can determine the best architecture adjustments needed when adapting a pre-trained model to a new task, such as adding or removing layers, or adjusting the size of layers.
Architecture
MetaFOX utilizes a microservice-based architecture that supports cloud-native deployment. The main components include:
- User Interfaces: MetaFOX exposes a well-documented Swagger.io (OpenAPI) UI.
- API Layer: Provides REST APIs for synchronous operations and task management.
- Worker Service: Performs the actual AutoML operations and model training.
- Message Broker: Facilitates communication between components using RabbitMQ.
- Storage Solutions: Uses MongoDB and Redis for data persistence and caching.
The system is designed to serve multiple users with multiple AutoML jobs concurrently, thanks to its cloud-native design and Kubernetes auto-scaling capabilities.
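The request flow implied by these components can be sketched in-process. In the sketch below, `queue.Queue` stands in for RabbitMQ, a plain dictionary stands in for the result store, and a thread stands in for the worker service. The names `submit_job`, `worker_loop`, and `run_demo`, and the status strings, are hypothetical and not part of MetaFOX's API.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()   # stand-in for the message broker
results = {}                 # stand-in for the result store
results_lock = threading.Lock()


def submit_job(payload):
    """API-layer role: record the job, enqueue it, and return an ID immediately."""
    job_id = str(uuid.uuid4())
    with results_lock:
        results[job_id] = {"status": "PENDING", "result": None}
    task_queue.put((job_id, payload))
    return job_id


def worker_loop():
    """Worker role: pull jobs off the queue until a None sentinel arrives."""
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        job_id, payload = item
        # Placeholder for the real AutoML run.
        outcome = {"rows_seen": len(payload.get("rows", []))}
        with results_lock:
            results[job_id] = {"status": "SUCCESS", "result": outcome}
        task_queue.task_done()


def run_demo():
    """Submit one job, wait for the worker to finish it, and return its record."""
    worker = threading.Thread(target=worker_loop, daemon=True)
    worker.start()
    job_id = submit_job({"rows": [1, 2, 3]})
    task_queue.join()      # block until the worker has finished the job
    task_queue.put(None)   # tell the worker to stop
    worker.join(timeout=5)
    return results[job_id]
```

The point of the split is that `submit_job` returns before any training happens, so the API stays responsive while long-running jobs proceed in the background.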
Technology Stack
- FastAPI (v0.115.7): High-performance web framework for building APIs with Python 3.8+
- Celery (v5.4.0): Distributed task queue for asynchronous task execution
- MongoDB: NoSQL document database used as result backend for Celery
- Redis: Data store used as result backend for Celery
- RabbitMQ: Message-broker software used for component communication
- TPOT (v0.12.2): Python library for automated ML pipeline creation using genetic programming
- Docker: Platform for developing and running applications in containers
- Kubernetes: Container-orchestration system for application deployment and management
- Keycloak: Identity and access management solution for authentication
Usage Examples
Classification Example
curl -X 'POST' \
  'http://[server-address]/metafox/tpot/automl/job/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "job_name": "Classification Example - Iris",
    "data_source": "https://raw.githubusercontent.com/Drashko73/datasets/master/iris/iris.csv",
    "target_variable": "species",
    "problem_type": "classification"
  }'
This example demonstrates creating a classification model using the famous Iris dataset, where the machine learning task is to classify flowers into different species based on their measurements.
Result: Accuracy improves over successive generations of the optimization, converging on the best-scoring model found for flower species prediction.
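The same request can be issued from Python. The sketch below uses only the standard library and builds the request without sending it. `build_create_job_request` is a hypothetical helper, and the `base_url` value is a placeholder assumption that must be replaced with the actual server address.

```python
import json
import urllib.request


def build_create_job_request(base_url, job_name, data_source, target_variable, problem_type):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "job_name": job_name,
        "data_source": data_source,
        "target_variable": target_variable,
        "problem_type": problem_type,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/metafox/tpot/automl/job/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_job_request(
    base_url="http://localhost:8000",  # assumption: replace with your server address
    job_name="Classification Example - Iris",
    data_source="https://raw.githubusercontent.com/Drashko73/datasets/master/iris/iris.csv",
    target_variable="species",
    problem_type="classification",
)
# To send it, pass `request` to urllib.request.urlopen and parse the JSON body.
# The shape of the response is not documented here, so it is not assumed.
```

Changing `target_variable` and `problem_type` is enough to reuse the helper for other datasets.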
Regression Example
curl -X 'POST' \
  'http://[server-address]/metafox/tpot/automl/job/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "job_name": "Regression Example - Boston Housing",
    "data_source": "https://raw.githubusercontent.com/Drashko73/datasets/master/boston_housing/housing.csv",
    "target_variable": "medv",
    "problem_type": "regression"
  }'
This example shows how to create a regression model using the Boston Housing dataset, predicting housing prices (medv) based on various neighborhood and house features.
Result: The mean squared error (MSE) decreases over successive generations as the optimization finds increasingly accurate models for predicting housing prices.
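For reference, the mean squared error is the average of the squared differences between observed and predicted values. A minimal Python version follows; the numbers are made up for illustration and are not results from the Boston Housing run.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only.
observed = [24.0, 21.5, 34.5]
predicted = [25.0, 20.5, 32.5]
mse = mean_squared_error(observed, predicted)  # (1 + 1 + 4) / 3 = 2.0
```

A lower value means the predictions sit closer to the observed prices, which is why a falling MSE across generations indicates progress.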
Typical Workflow
1. Create an AutoML Job: Initialize a job with dataset information, target variable, and problem type.
2. Start the AutoML Process: Trigger the model optimization process using the job ID returned in step 1.
3. Monitor Progress: Check the status and optimization progress of the job as it evolves.
4. Export the Final Model: Once optimization is complete, export the best model found in BentoML format.
5. Deploy the Model: Use the optimized model in your application or further processing.
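The monitoring step amounts to polling until the job reaches a terminal state. The sketch below keeps the transport abstract: `fetch_status` is a caller-supplied function (for example, one that calls the status endpoint listed in the API documentation), and the status strings `"SUCCESS"` and `"FAILURE"` are assumptions borrowed from Celery's task states rather than documented MetaFOX values.

```python
import time

TERMINAL_STATES = {"SUCCESS", "FAILURE"}  # assumed; check the API docs for real values


def wait_for_job(fetch_status, job_id, poll_interval=5.0, timeout=3600.0, sleep=time.sleep):
    """Poll fetch_status(job_id) until a terminal state is seen or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout} s")
        sleep(poll_interval)
```

Passing `sleep` as a parameter makes the loop easy to exercise in tests without real delays.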
Current Limitations
- MetaFOX currently supports only CSV input fetched from an online resource (e.g., a raw GitHub file URL).
- The underlying TPOT library does not support columns with non-numeric data types. Users must preprocess and encode non-numeric data before passing it to MetaFOX.
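One way to satisfy the numeric-only requirement is to label-encode text columns before publishing the CSV. The sketch below uses only the standard library. `encode_non_numeric` is a hypothetical helper, and integer label encoding is just one option; one-hot encoding may suit unordered categories better.

```python
import csv
import io


def _is_number(value):
    """Return True if the cell parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def encode_non_numeric(csv_text):
    """Replace each non-numeric column with integer codes; return (csv_text, mappings).

    Assumes every row has a value for every column; empty cells are treated
    as a category of their own.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return csv_text, {}
    columns = list(rows[0].keys())
    mappings = {}
    for column in columns:
        values = [row[column] for row in rows]
        if all(_is_number(v) for v in values):
            continue
        # Assign codes in sorted order so the mapping is deterministic.
        codes = {label: code for code, label in enumerate(sorted(set(values)))}
        mappings[column] = codes
        for row in rows:
            row[column] = str(codes[row[column]])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), mappings
```

Keep the returned `mappings` alongside the dataset so that encoded values can be translated back to their original labels later.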