Swarm Intelligence: Evaluating Success, Learning, and Code Optimization (Part 2)
Last Updated on May 29, 2026 by Editorial Team
Author(s): Martin Rmz Rabelo
Originally published on Towards AI.
Swarm Intelligence: Evaluating Success, Learning, and Code Optimization (Part 2)
From the philosophy of decision-making under uncertainty to the Max-Min Ant System and execution speedups using Restricted Candidate Lists.

In Part 1 of this series, we translated the biological marvel of stigmergy into a mathematical framework for the Bin Packing Problem (BPP). We shifted the Ant Colony Optimization (ACO) paradigm from routing to grouping, and we built the core Python structure that allows our virtual ants to construct probabilistic solutions.
But we left two massive questions unanswered: How do we mathematically define a “good” solution? And how do we stop the probabilistic engine from taking hours to run?
In this second and final part, we will explore the philosophical bias hidden inside loss functions, implement the colony’s learning mechanism, and introduce a clever speed optimization borrowed from the GRASP metaheuristic to make our Python code blazingly fast.
The Bias of the Metric: What is a “Good” Decision?
Before the swarm can learn, it needs a metric for success. In machine learning and optimization, the objective function (or loss function) is often seen as a cold, objective truth. In reality, it is a mathematical manifestation of our biases.
Think about human decision-making under uncertainty. When you evaluate a career choice or an investment strategy, the metric you choose to maximize (e.g., free time, total wealth, or stability) completely alters the path you take. The world doesn’t change, but your algorithm for navigating it does. Without the right compass, we risk marching like ants trapped in an infinite, paradoxical loop — iterating endlessly through an NP-hard search space, executing local steps that feel perfectly logical, yet never escaping the cycle to find the global optimum.
In the Bin Packing Problem, a naive metric for success would be simply counting the total number of bins used (N). The fewer bins, the better. However, this metric is flat; it doesn’t give the ants any gradient to learn from if multiple solutions use the same number of bins but have different internal distributions.
To guide the colony effectively, we use the fitness function proposed by Falkenauer and Dechembre [1]:

Where N is the number of bins, F_i is the sum of the item weights inside bin i, and c is the total capacity.
The genius of this function lies in the parameter k, which the authors optimized at k=2. By squaring the utilization ratio, this function acts as a severe bias: it heavily rewards bins that are filled to the brim and exponentially penalizes wasted space. The ants are no longer just trying to use fewer boxes; they are aggressively pursuing perfect compaction.
def fitness(solution, capacity):
total_sum = 0
for current_bin in solution:
# Sum the weights in the current bin and calculate its utilization ratio squared
utilization = (sum(current_bin) / capacity)**2
total_sum += utilization
# Return the average utilization across all used bins
eval_score = total_sum / len(solution)
return eval_score
Setting the Stage: Initializing the Pheromone Matrix
Before the ants can learn from the fitness function, they need a blank canvas.
In our BPP adaptation, the pheromone matrix τ(i, j) encodes the affinity between item sizes. Therefore, the first step is to extract all unique item weights from our dataset and sort them in descending order to construct the axes of our matrix.
Furthermore, the matrix cannot be initialized with zeros, as this would break the probability calculations in the first generation. For a standard ACO without local search, the optimal initial pheromone value T(0) is inversely proportional to the evaporation rate ρ, defined as:

Here is the Python implementation for this preprocessing and initialization phase:
import numpy as np
def generate_unique_weights_list(item_weights):
# Sort weights descending and remove duplicates to form the matrix axes
sorted_weights = sorted(item_weights, reverse=True)
unique_weights = []
seen = set()
for weight in sorted_weights:
if weight not in seen:
unique_weights.append(weight)
seen.add(weight)
return unique_weights
def initialize_pheromone_matrix(unique_weights, rho):
n = len(unique_weights)
# Initialize an N x N matrix with the optimal T(0) value
initial_value = 1 / (1 - rho)
return np.ones((n, n)) * initial_value
How the Swarm Learns: The Max-Min Ant System (MMAS)
If every single ant in a colony of hundreds deposits a pheromone trail after every generation, the shared memory matrix quickly descends into noise.
To solve this, Levine (2004) [3] demonstrated that the Max-Min Ant System (MMAS) by Stützle and Hoos [2] yields excellent results for BPP. In this variation, we enforce extreme elitism: only the absolute best ant is allowed to update the pheromones.
The pheromone matrix is updated using the following formula:

Here, ρ represents the evaporation rate. Evaporation is crucial; without it, the colony would get trapped in the first local optimum it finds. The variable m indicates how many times the weight i and the weight j appear together in the best solution . The more a specific weight combination appears in a highly-rated packing pattern, the stronger the trail becomes.
def count_pairs(solution, weight_i, weight_j):
count = 0
for current_bin in solution:
if weight_i in current_bin and weight_j in current_bin:
if weight_i == weight_j:
# Handle the special case where both items have the exact same weight
count += current_bin.count(weight_i) * (current_bin.count(weight_i) - 1) // 2
else:
# Count occurrences of both elements in the bin
count += current_bin.count(weight_i) * current_bin.count(weight_j)
return count
def update_pheromones(pheromone_matrix, rho, best_solution, unique_weights, capacity):
for i in range(len(unique_weights)):
for j in range(len(unique_weights)):
# Calculate the update according to the MMAS grouping equation
m = count_pairs(best_solution, unique_weights[i], unique_weights[j])
pheromone_matrix[i, j] = rho * pheromone_matrix[i, j] + (m * fitness(best_solution, capacity))
return pheromone_matrix

Orchestrating the Swarm: The Main Loop and Population Diversity
With our pheromone matrix initialized and our update rules defined, we need a central controller to orchestrate the swarm across multiple generations.
The main loop handles the lifecycle of the colony. In each generation, a set number of ants builds their solutions independently. Once they finish, they are evaluated using our squared fitness metric.
Here, the delicate balance between exploration and exploitation takes center stage. Following Levine’s [3] parameter tuning, our algorithm alternates its learning logic: every 5 generations, we reinforce the Global Best ant to consolidate learning. In all other generations, we update using the Iteration Best ant to maintain swarm diversity.
However, how do we mathematically know if the swarm is exploring or just blindly exploiting?
As stablished in [4], in population-based metaheuristics — whether we are talking about Ant Colony Optimization or Evolutionary Computing (like Genetic Algorithms) — diversity is the lifeblood of the search process. If every agent in the population starts building the exact same solution, the variability drops to zero, and the search gets permanently stuck in a local optimum.
To monitor this, we introduce an analytical tracker into our main loop: the standard deviation of the colony’s fitness. While this metric doesn’t alter the ants’ direct behavior, calculating it generation by generation gives us a crucial diagnostic tool to visualize premature convergence.
import statistics
def run_aco(capacity, item_weights, rho, beta, generations, num_ants, unique_weights):
pheromone_matrix = initialize_pheromone_matrix(unique_weights, rho)
global_best_ant = (0, []) # Stores (fitness_score, solution_list)
for generation in range(generations):
# Initialize the colony for the current generation
ants = [Ant(capacity, pheromone_matrix, beta) for _ in range(num_ants)]
iteration_best_ant = (0, [])
colony = [] # List to track the entire population
for ant in ants:
# 1. Construction Phase
solution = ant.build_solution(item_weights)
# 2. Evaluation Phase
ant_fitness = fitness(solution, capacity)
current_ant = (ant_fitness, solution)
colony.append(current_ant)
# Track the best ant in this specific generation
if ant_fitness > iteration_best_ant[0]:
iteration_best_ant = current_ant
# Track the absolute best ant across all generations
if iteration_best_ant[0] > global_best_ant[0]:
global_best_ant = iteration_best_ant
# --- Tracking Population Diversity ---
# Extract fitness scores and calculate standard deviation
fitness_scores = [ant[0] for ant in colony]
diversity_metric = statistics.stdev(fitness_scores) if len(fitness_scores) > 1 else 0
# Early stopping if a perfect packing configuration is found (fitness = 1.0)
if iteration_best_ant[0] == 1.0:
print(f"Global optimum found at generation {generation}")
break
# 3. Pheromone Update Phase (The MMAS alternating strategy)
if generation % 5 == 0:
pheromone_matrix = update_pheromones(pheromone_matrix, rho, global_best_ant[1], unique_weights, capacity)
else:
pheromone_matrix = update_pheromones(pheromone_matrix, rho, iteration_best_ant[1], unique_weights, capacity)
return global_best_ant, generation
Overcoming the Computational Bottleneck: The Restricted Candidate List
We now have a fully functional learning swarm. However, if you run the code as it is, you will hit a massive computational wall.
The bottleneck lives inside the get_probability engine we built in Part 1. Every time an ant needs to pack a single item, it evaluates the pheromone-heuristic combination for all available items that fit in the remaining space of the bin. For large datasets—like instances with 1,000 or 10,000 items—calculating this massive denominator iteratively crushes execution speed.
To solve this, we introduced in [4] a brilliant concept borrowed from the GRASP (Greedy Randomized Adaptive Search Procedure) metaheuristic: the Restricted Candidate List (RCL).
Instead of evaluating the probability for every item that fits, we prune the search space. By injecting an RCL limit into our build_solution method, we create a Greedy Probabilistic Adaptive Procedure (GPAP).
First, we define a small, hard limit for our candidate pool (e.g., a maximum of 8 items):
def get_rcl_size(self, available_items):
# Restrict the candidate list to a maximum of 8 items to optimize execution
if len(available_items) >= 8:
return 8
else:
return len(available_items)
Next, we modify the construction loop from Part 1. Before calculating any probabilities, we sort the viable candidates by size (descending) and brutally slice the list to keep only the top candidates.
# ... inside the build_solution loop ...
if not Jk: # If no single remaining item fits, open a new bin
solution.append([])
else:
# --- The RCL Optimization ---
size = self.get_rcl_size(available_items)
Jk.sort(reverse=True) # Sort elements in descending order
del Jk[size:] # Prune the list to keep only the top candidates
# Now, probabilistic selection logic runs ONLY on this highly curated subset
for item in Jk:
probability = self.get_probability(item, current_bin, Jk)
import random # Usually declared at the top of the script
if round(probability, 4) >= round(random.random(), 4):
selected_candidate = item
current_bin.append(selected_candidate)
available_items.remove(selected_candidate)
break
This simple pruning mechanism slashes execution times by cutting out thousands of unnecessary matrix lookups. Because the list is sorted descending, we ensure the ant only calculates the heavy math for the largest viable objects — perfectly aligning with our heuristic η(j)=j — while retaining just enough randomness to explore new packing combinations.
Beyond Bin Packing: ACO as a High-Quality Initialization Engine
If we analyze the behavior of pure Ant Colony Optimization objectively, we discover a fundamental truth: ACO is not a fine-tuning mechanism; it is an incredibly sophisticated, probabilistic solution initialization engine. The virtual enjambre excels at scanning the global search space, identifying promising regions, and leaving a trail of collective memory (pheromones) that represents highly efficient packing patterns. However, once the ants find a good neighborhood, they lack the intensive “exploitation” or local pressure required to polish those solutions down to the absolute mathematical limit.
In industrial and advanced academic settings, pure ACO is rarely left to work alone. The standard practice is to build a Hybrid Metaheuristic. In this framework, the constructive solutions generated by the ants are immediately handed over to a local search trajectory algorithm — such as Hill Climbing or Simulated Annealing — or used to seed the initial population of an evolutionary system like a Genetic Algorithm. The local search aggressively squeezes the local space, and the pheromone matrix is updated only with these highly refined, post-processed solutions.
This critical concept of tracking population diversity and avoiding premature convergence is the perfect bridge into evolutionary computing. If a population loses its variability too quickly, the search space collapses into local optima — a challenge that we will dissect in the next series of articles, where we will implement Hill Climbing and Genetic Algorithms from scratch in Python to upgrade our current architecture into a true hybrid powerhouse.
Shifting the Context: Grouping Optimization in the Real World
While we have focused entirely on the Bin Packing Problem (BPP) using item weights and fixed containers, the mathematical core of this size-affinity metaheuristic is universal. It applies to virtually any combinatorial grouping problem where elements must be clustered under capacity or risk constraints.
Some immediate parallel applications include:
- Identical Parallel Machine Scheduling: An immediate mathematical analogue where “items” become computational tasks (with processing runtimes) and “bins” become parallel processors. The goal shifts to grouping tasks to minimize the maximum completion time (makespan).
- Portfolio Optimization: Grouping financial assets into specific risk/return buckets or selecting optimal combinations of non-linear financial instruments under capital constraints.
- Algorithmic Trading Strategy Optimization: In quantitative finance, this exact architecture can be adapted to group multiple trading sub-strategies, technical indicators, or market regimes. By treating market conditions as “items” and portfolio risk limits as “bin capacities,” the swarm can discover robust, decentralized combinations that maximize risk-adjusted returns while preventing overfitting.
Whether compacting cargo shipping containers or optimizing automated trading systems under market uncertainty, the synergy of stigmergic learning and structural pruning offers a resilient path toward global optimums.
For developers and researchers interested in full reproducibility, you can find the complete source code, the dataset instances, and my original academic technical report on my [GitHub Repository].
If you found this deep dive into metaheuristics and Python useful, follow me here on Medium and let’s connect on [LinkedIn].
References
[1] Falkenauer, E., & Delchambre, A. (1992). A genetic algorithm for bin packing and line balancing. In Proceedings of the 1992 IEEE International Conference on Robotics and Automation (pp. 1186–1192).
[2] Stützle, T., & Hoos, H. H. (2000). MAX–MIN ant system. Future Generation Computer Systems, 16(8), 889–914.
[3] Levine, J., & Ducatelle, F. (2004). Ant colony optimization and local search for bin packing and cutting stock problems. Journal of the Operational Research Society, 55(7), 705–716.
[4] Ramírez-Rabelo, A. M. (2023). Implementation of an Ant Colony Algorithm to Solve the Bin Packing Problem (BPP). Technical Report. Combinatorial Optimization.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI
Towards AI Academy
We Build Enterprise-Grade AI. We'll Teach You to Master It Too.
15 engineers. 100,000+ students. Towards AI Academy teaches what actually survives production.
Start free — no commitment:
→ 6-Day Agentic AI Engineering Email Guide — one practical lesson per day
→ Agents Architecture Cheatsheet — 3 years of architecture decisions in 6 pages
Our courses:
→ AI Engineering Certification — 90+ lessons from project selection to deployed product. The most comprehensive practical LLM course out there.
→ Agent Engineering Course — Hands on with production agent architectures, memory, routing, and eval frameworks — built from real enterprise engagements.
→ AI for Work — Understand, evaluate, and apply AI for complex work tasks.
Note: Article content contains the views of the contributing authors and not Towards AI.