Hyperparameter Tuning

Vibe Prompt

"Use a genetic algorithm to optimize Random Forest hyperparameters: n_estimators (10-200), max_depth (3-20), min_samples_split (2-10), with the goal of maximizing cross-validation accuracy."

import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score

data = load_digits()
X, y = data.data, data.target

def evaluate(params):
    """Mean 3-fold CV accuracy of a Random Forest for one real-valued chromosome."""
    n_est, depth, min_split = (round(p) for p in params)  # genes are floats; the model needs ints
    rf = RandomForestClassifier(
        n_estimators=n_est,
        max_depth=depth,
        min_samples_split=min_split,
        random_state=42, n_jobs=-1
    )
    scores = cross_val_score(rf, X, y, cv=3, scoring='accuracy')
    return scores.mean()

# Simple Genetic Algorithm Tuner
bounds = [(10, 200), (3, 20), (2, 10)]  # n_estimators, max_depth, min_samples_split
pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(20)]

for gen in range(20):
    scored = [(evaluate(p), p) for p in pop]
    scored.sort(key=lambda s: s[0], reverse=True)  # best score first
    pop = [p for _, p in scored]
    best_score, best = scored[0]
    print(f"Generation {gen+1} best: {best_score:.4f} "
          f"(n_est={round(best[0])}, depth={round(best[1])}, min_split={round(best[2])})")

    next_pop = [p[:] for p in pop[:2]]  # elitism: keep the two best unchanged
    while len(next_pop) < 20:
        p1, p2 = random.sample(pop[:10], k=2)  # two distinct parents from the top 10
        child = [random.choice([p1[i], p2[i]]) for i in range(3)]  # uniform crossover
        # Mutation scaled to each parameter's range (up to +/-10%), then clipped to bounds
        child = [c + random.uniform(-0.1, 0.1) * (hi - lo) for c, (lo, hi) in zip(child, bounds)]
        child = [max(lo, min(hi, c)) for c, (lo, hi) in zip(child, bounds)]
        next_pop.append(child)
    pop = next_pop

best = pop[0]  # the elite from the last evaluated generation
print(f"\nBest params: n_est={round(best[0])}, depth={round(best[1])}, min_split={round(best[2])}")
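The fitness above comes from a single, unshuffled 3-fold split. Two configurations whose scores differ only in the third decimal may therefore not really differ. One way to gauge that noise is to re-score one fixed configuration under different fold assignments. The sketch below does this with scikit-learn's `StratifiedKFold(shuffle=True)`. The configuration (100 trees, depth 10) is an arbitrary example and not a tuned result. GA candidates that differ by less than the observed spread are probably not meaningfully different.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_digits(return_X_y=True)

# Arbitrary example configuration, held fixed for every run
rf = RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_split=2,
                            random_state=42, n_jobs=-1)

# Same hyperparameters, different shuffled fold assignments
means = []
for split_seed in range(3):
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=split_seed)
    means.append(cross_val_score(rf, X, y, cv=cv, scoring="accuracy").mean())

print("CV means per split seed:", [f"{m:.4f}" for m in means])
print(f"spread: {max(means) - min(means):.4f}")
```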

Why Hyperparameter Tuning?

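Machine learning models have hyperparameters that can significantly affect performance. These are settings such as the number of trees or the maximum tree depth, fixed before training rather than learned from the data. Manual tuning is time-consuming. Automated tuning searches the space systematically and often finds better configurations.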

| Method | Setup effort | Typical result |
|--------|--------------|----------------|
| Manual | Low (but slow to iterate) | Poor |
| Grid search | Medium | Good (if the range is right) |
| Random search | Low | Good |
| Bayesian optimization | Medium | Better |
| Evolutionary optimization | High | Often best for complex spaces |
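Grid search's cost grows multiplicatively with every parameter added. As a rough sketch, the snippet below counts the fits needed to try every integer combination in the prompt's ranges with 3-fold cross-validation. It compares that with the 20 × 20 = 400 evaluations made by the first example. A practical grid would be far coarser, which is why grid search depends on choosing the range well.

```python
# Search space from the prompt (inclusive integer ranges)
bounds = {
    "n_estimators": (10, 200),
    "max_depth": (3, 20),
    "min_samples_split": (2, 10),
}

grid_points = 1
for lo, hi in bounds.values():
    grid_points *= hi - lo + 1  # number of integers in [lo, hi]

cv_folds = 3
grid_fits = grid_points * cv_folds  # every grid point is cross-validated

# GA budget of the first example: one evaluation per individual per generation
pop_size, generations = 20, 20
ga_fits = pop_size * generations * cv_folds

print(f"Full grid: {grid_points:,} configurations -> {grid_fits:,} model fits")
print(f"GA (20 x 20): {ga_fits:,} model fits")
```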

Genetic Algorithm for Hyperparameter Tuning

import random
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def ga_tune(X, y, bounds, pop_size=20, generations=30, mutation_rate=0.1, seed=42):
    """Tune Random Forest hyperparameters with a genetic algorithm.

    bounds: integer (low, high) ranges for n_estimators, max_depth and
    min_samples_split, in that order. Returns (best_params, best_cv_accuracy).
    """
    rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)
    n_params = len(bounds)
    cache = {}  # decoded parameter tuple -> CV accuracy, so duplicates are not retrained

    def decode(ind):
        # Genes are real-valued; every hyperparameter here is an integer
        return tuple(int(round(g)) for g in ind)

    def fitness(ind):
        key = decode(ind)
        if key not in cache:
            model = RandomForestClassifier(
                n_estimators=key[0],
                max_depth=key[1],
                min_samples_split=key[2],
                random_state=42, n_jobs=-1
            )
            cache[key] = cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()
        return cache[key]

    def tournament(pop, scores, k=3):
        # Sample k individuals at random and return the fittest one
        winner = max(rng.sample(range(len(pop)), k), key=lambda i: scores[i])
        return pop[winner]

    # Initialize population uniformly within the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

    for gen in range(generations):
        # Evaluate
        scores = [fitness(ind) for ind in pop]
        best_idx = max(range(pop_size), key=lambda i: scores[i])
        print(f"Gen {gen}: best accuracy = {scores[best_idx]:.4f}")
        if gen == generations - 1:
            break  # last generation: evaluate only, no offspring needed

        # Create next generation
        new_pop = [pop[best_idx][:]]  # Elitism

        while len(new_pop) < pop_size:
            # Tournament selection
            p1, p2 = tournament(pop, scores), tournament(pop, scores)

            # Uniform crossover
            child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(n_params)]

            # Gaussian mutation (std = 10% of the range), clipped to the bounds
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += np_rng.normal(0, (hi - lo) * 0.1)
                    child[i] = max(lo, min(hi, child[i]))

            new_pop.append(child)

        pop = new_pop

    # Return the best configuration evaluated during the whole run
    best = max(cache, key=cache.get)
    return list(best), cache[best]

# Example: Tune Random Forest on digits dataset
data = load_digits()
bounds = [(10, 200), (3, 20), (2, 10)]  # n_estimators, max_depth, min_samples_split

best_params, best_score = ga_tune(data.data, data.target, bounds)
print(f"\nBest params: n_estimators={best_params[0]}, max_depth={best_params[1]}, min_split={best_params[2]}")
print(f"Best accuracy: {best_score:.4f}")
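Whether the GA actually beats a simpler method on a given dataset is worth checking against a baseline. A minimal sketch uses scikit-learn's `RandomizedSearchCV` with SciPy's `randint` distributions over the same integer ranges. `n_iter=20` is an arbitrary, deliberately small budget. The GA above evaluates up to several hundred configurations, so raise `n_iter` to roughly the GA's number of distinct evaluations for a fair comparison.

```python
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)

# Same ranges as the GA; randint's upper bound is exclusive, hence the +1
param_distributions = {
    "n_estimators": randint(10, 201),
    "max_depth": randint(3, 21),
    "min_samples_split": randint(2, 11),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_distributions=param_distributions,
    n_iter=20,          # number of sampled configurations (the evaluation budget)
    cv=3,
    scoring="accuracy",
    random_state=0,     # makes the sampled configurations reproducible
)
search.fit(X, y)

print("Random search best params:", search.best_params_)
print(f"Random search best accuracy: {search.best_score_:.4f}")
```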

Summary

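Hyperparameter tuning automates the search for good model configurations. Genetic algorithms concentrate evaluations on promising regions of the parameter space. For the same number of model fits, they can find better combinations than manual tuning or grid search, although the size of the advantage depends on the problem and the budget.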

Key takeaways:

| Concept | Takeaway |
|---------|----------|
| GA tuning | Encode hyperparameters as a chromosome; fitness = validation accuracy |
| Bounds | Define a valid range for each hyperparameter |
| Elitism | Keep the best individual each generation |
| Uniform crossover | Mix genes from two parents at random |
| Gaussian mutation | Apply small random perturbations to explore |
| GA vs grid | GA adapts its search; grid search wastes time on bad regions |
| GA vs random | GA exploits good regions while still exploring |
| Fitness landscape | Accuracy is noisy; use cross-validation |
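The GA operators in these takeaways can be exercised without training any models. The sketch below replaces CV accuracy with a hypothetical toy fitness that peaks at an arbitrary point. The names `toy_fitness`, `TARGET` and `run_toy_ga` are illustrative and not part of the chapter's code. Because the elite is copied unchanged and the toy fitness is deterministic, the best score per generation can never decrease. Crossover and clipped mutation also keep every gene inside its bounds.

```python
import random

# Hypothetical toy fitness standing in for CV accuracy; its optimum is at TARGET
TARGET = (120, 12, 4)
BOUNDS = [(10, 200), (3, 20), (2, 10)]  # n_estimators, max_depth, min_samples_split

def toy_fitness(ind):
    # Negative squared distance to TARGET, scaled by each range (0 is the best value)
    return -sum(((g - t) / (hi - lo)) ** 2 for g, t, (lo, hi) in zip(ind, TARGET, BOUNDS))

def run_toy_ga(pop_size=20, generations=30, mutation_rate=0.3, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        scores = [toy_fitness(ind) for ind in pop]
        best = max(range(pop_size), key=scores.__getitem__)
        history.append(scores[best])
        new_pop = [pop[best][:]]  # elitism: the best individual survives unchanged
        while len(new_pop) < pop_size:
            # Tournament selection (size 3) for each parent
            p1 = pop[max(rng.sample(range(pop_size), 3), key=scores.__getitem__)]
            p2 = pop[max(rng.sample(range(pop_size), 3), key=scores.__getitem__)]
            # Uniform crossover: each gene comes from either parent with probability 0.5
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation (std = 10% of the range), clipped to the bounds
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < mutation_rate:
                    child[i] = min(hi, max(lo, child[i] + rng.gauss(0, 0.1 * (hi - lo))))
            new_pop.append(child)
        pop = new_pop
    return pop, history

pop, history = run_toy_ga()
print("best fitness every 5 generations:", [round(h, 4) for h in history[::5]])
```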

Next Chapter: Optimization API

The next chapter covers building an optimization API service.