Hyperparameter Tuning

Vibe Prompt

"Use a genetic algorithm to optimize Random Forest hyperparameters: n_estimators (10-200), max_depth (3-20), min_samples_split (2-10), with the goal of maximizing cross-validation accuracy."

import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score

data = load_digits()
X, y = data.data, data.target

def evaluate(params):
    n_est, depth, min_split = params
    rf = RandomForestClassifier(
        n_estimators=int(n_est),
        max_depth=int(depth),
        min_samples_split=int(min_split),
        random_state=42, n_jobs=-1
    )
    scores = cross_val_score(rf, X, y, cv=3, scoring='accuracy')
    return scores.mean()

# Simple Genetic Algorithm Tuner
bounds = [(10, 200), (3, 20), (2, 10)]
pop = [[random.uniform(b[0], b[1]) for b in bounds] for _ in range(20)]

for gen in range(20):
    scored = [(evaluate(p), p) for p in pop]
    scored.sort(reverse=True)
    pop = [p for _, p in scored]
    print(f"Generation {gen+1} best: {scored[0][0]:.4f} (n_est={int(scored[0][1][0])}, depth={int(scored[0][1][1])})")
    
    next_pop = pop[:2]
    while len(next_pop) < 20:
        p1, p2 = random.choices(pop[:10], k=2)
        child = [random.choice([p1[i], p2[i]]) for i in range(3)]
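        # Mutate each gene by up to ±10% of its own range so narrow ranges are not swamped
        child = [c + random.uniform(-0.1, 0.1) * (b[1] - b[0]) for c, b in zip(child, bounds)]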
        child = [max(b[0], min(b[1], child[i])) for i, b in enumerate(bounds)]
        next_pop.append(child)
    pop = next_pop

print(f"\nBest params: n_est={int(pop[0][0])}, depth={int(pop[0][1])}, min_split={int(pop[0][2])}")

Why Hyperparameter Tuning?

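Machine learning models have hyperparameters, such as the number of trees or the maximum tree depth in a Random Forest, that significantly affect performance. Manual tuning is time-consuming; automated tuning evaluates many more configurations and often finds better ones.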

| Method | Setup effort | Quality |
|--------|--------------|---------|
| Manual | Low | Poor |
| Grid search | Medium | Good (if range is right) |
| Random search | Low | Good |
| Bayesian optimization | Medium | Better |
| Evolutionary optimization | High | Best for complex spaces |
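To make the grid search and random search rows concrete, here is a minimal sketch that gives both strategies the same budget of 9 evaluations. The `toy_score` function is a hypothetical stand-in for cross-validation accuracy (it peaks at `n_estimators=120`, `max_depth=12`), not a real model, and the grid values are chosen only for illustration.

```python
import itertools
import random

def toy_score(n_estimators, max_depth):
    """Hypothetical stand-in for CV accuracy; peaks at (120, 12) with value 1.0."""
    return 1.0 - ((n_estimators - 120) / 200) ** 2 - ((max_depth - 12) / 20) ** 2

def grid_search(n_values, depth_values):
    """Evaluate every combination on a fixed grid."""
    candidates = list(itertools.product(n_values, depth_values))
    return max(candidates, key=lambda c: toy_score(*c)), len(candidates)

def random_search(n_trials, rng):
    """Evaluate n_trials configurations sampled uniformly from the bounds."""
    candidates = [(rng.randint(10, 200), rng.randint(3, 20)) for _ in range(n_trials)]
    return max(candidates, key=lambda c: toy_score(*c)), len(candidates)

rng = random.Random(0)
grid_best, grid_evals = grid_search([10, 100, 200], [3, 10, 20])
rand_best, rand_evals = random_search(9, rng)

print(f"grid:   best={grid_best}, score={toy_score(*grid_best):.4f}, evals={grid_evals}")
print(f"random: best={rand_best}, score={toy_score(*rand_best):.4f}, evals={rand_evals}")
```

The grid can only ever return one of its nine fixed points, which is why the table qualifies its quality with "if range is right". Random search covers the whole range but gives no guarantee for any single run.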

Genetic Algorithm for Hyperparameter Tuning

import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def decode(ind, bounds):
    """Cast each gene to int when its own bounds are integers."""
    return [int(g) if isinstance(b[0], int) else g for g, b in zip(ind, bounds)]

def fitness_of(ind, X, y, bounds):
    """Mean 3-fold cross-validation accuracy for one individual."""
    n_est, depth, min_split = decode(ind, bounds)
    model = RandomForestClassifier(
        n_estimators=int(n_est),
        max_depth=int(depth),
        min_samples_split=int(min_split),
        random_state=42, n_jobs=-1
    )
    return cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()

def tournament(pop, fitness, k=3):
    """Return the fittest of k individuals sampled without replacement."""
    return pop[max(random.sample(range(len(pop)), k), key=lambda i: fitness[i])]

def ga_tune(X, y, bounds, pop_size=20, generations=30, mutation_rate=0.1):
    """Tune hyperparameters with a genetic algorithm."""
    n_params = len(bounds)

    # Initialize and evaluate the population
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [fitness_of(ind, X, y, bounds) for ind in pop]

    for gen in range(generations):
        best_idx = fitness.index(max(fitness))
        print(f"Gen {gen + 1}: best accuracy = {fitness[best_idx]:.4f}")

        # Create next generation
        new_pop = [pop[best_idx].copy()]  # Elitism

        while len(new_pop) < pop_size:
            # Tournament selection
            p1 = tournament(pop, fitness)
            p2 = tournament(pop, fitness)

            # Uniform crossover
            child = [p1[i] if random.random() < 0.5 else p2[i] for i in range(n_params)]

            # Gaussian mutation, scaled to 10% of each range and clamped to the bounds
            for i, (lo, hi) in enumerate(bounds):
                if random.random() < mutation_rate:
                    child[i] += random.gauss(0, (hi - lo) * 0.1)
                    child[i] = max(lo, min(hi, child[i]))

            new_pop.append(child)

        pop = new_pop
        fitness = [fitness_of(ind, X, y, bounds) for ind in pop]

    # Return the best individual of the final, already evaluated population
    best_idx = fitness.index(max(fitness))
    return decode(pop[best_idx], bounds), fitness[best_idx]

# Example: Tune Random Forest on digits dataset
from sklearn.datasets import load_digits
data = load_digits()
bounds = [(10, 200), (3, 20), (2, 10)]  # n_estimators, max_depth, min_samples_split

best_params, best_score = ga_tune(data.data, data.target, bounds)
print(f"\nBest params: n_estimators={best_params[0]}, max_depth={best_params[1]}, min_split={best_params[2]}")
print(f"Best accuracy: {best_score:.4f}")
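Because genes are cast to integers before training, many distinct individuals decode to the same configuration, and with a fixed `random_state` a given configuration should score the same each time. A cache keyed on the decoded tuple can avoid retraining in that case. The sketch below is one way to do this: `make_cached_fitness` and `toy_evaluate` are hypothetical names, and `toy_evaluate` stands in for the cross-validation call so the example runs instantly.

```python
def make_cached_fitness(evaluate):
    """Wrap an evaluation function so each decoded configuration is scored once."""
    cache = {}

    def cached(individual):
        key = tuple(int(g) for g in individual)  # same integer cast the tuner applies
        if key not in cache:
            cache[key] = evaluate(key)
        return cache[key]

    cached.cache = cache  # exposed so callers can inspect what was evaluated
    return cached

calls = 0

def toy_evaluate(params):
    """Hypothetical stand-in for cross-validation; counts how often it runs."""
    global calls
    calls += 1
    n_est, depth, min_split = params
    return 1.0 / (1 + abs(n_est - 120) + abs(depth - 12) + abs(min_split - 4))

fitness = make_cached_fitness(toy_evaluate)

# The first two individuals differ as floats but decode to the same (120, 12, 4)
population = [[120.2, 12.7, 4.1], [120.9, 12.1, 4.8], [50.0, 5.0, 2.0]]
scores = [fitness(ind) for ind in population]

print(f"scores={scores}, evaluations={calls}")
```

In a real run the saving depends on how often the population converges onto the same integer configurations, which tends to increase in later generations.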

Summary

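Hyperparameter tuning automates the search for good model configurations. A genetic algorithm steers its search toward regions that score well, so on large or irregular search spaces it can find better combinations than manual tuning or a coarse grid for the same evaluation budget, although this is not guaranteed.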

Key takeaways:

  • GA tuning: encode hyperparameters as a chromosome, fitness = validation accuracy
  • Bounds: define valid ranges for each hyperparameter
  • Elitism: keep the best individual each generation
  • Uniform crossover: mix genes from two parents randomly
  • Gaussian mutation: small random perturbations to explore
  • GA vs grid: GA adapts its search, grid wastes time on bad regions
  • GA vs random: GA exploits good regions while still exploring
  • Fitness landscape: accuracy is noisy, so use cross-validation
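The operators in the takeaways can be tried in isolation without training any model. The following is a minimal sketch under stated assumptions: `toy_fitness` is a hypothetical smooth objective that replaces cross-validation accuracy, and parents are drawn from the top half of the population (simple truncation selection) rather than by tournament.

```python
import random

BOUNDS = [(10, 200), (3, 20), (2, 10)]  # n_estimators, max_depth, min_samples_split

def toy_fitness(ind):
    """Hypothetical objective, highest (0.0) at (120, 12, 4)."""
    target = (120, 12, 4)
    return -sum(((g - t) / (hi - lo)) ** 2 for g, t, (lo, hi) in zip(ind, target, BOUNDS))

def uniform_crossover(p1, p2, rng):
    """Pick each gene from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def gaussian_mutation(ind, rng, rate=0.1, scale=0.1):
    """Perturb each gene with probability `rate`, then clamp to its bounds."""
    out = []
    for g, (lo, hi) in zip(ind, BOUNDS):
        if rng.random() < rate:
            g += rng.gauss(0, (hi - lo) * scale)
        out.append(max(lo, min(hi, g)))
    return out

def run(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pop[:pop_size // 2], 2)  # truncation selection
            next_pop.append(gaussian_mutation(uniform_crossover(p1, p2, rng), rng))
        pop = next_pop
    return history

history = run()
print(f"first generation best: {history[0]:.4f}, last generation best: {history[-1]:.4f}")
```

With a deterministic fitness function, elitism means the best score recorded per generation can never decrease. With a noisy fitness such as cross-validation on resampled folds, that guarantee no longer holds.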

Next Chapter: Optimization API

The next chapter covers building an optimization API service.
