Hyperparameter Tuning
Vibe Prompt
"Use a genetic algorithm to optimize Random Forest hyperparameters: n_estimators (10-200), max_depth (3-20), min_samples_split (2-10), with the goal of maximizing cross-validation accuracy."
import random

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = load_digits()
X, y = data.data, data.target


def evaluate(params):
    """Return mean 3-fold CV accuracy for one (n_estimators, max_depth, min_samples_split) genome."""
    n_est, depth, min_split = params
    rf = RandomForestClassifier(
        n_estimators=int(n_est),
        max_depth=int(depth),
        min_samples_split=int(min_split),
        random_state=42, n_jobs=-1
    )
    scores = cross_val_score(rf, X, y, cv=3, scoring='accuracy')
    return scores.mean()


# Simple Genetic Algorithm Tuner
bounds = [(10, 200), (3, 20), (2, 10)]  # n_estimators, max_depth, min_samples_split
pop = [[random.uniform(b[0], b[1]) for b in bounds] for _ in range(20)]

for gen in range(20):
    # Score every individual and rank the population best-first
    scored = [(evaluate(p), p) for p in pop]
    scored.sort(reverse=True)
    pop = [p for _, p in scored]
    print(f"Generation {gen+1} best: {scored[0][0]:.4f} (n_est={int(scored[0][1][0])}, depth={int(scored[0][1][1])})")

    # Elitism: carry the top two individuals over unchanged
    next_pop = pop[:2]
    while len(next_pop) < 20:
        # Draw two parents (with replacement) from the top half
        p1, p2 = random.choices(pop[:10], k=2)
        # Uniform crossover: each gene comes from either parent
        child = [random.choice([p1[i], p2[i]]) for i in range(3)]
        # Mutation: the same +/-5 step for every gene, then clamp to the bounds
        child = [c + random.uniform(-5, 5) for c in child]
        child = [max(b[0], min(b[1], child[i])) for i, b in enumerate(bounds)]
        next_pop.append(child)
    pop = next_pop

# pop[0] is the elite carried over from the last scored generation
print(f"\nBest params: n_est={int(pop[0][0])}, depth={int(pop[0][1])}, min_split={int(pop[0][2])}")
Why Hyperparameter Tuning?
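Machine learning models have hyperparameters, settings fixed before training such as the number of trees or the maximum depth, that can significantly affect performance. Manual tuning is time-consuming and hard to reproduce; automated tuning searches the space systematically and often finds better configurations.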
| Method | Setup effort | Typical quality |
|--------|--------------|-----------------|
| Manual | Low | Poor |
| Grid search | Medium | Good (if range is right) |
| Random search | Low | Good |
| Bayesian optimization | Medium | Better |
| Evolutionary optimization | High | Best for complex spaces |
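To put rough numbers on the comparison, the sketch below counts how many configurations an exhaustive integer grid over the three Random Forest ranges from the prompt would contain, and compares that with a GA evaluation budget. The budget of 20 individuals for 30 generations is an assumption chosen for illustration, and a practical grid would be far coarser than every integer, so read this as an upper bound on grid cost rather than a benchmark.

```python
from math import prod

# Inclusive integer ranges: n_estimators, max_depth, min_samples_split
bounds = [(10, 200), (3, 20), (2, 10)]

# Exhaustive grid: one point per integer combination
grid_points = prod(hi - lo + 1 for lo, hi in bounds)

# Assumed GA budget for illustration: 20 individuals for 30 generations
pop_size, generations = 20, 30
ga_evaluations = pop_size * generations

cv_folds = 3  # each evaluation trains one model per fold

print(grid_points, grid_points * cv_folds)        # 30942 92826
print(ga_evaluations, ga_evaluations * cv_folds)  # 600 1800
```

Under these assumptions the GA trains about 2% as many models as the full grid, which is why it has to spend that budget selectively.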
Genetic Algorithm for Hyperparameter Tuning
import random

import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def decode(ind, bounds):
    """Cast each gene to int when its own bounds are ints; leave other genes as floats."""
    return [int(p) if isinstance(b[0], int) else p for p, b in zip(ind, bounds)]


def score(ind, X, y, bounds):
    """Mean 3-fold CV accuracy of a Random Forest built from one individual."""
    params = decode(ind, bounds)
    model = RandomForestClassifier(
        n_estimators=int(params[0]),
        max_depth=int(params[1]),
        min_samples_split=int(params[2]),
        random_state=42, n_jobs=-1
    )
    return cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()


def ga_tune(X, y, bounds, pop_size=20, generations=30, mutation_rate=0.1):
    """Tune hyperparameters with a genetic algorithm."""
    n_params = len(bounds)
    # Initialize population
    pop = [[random.uniform(b[0], b[1]) for b in bounds] for _ in range(pop_size)]

    for gen in range(generations):
        # Evaluate
        fitness = [score(ind, X, y, bounds) for ind in pop]
        best_idx = fitness.index(max(fitness))
        print(f"Gen {gen}: best accuracy = {fitness[best_idx]:.4f}")

        # Create next generation
        new_pop = [pop[best_idx].copy()]  # Elitism
        while len(new_pop) < pop_size:
            # Tournament selection: the fittest of 3 randomly sampled individuals
            p1 = pop[max(random.sample(range(pop_size), 3), key=lambda i: fitness[i])]
            p2 = pop[max(random.sample(range(pop_size), 3), key=lambda i: fitness[i])]
            # Uniform crossover
            child = [p1[i] if random.random() < 0.5 else p2[i] for i in range(n_params)]
            # Gaussian mutation, scaled to 10% of each parameter's range
            for i in range(n_params):
                if random.random() < mutation_rate:
                    child[i] += np.random.normal(0, (bounds[i][1] - bounds[i][0]) * 0.1)
                # Clamp back into the valid range
                child[i] = max(bounds[i][0], min(bounds[i][1], child[i]))
            new_pop.append(child)
        pop = new_pop

    # Score the final population and return its best individual
    fitness = [score(ind, X, y, bounds) for ind in pop]
    best_idx = fitness.index(max(fitness))
    return decode(pop[best_idx], bounds), fitness[best_idx]


# Example: Tune Random Forest on digits dataset
data = load_digits()
bounds = [(10, 200), (3, 20), (2, 10)]  # n_estimators, max_depth, min_samples_split
best_params, best_score = ga_tune(data.data, data.target, bounds)
print(f"\nBest params: n_estimators={best_params[0]}, max_depth={best_params[1]}, min_split={best_params[2]}")
print(f"Best accuracy: {best_score:.4f}")
Summary
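Hyperparameter tuning automates the search for good model configurations. Genetic algorithms concentrate evaluations on promising regions of the parameter space, so for a fixed budget they can find better combinations than manual tuning or an exhaustive grid, although this is not guaranteed on every problem.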
Key takeaways:
- GA tuning: encode hyperparameters as a chromosome, fitness = validation accuracy
- Bounds: define valid ranges for each hyperparameter
- Elitism: keep the best individual each generation
- Uniform crossover: mix genes from two parents randomly
- Gaussian mutation: small random perturbations to explore
- GA vs grid: GA adapts its search, grid wastes time on bad regions
- GA vs random: GA exploits good regions while still exploring
- Fitness landscape: accuracy is noisy, so use cross-validation
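The operators named in these takeaways can be tried without training any model. The following is a minimal sketch under stated assumptions: `toy_fitness` and `run_toy_ga` are hypothetical names, and the quadratic fitness with its peak at (120, 12, 2) is an arbitrary, deterministic stand-in for validation accuracy. It shows one property of elitism: when fitness is deterministic, the best score can never decrease from one generation to the next.

```python
import random


def toy_fitness(genome):
    """Deterministic stand-in for validation accuracy; arbitrary peak at (120, 12, 2)."""
    target = (120, 12, 2)
    return -sum((g - t) ** 2 for g, t in zip(genome, target))


def run_toy_ga(bounds, pop_size=20, generations=15, mutation_rate=0.1, seed=0):
    """Run a small GA and return the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fitness = [toy_fitness(ind) for ind in pop]
        best = max(range(pop_size), key=fitness.__getitem__)
        history.append(fitness[best])
        new_pop = [pop[best][:]]  # elitism: copy the best individual unchanged
        while len(new_pop) < pop_size:
            # Tournament selection: fittest of 3 randomly sampled individuals
            p1 = pop[max(rng.sample(range(pop_size), 3), key=fitness.__getitem__)]
            p2 = pop[max(rng.sample(range(pop_size), 3), key=fitness.__getitem__)]
            # Uniform crossover: each gene comes from either parent
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # Gaussian mutation scaled to 10% of each range, then clamp
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, (hi - lo) * 0.1)
                child[i] = max(lo, min(hi, child[i]))
            new_pop.append(child)
        pop = new_pop
    return history


history = run_toy_ga([(10, 200), (3, 20), (2, 10)])
# With elitism and a deterministic fitness, the best score never decreases
print(all(b >= a for a, b in zip(history, history[1:])))  # True
```

With a noisy fitness such as cross-validated accuracy under varying seeds or folds, the same guarantee does not hold, because the elite can score lower when it is evaluated again.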
Next Chapter: Optimization API
The next chapter covers building an optimization API service.