Deploying Recommendation System API

In this final chapter we turn the recommendation engine built throughout the course into a production‑ready API service. Exposing the model through a well‑designed REST interface lets any client, whether a web front‑end, a mobile app, or another micro‑service, request personalized recommendations in real time. We will also add a lightweight caching layer, discuss batch preprocessing for scalability, and containerize the service with Docker so it can be deployed to any cloud or on‑premises environment.

What: Core Concepts of the Recommendation Engine

The recommendation engine encapsulates all the logic needed to turn raw data into actionable suggestions. It consists of three main components:

Feature Preparation – Converting the raw movie metadata into a numeric representation suitable for similarity calculations. Only the pipe‑separated genre strings are encoded: a MultiLabelBinarizer turns them into a binary genre matrix with one column per genre (multi‑hot, since a movie can belong to several genres).
Similarity Computation – Calculating a cosine similarity matrix between movies based on their genre vectors. This matrix enables content‑based recommendations: given a movie, we can instantly retrieve the most similar titles.
Popularity Scoring – Deriving a popularity score for each movie as the product of rating count and average rating. This score powers the cold‑start fallback and contributes to the hybrid recommendation strategy.
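To make the three components concrete, here is a minimal sketch on a hypothetical four‑movie catalog with a handful of invented ratings. It swaps scikit‑learn's MultiLabelBinarizer and cosine_similarity for their pandas/NumPy equivalents (`Series.str.get_dummies` and a dot product of L2‑normalized rows) so the arithmetic is visible; the engine later in this chapter uses the scikit‑learn versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy catalog and ratings, invented for illustration only.
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4],
    "genres": ["Action|Sci-Fi", "Action", "Comedy|Romance", "Comedy"],
})
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [1, 2, 1, 3, 1],
    "rating":  [4.0, 3.0, 5.0, 2.0, 3.0],
})

# 1. Feature preparation: one 0/1 column per genre (multi-hot encoding).
genre_df = movies["genres"].str.get_dummies(sep="|")
genre_df.index = movies["movieId"]

# 2. Similarity: cosine similarity is the dot product of L2-normalized rows.
vectors = genre_df.to_numpy(dtype=float)
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = pd.DataFrame(unit @ unit.T, index=genre_df.index, columns=genre_df.index)

# 3. Popularity: rating_count * avg_rating (which equals the sum of ratings).
stats = ratings.groupby("movieId")["rating"].agg(["count", "mean"])
popularity = (stats["count"] * stats["mean"]).to_dict()

print(genre_df)
print(similarity.round(3))
print(popularity)
```

Movie 4 has no ratings, so it gets no popularity entry at all; a popularity table built with `groupby` over the ratings behaves the same way. Because count × average is just the sum of a movie's ratings, heavily rated titles dominate this score.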

Together, these components allow the engine to serve three recommendation styles:

Content‑based – similarity driven by item attributes.
Hybrid – blends content‑based scores with popularity to balance personalization and broad appeal.
Popularity‑only – returns the top‑rated, widely‑watched items for new users or when insufficient personal data exists.
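Written out, the hybrid score that hybrid_recommend (below) is meant to assign to a movie m that user u has not rated is the following. Here F_u is the user's five highest‑rated movies, N_20(f) the 20 movies most similar to a favorite f, sim the genre cosine similarity, n_m and r̄_m the rating count and average rating of m, and the maximum runs over all rated movies. The weights 0.3 and 0.4 are the fixed constants hard‑coded in that method.

```latex
\mathrm{score}(u, m) \;=\; 0.3 \sum_{\substack{f \in F_u \\ m \in N_{20}(f)}} \mathrm{sim}(f, m)
\;+\; 0.4 \,\frac{\mathrm{pop}(m)}{\max_{m'} \mathrm{pop}(m')},
\qquad \mathrm{pop}(m) = n_m \,\bar{r}_m
```

The content term can reach 0.3 × 5 = 1.5 while the popularity term is capped at 0.4, so strong genre overlap with several favorites outweighs raw popularity.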

Why: Business Value and Financial Return

Deploying a recommendation API translates directly into measurable business outcomes:

| Metric | Impact of a Good Recommendation System |
|--------|----------------------------------------|
| Conversion Rate | Personalized suggestions can increase the likelihood of purchase or content consumption by roughly 10‑30 % in e‑commerce and media platforms. |
| Average Order Value (AOV) | Cross‑sell and upsell recommendations raise basket size, often boosting AOV by roughly 5‑15 %. |
| User Retention | Relevant recommendations keep users engaged, reducing churn and increasing lifetime value (LTV). |
| Operational Efficiency | A reusable API centralizes recommendation logic, eliminating duplicated code across front‑ends and reducing maintenance overhead. |
| Scalability | Containerized deployment enables horizontal scaling: adding replicas absorbs traffic spikes without retraining models. |
| Data‑Driven Decision Making | Logging the API's requests and responses provides a rich dataset for A/B testing, model monitoring, and continuous improvement. |

From a founder’s perspective, investing in a well‑architected recommendation service can yield a high return on investment (ROI) because the marginal cost of serving an additional request is low once the service is running, while the revenue uplift per recommendation can be substantial.

How: Step‑by‑Step Implementation Using Vibe Coding

We will now walk through the concrete steps to build, test, and deploy the API. Each step covers what we build, why it matters, and how we implement it, with practical code snippets and Vibe‑Coding prompts.

1. Building the Recommendation Engine Class

What – A Python class that loads data, prepares features, computes similarity and popularity, and exposes methods for content‑based, hybrid, and popular recommendations.

Why – Encapsulation makes the engine reusable, testable, and easy to swap with alternative algorithms (e.g., matrix factorization) without changing the API layer.
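One way to make that swap explicit is to have the API layer depend on a small interface rather than on the concrete class; the recommendation endpoints only call hybrid_recommend, popular_recommendations, and content_based. The sketch below uses `typing.Protocol`; the `Recommender` interface, the `PopularityOnlyEngine` stand‑in, and the `recommend_for` helper are hypothetical names introduced only to illustrate the idea.

```python
from typing import Dict, List, Protocol


class Recommender(Protocol):
    """Hypothetical interface the API layer could depend on."""

    def hybrid_recommend(self, user_id: int, n: int = 10) -> List[dict]: ...
    def popular_recommendations(self, n: int = 10) -> List[dict]: ...
    def content_based(self, movie_id: int, n: int = 10) -> List[dict]: ...


class PopularityOnlyEngine:
    """Hypothetical stand-in that satisfies Recommender without a similarity matrix."""

    def __init__(self, popularity: Dict[int, float]):
        self.popularity = popularity

    def popular_recommendations(self, n: int = 10) -> List[dict]:
        ranked = sorted(self.popularity.items(), key=lambda kv: kv[1], reverse=True)[:n]
        return [{"movie_id": mid, "score": score} for mid, score in ranked]

    def hybrid_recommend(self, user_id: int, n: int = 10) -> List[dict]:
        # No personalization: every user receives the popularity ranking.
        return self.popular_recommendations(n)

    def content_based(self, movie_id: int, n: int = 10) -> List[dict]:
        return []  # No item features, hence no similar items.


def recommend_for(engine: Recommender, user_id: int, n: int = 10) -> List[dict]:
    """API-side code sees only the Recommender interface, not a concrete class."""
    return engine.hybrid_recommend(user_id, n)


engine = PopularityOnlyEngine({10: 3.0, 20: 9.5, 30: 7.0})
print(recommend_for(engine, user_id=42, n=2))
```

Because `Protocol` uses structural typing, the concrete RecommendationEngine satisfies the same interface without inheriting from it, so a type checker can confirm either engine fits the API layer.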

How – We implement the class as shown below, expanding each method with detailed comments and error handling.

import pandas as pd
import numpy as np
import joblib
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MultiLabelBinarizer

class RecommendationEngine:
    """
    Encapsulates the end‑to‑end recommendation logic.
    """
    
    def __init__(self, movies_path='movies.csv', 
    including data loading, feature engineering, similarity, 
    popularity scoring, and three recommendation strategies.
    """

    def __init__(self, movies_path='movies.csv', ratings_path='ratings.csv'):
        """
        Load the raw CSV files and prepare all internal structures.
        """
        # Load data; raise informative errors if files missing.
        try:
            self.movies = pd.read_csv(movies_path)
            self.ratings = pd.read_csv(ratings_path)
        except FileNotFoundError as e:
            raise RuntimeError(f"Required data file not found: {e.filename}") from e

        # Basic sanity checks.
        if self.movies.empty or self.ratings.empty:
            raise ValueError("Loaded datasets are empty. Check CSV contents.")

        print(f"Recommendation engine initializing with "
              f"{len(self.movies)} movies and {self.ratings['userId'].nunique()} users.")

        # Prepare internal structures.
        self._prepare_features()
        self._compute_similarity()
        self._compute_popularity()

        print("Initialization complete.")

    def _prepare_features(self):
        """
        Convert the pipe‑separated genre column into a one‑hot matrix.
        The resulting DataFrame (self.genre_df) has movieId as index and
        a column for each genre.
        """
        # Split genres into lists; missing values become an empty list
        # instead of a spurious '' genre column.
        self.movies['genres_list'] = (
            self.movies['genres'].fillna('').str.split('|')
            .apply(lambda genres: [g for g in genres if g])
        )

        # MultiLabelBinarizer returns a dense 0/1 (multi‑hot) array by default.
        mlb = MultiLabelBinarizer()
        genre_matrix = mlb.fit_transform(self.movies['genres_list'])
        self.genre_df = pd.DataFrame(
            genre_matrix,
            columns=mlb.classes_,
            index=self.movies['movieId']
        )
        # Optional: store the binarizer for future use (e.g., new movies).
        self.genre_binarizer = mlb

    def _compute_similarity(self):
        """
        Compute cosine similarity between all movies based on genre vectors.
        The result is stored as a DataFrame for fast look‑up.
        """
        similarity_array = cosine_similarity(self.genre_df)
        self.movie_similarity_df = pd.DataFrame(
            similarity_array,
            index=self.genre_df.index,
            columns=self.genre_df.index
        )
        # Placeholder for an optional cache of the top‑N similar movies per
        # item (to speed up content‑based look‑ups); this version never fills it.
        self._top_similar_cache = {}

    def _compute_popularity(self):
        """
        Derive a popularity score = rating_count * average_rating.
        Stored as a dictionary mapping movieId -> score for O(1) access.
        """
        movie_stats = self.ratings.groupby('movieId').agg(
            rating_count=('rating', 'count'),
            avg_rating=('rating', 'mean')
        )
        # count * mean equals the sum of a movie's ratings. Movies with no
        # ratings are absent from movie_stats, so they get no popularity entry.
        movie_stats['popularity'] = movie_stats['rating_count'] * movie_stats['avg_rating']
        self.popularity = movie_stats['popularity'].fillna(0).to_dict()

    def content_based(self, movie_id: int, n: int = 10):
        """
        Return the top‑n most similar movies to the given movie_id.
        """
        if movie_id not in self.movie_similarity_df.index:
            return []  # Unknown movie – graceful fallback.

        # Retrieve similarity scores, exclude the movie itself, sort descending.
        scores = self.movie_similarity_df[movie_id].drop(labels=[movie_id])
        top_n = scores.sort_values(ascending=False).head(n)

        results = []
        for mid, score in top_n.items():
            movie_row = self.movies.loc[self.movies['movieId'] == mid].iloc[0]
            results.append({
                'movie_id': int(mid),
                'title': movie_row['title'],
                'genres': movie_row['genres'],
                'score': round(float(score), 4)
            })
        return results

    def hybrid_recommend(self, user_id: int, n: int = 10):
        """
        Blend content‑based signals from the user’s top‑rated movies
        with a normalized popularity signal; users with no rating history
        (cold start) fall back to popular_recommendations.
        """
        user_ratings = self.ratings[self.ratings['userId'] == user_id]

        # Cold start: no rating history → return popular items.
        if user_ratings.empty:
            return self.popular_recommendations(n)

        watched = set(user_ratings['movieId'])

        # ---- Content‑based component ----
        cb_scores = {}
        # Take the user’s top‑5 rated movies as proxies for taste.
        # Iterating over the movieId column keeps IDs as integers
        # (iterrows would upcast them to float alongside the rating column).
        favorites = user_ratings.sort_values('rating', ascending=False).head(5)['movieId']
        for mid in favorites:
            if mid in self.movie_similarity_df.index:
                # The 20 movies most similar to this favorite (excluding itself).
                similar = self.movie_similarity_df[mid].drop(labels=[mid]).nlargest(20)
                for sim_id, sim_score in similar.items():
                    if sim_id not in watched:
                        cb_scores[sim_id] = cb_scores.get(sim_id, 0) + sim_score * 0.3

        # ---- Popularity component ----
        pop_scores = {}
        max_pop = max(self.popularity.values()) if self.popularity else 1
        for mid, pop in self.popularity.items():
            if mid not in watched:
                # Normalize popularity to [0,1] then weight.
                pop_scores[mid] = (pop / max_pop) * 0.4

        # ---- Fusion ----
        final_scores = {}
        all_candidates = set(cb_scores.keys()) | set(pop_scores.keys())
        for mid in all_candidates:
            final_scores[mid] = cb_scores.get(mid, 0) + pop_scores.get(mid, 0)

        # Pick top‑n.
        top_movies = sorted(final_scores.items(),
                            key=lambda x: x[1],
                            reverse=True)[:n]

        results = []
        for mid, score in top_movies:
            movie_row = self.movies.loc[self.movies['movieId'] == mid].iloc[0]
            results.append({
                'movie_id': int(mid),
                'title': movie_row['title'],
                'genres': movie_row['genres'],
                'score': round(float(score), 4)
            })
        return results

    def popular_recommendations(self, n: int = 10):
        """
        Cold‑start fallback for users with no rating history.
        Returns movies with at least 10 ratings, scored by count * avg.
        """
        movie_stats = self.ratings.groupby('movieId').agg(
            count=('rating', 'count'),
            avg=('rating', 'mean')
        )
        # Filter out sparsely rated items; .copy() avoids pandas'
        # SettingWithCopyWarning when the score column is added below.
        movie_stats = movie_stats[movie_stats['count'] >= 10].copy()
        movie_stats['score'] = movie_stats['count'] * movie_stats['avg']
        top_movies = movie_stats.sort_values('score', ascending=False).head(n)

        results = []
        for mid, row in top_movies.iterrows():
            movie_row = self.movies.loc[self.movies['movieId'] == mid].iloc[0]
            results.append({
                'movie_id': int(mid),
                'title': movie_row['title'],
                'genres': movie_row['genres'],
                'avg_rating': round(float(row['avg']), 2),
                'rating_count': int(row['count'])
            })
        return results
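`_compute_similarity` creates an empty `_top_similar_cache` but never fills it. A minimal sketch of how such a cache could be populated, assuming the similarity matrix is available as a square NumPy array whose rows align with an array of movie IDs (for the engine, `self.movie_similarity_df.to_numpy()` and `self.movie_similarity_df.index.to_numpy()`). The helper name `top_n_neighbours` is hypothetical. `np.argpartition` selects the n largest scores per row without a full sort, and only those n are then ordered.

```python
import numpy as np


def top_n_neighbours(similarity: np.ndarray, movie_ids: np.ndarray, n: int = 10) -> dict:
    """Map each movie ID to its n most similar other movies, best first.

    Hypothetical helper: row i of `similarity` must correspond to movie_ids[i].
    """
    sim = similarity.astype(float, copy=True)
    np.fill_diagonal(sim, -np.inf)            # a movie is never its own neighbour
    k = min(n, sim.shape[1] - 1)
    # Column indices of the k largest scores in each row (unordered).
    part = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    cache = {}
    for row, cols in enumerate(part):
        # Sort only the k candidates, highest similarity first.
        order = cols[np.argsort(-sim[row, cols], kind="stable")]
        cache[int(movie_ids[row])] = [(int(movie_ids[c]), float(sim[row, c])) for c in order]
    return cache


sim = np.array([
    [1.0, 0.2, 0.9, 0.4],
    [0.2, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.3],
    [0.4, 0.8, 0.3, 1.0],
])
ids = np.array([101, 102, 103, 104])
cache = top_n_neighbours(sim, ids, n=2)
print(cache[101])  # [(103, 0.9), (104, 0.4)]
```

Storing only n neighbours per movie keeps the cache at O(movies × n) entries, whereas the full similarity matrix grows with the square of the catalog size.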

Vibe‑Coding Prompt for the Engine

“Create a Python class called RecommendationEngine that loads movies.csv and ratings.csv, builds a genre one‑hot matrix, computes cosine similarity, calculates popularity scores, and provides three methods: content_based(movie_id, n), hybrid_recommend(user_id, n), and popular_recommendations(n). Include detailed docstrings, input validation, and logging of initialization stats.”
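The engine imports joblib but never uses it. In line with the batch preprocessing mentioned at the start of the chapter, one option is to compute the expensive artifacts offline and load them at start‑up instead of recomputing them in every container. A minimal sketch, assuming the artifacts fit in one pickled dictionary; `save_artifacts`, `load_artifacts`, and the file name are hypothetical. It uses the standard‑library pickle module; `joblib.dump(obj, path)` and `joblib.load(path)` can be used the same way and are designed for objects holding large NumPy arrays.

```python
import os
import pickle
import tempfile

import numpy as np


def save_artifacts(path: str, similarity: np.ndarray, movie_ids: np.ndarray,
                   popularity: dict) -> None:
    """Batch step (e.g. a nightly job): write precomputed artifacts to disk."""
    artifacts = {"similarity": similarity, "movie_ids": movie_ids, "popularity": popularity}
    # Write to a temporary file in the same directory, then atomically replace
    # the target so a serving process never reads a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(artifacts, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


def load_artifacts(path: str) -> dict:
    """Serving step: load artifacts at API start-up (only unpickle trusted files)."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "artifacts.pkl")
    save_artifacts(path, np.eye(3), np.array([1, 2, 3]), {1: 12.0, 2: 3.0})
    loaded = load_artifacts(path)

print(loaded["movie_ids"], loaded["popularity"])
```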

2. Building the FastAPI Service

What – A web service built with FastAPI, a lightweight, async‑capable framework that automatically generates OpenAPI documentation, validates request/response models via Pydantic, and lets us expose the engine’s methods as HTTP endpoints.

Why – FastAPI offers high performance (thanks to Starlette and Pydantic), automatic docs for easy integration, and built‑in dependency injection, which simplifies testing and deployment.

How – We create an app object, instantiate the engine at startup, define Pydantic models for request/response payloads, and implement the endpoints. We also add a simple in‑memory cache with TTL to reduce redundant computation.
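The cache in the service below is a plain dictionary: an expired entry is only replaced when the same key is requested again, so memory grows with the number of distinct (user_id, n) pairs. A minimal bounded alternative, assuming a single process (each replica keeps its own copy), combines the TTL check with least‑recently‑used eviction. `TTLCache` and `max_size` are hypothetical, built on the standard library rather than any FastAPI feature; the lock is there because FastAPI runs plain `def` endpoints in a thread pool.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical bounded cache: entries expire after `ttl` seconds, and the
    least recently used entry is evicted once `max_size` is exceeded."""

    def __init__(self, ttl: float = 300.0, max_size: int = 10_000,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.max_size = max_size
        self._clock = clock
        self._data: OrderedDict = OrderedDict()   # key -> (value, stored_at)
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None on a miss or an expired entry."""
        with self._lock:
            item = self._data.get(key)
            if item is None:
                return None
            value, stored_at = item
            if self._clock() - stored_at >= self.ttl:
                del self._data[key]               # expired: drop it eagerly
                return None
            self._data.move_to_end(key)           # mark as most recently used
            return value

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            self._data[key] = (value, self._clock())
            self._data.move_to_end(key)
            while len(self._data) > self.max_size:
                self._data.popitem(last=False)    # evict least recently used

    def __len__(self) -> int:
        return len(self._data)


now = [0.0]                                       # fake clock for the demo
cache = TTLCache(ttl=300, max_size=2, clock=lambda: now[0])
cache.set("recommend:1:10", ["movie A"])
cache.set("recommend:2:10", ["movie B"])
cache.get("recommend:1:10")                       # touch user 1; user 2 becomes LRU
cache.set("recommend:3:10", ["movie C"])          # over max_size: user 2 is evicted
print(cache.get("recommend:2:10"))                # None
now[0] = 301.0
print(cache.get("recommend:1:10"))                # None: older than the 300 s TTL
```

In the `recommend` endpoint, the manual timestamp check would become `cache.get(cache_key)`, with `cache.set(cache_key, recommendations)` on a miss. Since `None` signals a miss, this sketch assumes cached values are never `None` (recommendation lists never are).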

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List
import time

app = FastAPI(
    title="Recommendation System API",
    description="Production‑grade API serving personalized, popular, and similar‑item recommendations.",
    version="1.0.0"
)

# Initialize the recommendation engine once, when the module is loaded.
# RecommendationEngine is the class from step 1; define or import it in this module.
engine = RecommendationEngine()

# Simple in‑memory cache: {cache_key: (payload, timestamp)}
cache = {}
CACHE_TTL_SECONDS = 300  # 5 minutes

# ------------------- Pydantic Models -------------------
class RecommendResponse(BaseModel):
    recommendations: List[dict]
    cached: bool
    processing_time_ms: float

class PopularResponse(BaseModel):
    recommendations: List[dict]

class SimilarResponse(BaseModel):
    movie_id: int
    similar_movies: List[dict]

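# ------------------- Startup Event -------------------
# Recent FastAPI versions deprecate on_event in favour of a lifespan handler,
# but on_event still works.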
@app.on_event("startup")
async def startup_event():
    """Log service start‑out the docs URL."""
    print("🚀 Recommendation System API started.")
    print("📖 Interactive docs: http://localhost:8000/docs")
    print("🔍 Alternative docs: http://localhost:8000/redoc")
    print("💡 Test endpoint: http://localhost:8000/recommend/1?n=5")

@app.get("/")
def root():
    """Service landing page with a summary of available endpoints."""
    return {
        "service": "Recommendation System API",
        "version": "1.0.0",
        "endpoints": {
            "/recommend/{user_id}": "GET – personalized hybrid recommendations",
            "/popular": "GET – popularity‑based fallback (cold start)",
            "/similar/{movie_id}": "GET – content‑based similar items",
            "/health": "GET – service health and dataset statistics"
        }
    }

@app.get("/health", tags=["monitoring"])
def health_check():
    """Return basic health metrics; useful for Kubernetes liveness/readiness probes."""
    return {
        "status": "healthy",
        "movies_loaded": len(engine.movies),
        "users_loaded": engine.ratings['userId'].nunique(),
        "ratings_loaded": len(engine.ratings),
        "cache_size": len(cache)
    }

@app.get("/recommend/{user_id}", response_model=RecommendResponse, tags=["recommendations"])
def recommend(user_id: int, n: int = 10):
    """
    Hybrid recommendation endpoint.
    - Checks an in‑memory cache first.
    - If cache miss, calls engine.hybrid_recommend.
    - Stores result in cache with a TTL.
    """
    cache_key = f"recommend:{user_id}:{n}"
    now = time.time()

    # Cache hit?
    if cache_key in cache:
        payload, timestamp = cache[cache_key]
        if now - timestamp < CACHE_TTL_SECONDS:
            return {
                "recommendations": payload,
                "cached": True,
                "processing_time_ms": 0.0
            }

    # Cache miss → compute.
    start = time.perf_counter()
    recommendations = engine.hybrid_recommend(user_id, n)
    elapsed_ms = (time.perf_counter() - start) * 1000

    # Store in cache.
    cache[cache_key] = (recommendations, now)

    return {
        "recommendations": recommendations,
        "cached": False,
        "processing_time_ms": round(elapsed_ms, 2)
    }

@app.get("/popular", response_model=PopularResponse, tags=["recommendations"])
def popular(n: int = 10):
    """Return the top‑n popular movies (cold‑start fallback)."""
    return {
        "recommendations": engine.popular_recommendations(n)
    }

@app.get("/similar/{movie_id}", response_model=SimilarResponse, tags=["similarity"])
def similar_movies(movie_id: int, n: int = 10):
    """
    Return movies similar to the given movie_id using content‑based filtering.
    Raises 404 if the movie_id does not exist in the catalog.
    """
    results = engine.content_based(movie_id, n)
    if not results:
        raise HTTPException(
            status_code=404,
            detail=f"Movie ID {movie_id}