Graph Theory Basics & NetworkX
What is a Graph?
A graph consists of:
- Nodes (Vertices): entities (cities, people, web pages)
- Edges: relationships (roads, friendships, links)
Real-life Examples
| Scenario | Nodes | Edges | |----------|-------|-------| | Map Navigation | Intersections | Roads | | Social Network | People | Friendships | | Network Routing | Servers | Cables | | Recommendation | Users/Items | Purchases |
Graph Types
- Directed Graph: edges have direction
- Undirected Graph: edges are bidirectional
- Weighted Graph: edges have weights (distance, time)
- Connected vs Disconnected
Install NetworkX
pip install networkx matplotlib numpy
Your First Graph
import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_node("Taipei")
G.add_node("Taichung")
G.add_node("Kaohsiung")
G.add_edge("Taipei", "Taichung", weight=180)
G.add_edge("Taichung", "Kaohsiung", weight=200)
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color='lightblue', node_size=1500)
labels = nx.get_edge_attributes(G, 'weight')
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()
๐ฅ Vibe Prompt
"Analyze a social network graph: create 50 nodes, compute degree/betweenness centrality, find the top 5 most important nodes, visualize with color intensity."
Common Issues & Solutions
| Problem | Cause | Solution | |---------|-------|----------| | Unexpected results | Wrong parameters | Check defaults and edge cases | | Slow execution | Inefficient algorithm | Use better data structures | | Out of memory | Too much data | Use batch processing | | Hard to debug | No logging | Add detailed logging |
Further Learning
- Read official documentation
- Browse open-source examples on GitHub
- Join community discussions
- Practice by modifying code and observing results
Performance Considerations
When working with large datasets or complex computations:
- Time Complexity: Analyze and optimize Big O
- Space Complexity: Balance memory vs speed
- Caching: Store computed results to avoid recalculation
- Parallelism: Use multi-threading for independent tasks
- Profiling: Measure before optimizing - use profilers
Real-World Application
This concept is widely used in:
- Web development (routing, authentication)
- Data science (feature engineering, model training)
- Game development (pathfinding, physics)
- Mobile apps (state management, caching)
Key Algorithms Summary
| Algorithm | Time Complexity | Space Complexity | Use Case | |-----------|----------------|-----------------|----------| | Dijkstra | O((V+E)logV) | O(V) | Shortest path | | A* | O(E) | O(V) | Pathfinding with heuristic | | BFS | O(V+E) | O(V) | Unweighted shortest path | | DFS | O(V+E) | O(V) | Topological sort, cycle detection |
Choosing the Right Algorithm
- Unweighted graph โ BFS
- Weighted, non-negative โ Dijkstra
- Weighted, any โ Bellman-Ford
- Heuristic available โ A*
- All pairs โ Floyd-Warshall
Dijkstra's Algorithm
Dijkstra finds the shortest path from a single source node to all other nodes:
import heapq
def dijkstra(graph, start):
"""
Find shortest distances from start to all nodes using Dijkstra.
Args:
graph: dict where graph[u] = [(v, weight), ...]
start: starting node
Returns:
distances dict, predecessors dict
"""
distances = {node: float('inf') for node in graph}
distances[start] = 0
pq = [(0, start)]
predecessors = {}
visited = set()
while pq:
current_dist, current = heapq.heappop(pq)
if current in visited:
continue
visited.add(current)
for neighbor, weight in graph[current]:
distance = current_dist + weight
if distance < distances[neighbor]:
distances[neighbor] = distance
predecessors[neighbor] = current
heapq.heappush(pq, (distance, neighbor))
return distances, predecessors
A* Search Algorithm
A* combines actual cost from start (g) with estimated cost to goal (h) for efficient pathfinding:
def astar(graph, start, goal, heuristic):
"""
Find shortest path from start to goal using A*.
Args:
graph: adjacency list
start: starting node
goal: target node
heuristic: function h(node, goal) -> estimated cost
Returns:
path as list of nodes, or None
"""
open_set = [(0, start)] # (f_score, node)
g_score = {node: float('inf') for node in graph}
g_score[start] = 0
f_score = {node: float('inf') for node in graph}
f_score[start] = heuristic(start, goal)
came_from = {}
closed_set = set()
while open_set:
_, current = heapq.heappop(open_set)
if current == goal:
# Reconstruct path
path = [current]
while current in came_from:
current = came_from[current]
path.append(current)
path.reverse()
return path
if current in closed_set:
continue
closed_set.add(current)
for neighbor, weight in graph[current]:
tentative_g = g_score[current] + weight
if tentative_g < g_score[neighbor]:
came_from[neighbor] = current
g_score[neighbor] = tentative_g
f_score[neighbor] = tentative_g + heuristic(neighbor, goal)
heapq.heappush(open_set, (f_score[neighbor], neighbor))
return None # No path found
# Example: Manhattan heuristic for grid
heuristic = lambda a, b: abs(a[0]-b[0]) + abs(a[1]-b[1])
Summary
Graph theory is the foundation of network analysis, pathfinding, and optimization. Understand node/edge/weight concepts, use adjacency lists for efficiency, choose between DFS/BFS based on your traversal needs, and apply Dijkstra or A* for pathfinding depending on whether you have a goal heuristic.
Key takeaways:
- Graphs = nodes (vertices) + edges (connections) + optional weights
- Adjacency list: O(V + E) memory โ best for sparse graphs
- Adjacency matrix: O(Vยฒ) memory โ best for dense graphs
- BFS: shortest path in unweighted graphs, level-order traversal
- DFS: topological sort, cycle detection, maze solving
- Dijkstra: shortest paths from one source to all nodes (weighted, non-negative)
- A*: goal-directed pathfinding with heuristic (faster than Dijkstra)
- NetworkX: Python library for graph creation, analysis, and visualization
What's Next: Algorithm โ Dynamic Programming
The next course covers dynamic programming โ breaking problems into overlapping subproblems and solving each once for efficient results.