| <html lang="en"> | |
| <head> | |
| <meta charset="UTF-8" /> | |
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> | |
| <title>Data Visualization Masterclass</title> | |
| <link rel="stylesheet" href="style.css" /> | |
| <!-- MathJax for rendering LaTeX formulas --> | |
| <script> | |
| MathJax = { | |
| tex: { | |
| inlineMath: [['$', '$'], ['\\(', '\\)']] | |
| } | |
| }; | |
| </script> | |
| <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script> | |
| </head> | |
| <body> | |
| <div class="app flex"> | |
| <!-- Sidebar Navigation --> | |
| <aside class="sidebar" id="sidebar"> | |
| <h1 class="sidebar__title">Data Visualization</h1> | |
| <nav> | |
| <h3 class="sidebar__section">Foundations</h3> | |
| <ul class="nav__list" id="navList"> | |
| <li><a href="#intro" class="nav__link">Why Visualize Data?</a></li> | |
| <li><a href="#perception" class="nav__link">Visual Perception</a></li> | |
| <li><a href="#grammar" class="nav__link">Grammar of Graphics</a></li> | |
| <li><a href="#choosing-charts" class="nav__link">Choosing the Right Chart</a></li> | |
| </ul> | |
| <h3 class="sidebar__section">Matplotlib Essentials</h3> | |
| <ul class="nav__list"> | |
| <li><a href="#matplotlib-anatomy" class="nav__link">Figure Anatomy</a></li> | |
| <li><a href="#basic-plots" class="nav__link">Basic Plots</a></li> | |
| <li><a href="#subplots" class="nav__link">Subplots & Layouts</a></li> | |
| <li><a href="#styling" class="nav__link">Styling & Themes</a></li> | |
| </ul> | |
| <h3 class="sidebar__section">Seaborn Statistical Viz</h3> | |
| <ul class="nav__list"> | |
| <li><a href="#seaborn-intro" class="nav__link">Seaborn Overview</a></li> | |
| <li><a href="#distributions" class="nav__link">Distribution Plots</a></li> | |
| <li><a href="#relationships" class="nav__link">Relationship Plots</a></li> | |
| <li><a href="#categorical" class="nav__link">Categorical Plots</a></li> | |
| <li><a href="#heatmaps" class="nav__link">Heatmaps & Clusters</a></li> | |
| </ul> | |
| <h3 class="sidebar__section">Interactive Visualization</h3> | |
| <ul class="nav__list"> | |
| <li><a href="#plotly" class="nav__link">Plotly Express</a></li> | |
| <li><a href="#animations" class="nav__link">Animations</a></li> | |
| <li><a href="#dashboards" class="nav__link">Dashboards (Streamlit)</a></li> | |
| </ul> | |
| <h3 class="sidebar__section">Advanced Topics</h3> | |
| <ul class="nav__list"> | |
| <li><a href="#geospatial" class="nav__link">Geospatial Viz</a></li> | |
| <li><a href="#3d-plots" class="nav__link">3D Visualization</a></li> | |
| <li><a href="#storytelling" class="nav__link">Data Storytelling</a></li> | |
| </ul> | |
| </nav> | |
| </aside> | |
| <!-- Main Content --> | |
| <main class="content" id="content"> | |
| <!-- ============================ 1. INTRO ============================ --> | |
| <section id="intro" class="topic-section"> | |
| <h2>Why Visualize Data?</h2> | |
| <p>Data visualization transforms abstract numbers into visual stories. People spot trends, clusters, and outliers in a | |
| well-drawn chart far more quickly than in a table of raw numbers. Visualization helps us <strong>explore</strong>, <strong>analyze</strong>, and | |
| <strong>communicate</strong> data effectively. | |
| </p> | |
| <div class="info-card"> | |
| <strong>Anscombe's Quartet:</strong> Four datasets with nearly identical statistical properties (mean, | |
| variance, correlation) that look completely different when plotted. This demonstrates why visualization is | |
| essential - statistics alone can be misleading! | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-anscombe" width="800" height="400"></canvas> | |
| </div> | |
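| <p>The claim about Anscombe's Quartet can be checked numerically. The sketch below uses the first two of the four datasets (values as published by Anscombe in 1973) and only the standard library; the helper name <code>pearson_r</code> is illustrative and not taken from any library.</p> | |

```python
from statistics import mean, variance

# Anscombe's datasets I and II share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


# Print the summary statistics side by side for the two datasets.
for name, y in [("I", y1), ("II", y2)]:
    print(f"Dataset {name}: mean={mean(y):.2f}  var={variance(y):.2f}  r={pearson_r(x, y):.3f}")
```

| <p>Both rows print nearly identical numbers, yet dataset I is a noisy straight line and dataset II is a smooth curve. Only a plot reveals the difference.</p> | |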
| <h3>Three Purposes of Visualization</h3> | |
| <div class="info-card"> | |
| <strong>1. Exploratory:</strong> Discover patterns, anomalies, and insights in your data<br> | |
| <strong>2. Explanatory:</strong> Communicate findings to stakeholders clearly<br> | |
| <strong>3. Confirmatory:</strong> Verify hypotheses and validate models | |
| </div> | |
| <div class="callout callout--insight">💡 "The greatest value of a picture is when it forces us to notice what we | |
| never expected to see." (John Tukey)</div> | |
| <div class="callout callout--tip">✅ Always start with visualization before building ML models.</div> | |
| </section> | |
| <!-- ====================== 2. VISUAL PERCEPTION ================== --> | |
| <section id="perception" class="topic-section"> | |
| <h2>Visual Perception & Pre-attentive Attributes</h2> | |
| <p>The human visual system can detect certain visual attributes almost instantly (&lt; 250 ms) without conscious | |
| effort. These are called <strong>pre-attentive attributes</strong>.</p> | |
| <div class="info-card"> | |
| <strong>Pre-attentive Attributes:</strong> | |
| <ul> | |
| <li><strong>Position:</strong> Most accurate for quantitative data (use X/Y axes)</li> | |
| <li><strong>Length:</strong> Bar charts leverage this effectively</li> | |
| <li><strong>Color Hue:</strong> Best for categorical distinctions</li> | |
| <li><strong>Color Intensity:</strong> Good for gradients/magnitude</li> | |
| <li><strong>Size:</strong> Bubble charts, but humans underestimate area</li> | |
| <li><strong>Shape:</strong> Useful for categories, but limit to 5-7 shapes</li> | |
| <li><strong>Orientation:</strong> Lines, angles</li> | |
| </ul> | |
| </div> | |
| <div class="form-group"> | |
| <button id="btn-position" class="btn btn--primary">Position Encoding</button> | |
| <button id="btn-color" class="btn btn--primary">Color Encoding</button> | |
| <button id="btn-size" class="btn btn--primary">Size Encoding</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-perception" width="700" height="350"></canvas> | |
| </div> | |
| <h3>Cleveland & McGill's Accuracy Ranking</h3> | |
| <div class="info-card"> | |
| <strong>Most Accurate → Least Accurate:</strong><br> | |
| 1. Position on common scale (bar chart)<br> | |
| 2. Position on non-aligned scale (multiple axes)<br> | |
| 3. Length (bar)<br> | |
| 4. Angle, Slope<br> | |
| 5. Area<br> | |
| 6. Volume, Curvature<br> | |
| 7. Color saturation, Color hue | |
| </div> | |
| <div class="callout callout--mistake">⚠️ Pie charts use angle (low accuracy). Bar charts are almost always | |
| better!</div> | |
| <div class="callout callout--tip">✅ Use position for the most important data, color for categories.</div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: The Weber-Fechner Law</h3> | |
| <p>Why are humans bad at comparing bubble sizes (area) but great at comparing bar chart heights | |
| (length/position)? Human perception of physical magnitudes follows a logarithmic scale, not a linear one. | |
| </p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; font-size: 1.1em; color: #e4e6eb;"> | |
| $$ \frac{\Delta I}{I} = k \quad \Rightarrow \quad S = c \ln\left(\frac{I}{I_0}\right) $$ | |
| </div> | |
| <ul style="margin-bottom: 0;"> | |
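| <li><strong>$I$</strong>: Stimulus intensity (e.g., bubble area), with $I_0$ the threshold intensity below which nothing is perceived</li> | |
| <li><strong>$\Delta I$</strong>: Just Noticeable Difference (JND) required to perceive a change</li> | |
| <li><strong>$S$</strong>: Perceived sensation strength, with $c$ a scaling constant that depends on the stimulus type</li> | |
| <li><strong>$k$</strong>: Weber's constant (the Weber fraction). Roughly $k \approx 0.03$ for length/position (very sensitive), but | |
| $k \approx 0.10$ to $0.20$ for area (much less sensitive).</li> | |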
| </ul> | |
| </div> | |
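| <p>A minimal sketch of the two formulas above, using only the standard library. The function names and the value <code>K_AREA = 0.15</code> (the midpoint of the range quoted above) are illustrative assumptions, not measured constants.</p> | |

```python
import math


def jnd(intensity, k):
    """Weber's law: the smallest detectable change is a fixed fraction k of the stimulus."""
    return k * intensity


def perceived(intensity, threshold=1.0, c=1.0):
    """Fechner's law: S = c * ln(I / I0)."""
    return c * math.log(intensity / threshold)


# Weber fractions taken from the text above (illustrative values).
K_LENGTH, K_AREA = 0.03, 0.15

bar_height = 100.0   # arbitrary units
bubble_area = 100.0  # arbitrary units
print(f"Bar must change by    {jnd(bar_height, K_LENGTH):.1f} units to be noticed")
print(f"Bubble must change by {jnd(bubble_area, K_AREA):.1f} units to be noticed")

# Doubling the stimulus adds the same perceived increment at any starting level.
print(perceived(20) - perceived(10), perceived(200) - perceived(100))
```

| <p>With these values, a bubble has to change about five times as much as a bar before a viewer notices, which is why bar lengths are the safer encoding for precise comparisons.</p> | |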
| </section> | |
| <!-- ====================== 3. GRAMMAR OF GRAPHICS ================== --> | |
| <section id="grammar" class="topic-section"> | |
| <h2>The Grammar of Graphics</h2> | |
| <p>The Grammar of Graphics (Wilkinson, 1999) is a framework for describing statistical graphics. It's the | |
| foundation of ggplot2 (R) and influences Seaborn, Altair, and Plotly.</p> | |
| <div class="info-card"> | |
| <strong>Components of a Graphic:</strong> | |
| <ul> | |
| <li><strong>Data:</strong> The dataset being visualized</li> | |
| <li><strong>Aesthetics (aes):</strong> Mapping data to visual properties (x, y, color, size)</li> | |
| <li><strong>Geometries (geom):</strong> Visual elements (points, lines, bars, areas)</li> | |
| <li><strong>Facets:</strong> Subplots by categorical variable</li> | |
| <li><strong>Statistics:</strong> Transformations (binning, smoothing, aggregation)</li> | |
| <li><strong>Coordinates:</strong> Cartesian, polar, map projections</li> | |
| <li><strong>Themes:</strong> Non-data visual elements (fonts, backgrounds)</li> | |
| </ul> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-grammar" width="800" height="400"></canvas> | |
| </div> | |
| <div class="callout callout--insight">💡 Understanding Grammar of Graphics makes you a better visualizer in ANY | |
| library.</div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Coordinate Transformations</h3> | |
| <p>When mapping data to visuals, the coordinate system applies a mathematical transformation. It is not always a | |
| linear (matrix) one: converting standard Cartesian coordinates $(x, y)$ to polar coordinates $(r, \theta)$, as needed to render a | |
| pie chart or Coxcomb plot, is non-linear:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; overflow-x: auto; color: #e4e6eb;"> | |
| $$ r = \sqrt{x^2 + y^2} $$ | |
| $$ \theta = \text{atan2}(y, x) $$ | |
| $$ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} r \cos(\theta) \\ r \sin(\theta) \end{bmatrix} $$ | |
| </div> | |
| <p style="margin-bottom: 0;">This is why pie charts are computationally and perceptually different from bar | |
| charts: they apply a non-linear polar transformation to the linear data dimensions.</p> | |
| </div> | |
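| <p>The conversion formulas above can be sketched with the standard <code>math</code> module. The category totals and helper names below are hypothetical; the loop shows how a pie chart turns each value's share of the total into an angular sweep.</p> | |

```python
import math


def to_polar(x, y):
    """Cartesian (x, y) -> polar (r, theta), with theta in radians."""
    return math.hypot(x, y), math.atan2(y, x)


def to_cartesian(r, theta):
    """Polar (r, theta) -> Cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)


# A pie chart maps each value's share of the total to an angle.
values = [30, 50, 20]  # hypothetical category totals
total = sum(values)
start = 0.0
for v in values:
    sweep = 2 * math.pi * v / total
    x_end, y_end = to_cartesian(1.0, start + sweep)
    print(f"value={v}: wedge {math.degrees(start):.0f} to {math.degrees(start + sweep):.0f} deg, "
          f"edge ends at ({x_end:.2f}, {y_end:.2f})")
    start += sweep
```
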
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>app.py - Grammar of Graphics with Plotnine (Python)</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>from plotnine import ggplot, aes, geom_point, geom_smooth, theme_minimal, labs | |
| from plotnine.data import mpg  # sample dataset bundled with plotnine | |
| # Following the Grammar of Graphics: | |
| # Data (mpg) -> Aesthetics (x, y, color) -> Geometries (point, smooth) | |
| plot = ( | |
|     ggplot(mpg, aes(x='displ', y='hwy', color='class')) | |
|     + geom_point(size=3, alpha=0.7) | |
|     + geom_smooth(method='lm', se=False)  # One regression line per class | |
|     + theme_minimal()                     # Add theme | |
|     + labs(title='Engine Displacement vs Highway MPG') | |
| ) | |
| plot.show()  # plotnine >= 0.13; on older versions use print(plot)</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 4. CHOOSING CHARTS ================== --> | |
| <section id="choosing-charts" class="topic-section"> | |
| <h2>Choosing the Right Chart</h2> | |
| <p>The best visualization depends on your <strong>data type</strong> and <strong>question</strong>. Here's a | |
| decision guide:</p> | |
| <div class="info-card"> | |
| <strong>Single Variable (Univariate):</strong><br> | |
| • Continuous: Histogram, KDE, Box plot, Violin plot<br> | |
| • Categorical: Bar chart, Count plot<br><br> | |
| <strong>Two Variables (Bivariate):</strong><br> | |
| • Both Continuous: Scatter plot, Line chart, Hexbin, 2D histogram<br> | |
| • Continuous + Categorical: Box plot, Violin, Strip, Swarm<br> | |
| • Both Categorical: Heatmap, Grouped bar chart<br><br> | |
| <strong>Multiple Variables (Multivariate):</strong><br> | |
| • Pair plot (scatterplot matrix)<br> | |
| • Parallel coordinates<br> | |
| • Heatmap correlation matrix<br> | |
| • Faceted plots (small multiples) | |
| </div> | |
| <div class="form-group"> | |
| <button id="btn-comparison" class="btn btn--primary">Comparison</button> | |
| <button id="btn-composition" class="btn btn--primary">Composition</button> | |
| <button id="btn-distribution" class="btn btn--primary">Distribution</button> | |
| <button id="btn-relationship" class="btn btn--primary">Relationship</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-choosing" width="800" height="400"></canvas> | |
| </div> | |
| <h3>Common Chart Mistakes</h3> | |
| <div class="callout callout--mistake">⚠️ <strong>Pie charts for many categories</strong> - Use a bar chart instead | |
| </div> | |
| <div class="callout callout--mistake">⚠️ <strong>3D effects on 2D data</strong> - Distorts perception</div> | |
| <div class="callout callout--mistake">⚠️ <strong>Truncated Y-axis</strong> - Exaggerates differences</div> | |
| <div class="callout callout--mistake">⚠️ <strong>Rainbow color scales</strong> - Not perceptually uniform</div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Information Entropy in Visuals</h3> | |
| <p>How much data can a chart "handle" before it becomes cluttered? One rough way to reason about it is Shannon entropy ($H$), | |
| used as a proxy for visual information density. If a chart has $n$ visual marks (dots, lines) and mark $i$ draws attention with | |
| probability $p_i$ (the $p_i$ sum to 1):</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; color: #e4e6eb;"> | |
| $$ H(X) = - \sum_{i=1}^{n} p_i \log_2(p_i) $$ | |
| </div> | |
| <p style="margin-bottom: 0;"><strong>Takeaway:</strong> Each extra encoding (color, size, shape) layered onto a single plot | |
| raises $H$. Miller's classic estimate puts human capacity for absolute judgments at roughly $2.5$ bits (about six distinguishable levels), | |
| so a chart that demands much more than that tends to cause chart fatigue. Treat this as a heuristic argument for "less is more" in dashboard design, not a strict law.</p> | |
| </div> | |
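| <p>A small sketch of the entropy formula above. The attention shares are hypothetical numbers chosen for illustration; real attention probabilities would have to be measured, for example with eye tracking.</p> | |

```python
import math


def shannon_entropy(probs):
    """H = -sum(p * log2(p)) in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical attention shares for the marks on two charts.
focused = [0.7, 0.1, 0.1, 0.1]  # one mark dominates
cluttered = [1 / 12] * 12       # twelve marks compete equally

print(f"focused:   {shannon_entropy(focused):.2f} bits")
print(f"cluttered: {shannon_entropy(cluttered):.2f} bits")
```

| <p>Entropy is highest when every mark competes equally for attention, so emphasizing one or two marks lowers it even when the number of marks stays the same.</p> | |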
| </section> | |
| <!-- ====================== 5. MATPLOTLIB ANATOMY ================== --> | |
| <section id="matplotlib-anatomy" class="topic-section"> | |
| <h2>Matplotlib Figure Anatomy</h2> | |
| <p>Understanding Matplotlib's object hierarchy is key to creating professional visualizations.</p> | |
| <div class="info-card"> | |
| <strong>Hierarchical Structure:</strong><br> | |
| <code>Figure → Axes → Axis → Tick → Label</code><br><br> | |
| • <strong>Figure:</strong> The overall window/canvas<br> | |
| • <strong>Axes:</strong> The actual plot area (NOT the X/Y axis!)<br> | |
| • <strong>Axis:</strong> The X or Y axis with ticks and labels<br> | |
| • <strong>Artist:</strong> Everything visible (lines, text, patches) | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-anatomy" width="800" height="500"></canvas> | |
| </div> | |
| <h3>Two Interfaces</h3> | |
| <div class="info-card"> | |
| <strong>1. pyplot (MATLAB-style):</strong> Quick, implicit state<br> | |
| <code>plt.plot(x, y)</code><br> | |
| <code>plt.xlabel('Time')</code><br> | |
| <code>plt.show()</code><br><br> | |
| <strong>2. Object-Oriented (OO):</strong> Explicit, recommended for complex plots<br> | |
| <code>fig, ax = plt.subplots()</code><br> | |
| <code>ax.plot(x, y)</code><br> | |
| <code>ax.set_xlabel('Time')</code> | |
| </div> | |
| <div class="callout callout--tip">✅ Always use the OO interface for publication-quality plots.</div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Affine Transformations</h3> | |
| <p>How does Matplotlib convert your data coordinates (e.g., $x \in [0, 1000]$) into physical pixels on your | |
| screen? It uses a continuous pipeline of <strong>Affine Transformation Matrices</strong>:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; overflow-x: auto; color: #e4e6eb;"> | |
| $$ \begin{bmatrix} x_{\text{display}} \\ y_{\text{display}} \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & 0 & | |
| t_x \\ 0 & s_y & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{\text{data}} \\ y_{\text{data}} \\ 1 | |
| \end{bmatrix} $$ | |
| </div> | |
| <p style="margin-bottom: 0;">This matrix $T$ scales ($s_x, s_y$) and translates ($t_x, t_y$) data points. The | |
| transformation pipeline is: Data $\rightarrow$ Axes (relative 0-1) $\rightarrow$ Figure (inches) | |
| $\rightarrow$ Display (pixels based on DPI).</p> | |
| </div> | |
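| <p>The scale-and-translate matrix above can be built by hand for a linear axis. This is a standalone sketch, not Matplotlib's internal code: the helper names, axis limits, and pixel box are all assumptions made for illustration.</p> | |

```python
def data_to_display_matrix(xlim, ylim, box_px):
    """Build the 3x3 scale-and-translate matrix mapping data limits onto a pixel box.

    box_px is (left, bottom, width, height) in pixels.
    """
    (x0, x1), (y0, y1) = xlim, ylim
    left, bottom, width, height = box_px
    sx, sy = width / (x1 - x0), height / (y1 - y0)
    tx, ty = left - sx * x0, bottom - sy * y0
    return [[sx, 0.0, tx],
            [0.0, sy, ty],
            [0.0, 0.0, 1.0]]


def transform_point(matrix, x, y):
    """Multiply the matrix by the homogeneous column vector (x, y, 1)."""
    vec = (x, y, 1.0)
    out = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    return out[0], out[1]


# Hypothetical axes: data x in [0, 1000], y in [0, 50], drawn in an 800x400 px box
# whose lower-left corner sits at pixel (100, 50).
T = data_to_display_matrix((0, 1000), (0, 50), (100, 50, 800, 400))
print(transform_point(T, 0, 0))     # lower-left corner of the axes
print(transform_point(T, 500, 25))  # centre of the axes
```

| <p>In Matplotlib itself the equivalent lookup is <code>ax.transData.transform((x, y))</code>, which returns display (pixel) coordinates for a data point.</p> | |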
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>plot.py - Matplotlib Object-Oriented Setup</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import matplotlib.pyplot as plt | |
| # 1. Create the Figure (the canvas) and the Axes (the plot area) | |
| fig, ax = plt.subplots(figsize=(10, 6), dpi=100) | |
| # 2. Draw on the Axes | |
| ax.plot([1, 2, 3], [4, 5, 2], marker='o', label='Data A') | |
| # 3. Configure the Axes (Anatomy elements) | |
| ax.set_title("My First OOP Plot", fontsize=16, fontweight='bold') | |
| ax.set_xlabel("X-Axis (Units)", fontsize=12) | |
| ax.set_ylabel("Y-Axis (Units)", fontsize=12) | |
| # Set limits and ticks | |
| ax.set_xlim(0, 4) | |
| ax.set_ylim(0, 6) | |
| ax.grid(True, linestyle='--', alpha=0.7) | |
| # 4. Add accessories | |
| ax.legend(loc='upper right') | |
| # 5. Render or Save | |
| plt.tight_layout() # Prevent clipping | |
| plt.show() | |
| # fig.savefig('my_plot.png', dpi=300)</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 6. BASIC PLOTS ================== --> | |
| <section id="basic-plots" class="topic-section"> | |
| <h2>Basic Matplotlib Plots</h2> | |
| <p>Master the fundamental plot types that form the foundation of data visualization.</p> | |
| <div class="form-group"> | |
| <button id="btn-line" class="btn btn--primary">Line Plot</button> | |
| <button id="btn-scatter" class="btn btn--primary">Scatter Plot</button> | |
| <button id="btn-bar" class="btn btn--primary">Bar Chart</button> | |
| <button id="btn-hist" class="btn btn--primary">Histogram</button> | |
| <button id="btn-pie" class="btn btn--primary">Pie Chart</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-basic" width="800" height="400"></canvas> | |
| </div> | |
| <h3>Code Examples</h3> | |
| <div class="info-card"> | |
| <strong>Line Plot:</strong><br> | |
| <code>ax.plot(x, y, color='blue', linestyle='--', marker='o', label='Series A')</code><br><br> | |
| <strong>Scatter Plot:</strong><br> | |
| <code>ax.scatter(x, y, c=colors, s=sizes, alpha=0.7, cmap='viridis')</code><br><br> | |
| <strong>Bar Chart:</strong><br> | |
| <code>ax.bar(categories, values, color='steelblue', edgecolor='black')</code><br><br> | |
| <strong>Histogram:</strong><br> | |
| <code>ax.hist(data, bins=30, edgecolor='white', density=True)</code> | |
| </div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: The Freedman-Diaconis Rule</h3> | |
| <p>How should a histogram's bin width be chosen? Matplotlib's <code>ax.hist</code> simply defaults to 10 bins, but passing | |
| <code>bins='fd'</code> selects the Freedman-Diaconis rule (<code>bins='auto'</code> combines it with Sturges' rule). The rule approximately | |
| minimizes the integrated squared difference between the histogram and the true underlying probability density:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; font-size: 1.1em; color: #e4e6eb;"> | |
| $$ \text{Bin Width } (h) = 2 \frac{\text{IQR}(x)}{\sqrt[3]{n}} $$ | |
| </div> | |
| <p style="margin-bottom: 0;">Where $\text{IQR}$ is the interquartile range and $n$ is the number of | |
| observations. Because the IQR ignores the tails, the rule is far less sensitive to outliers and heavy-tailed data than rules based on | |
| the standard deviation or on $n$ alone (e.g., Sturges' rule).</p> | |
| </div> | |
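| <p>A minimal sketch of the bin-width formula above, using only the standard library. The function name is illustrative, and <code>statistics.quantiles</code> interpolates quartiles slightly differently from NumPy, so the result may not match a plotting library's bins exactly.</p> | |

```python
import random
from statistics import quantiles


def freedman_diaconis_width(data):
    """Bin width h = 2 * IQR / n^(1/3)."""
    q1, _, q3 = quantiles(data, n=4)
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


random.seed(0)
sample = [random.gauss(0, 1) for _ in range(1000)]
h = freedman_diaconis_width(sample)
n_bins = max(1, round((max(sample) - min(sample)) / h))
print(f"bin width = {h:.3f}, about {n_bins} bins")

# A single extreme outlier barely moves the width, because the IQR ignores the tails.
h_outlier = freedman_diaconis_width(sample + [1000.0])
print(f"with outlier: bin width = {h_outlier:.3f}")
```
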
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>basic_plots.py - Common Matplotlib Patterns</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import matplotlib.pyplot as plt | |
| import numpy as np | |
| fig, axs = plt.subplots(1, 2, figsize=(15, 5)) | |
| # 1. Scatter Plot (Color & Size mapping) | |
| x = np.random.randn(100) | |
| y = x + np.random.randn(100)*0.5 | |
| sizes = np.random.uniform(10, 200, 100) | |
| colors = x | |
| sc = axs[0].scatter(x, y, s=sizes, c=colors, cmap='viridis', alpha=0.7) | |
| axs[0].set_title('Scatter Plot') | |
| fig.colorbar(sc, ax=axs[0], label='Color Value') | |
| # 2. Bar Chart (with Error Bars) | |
| categories = ['Group A', 'Group B', 'Group C'] | |
| values = [10, 22, 15] | |
| errors = [1.5, 3.0, 2.0] | |
| axs[1].bar(categories, values, yerr=errors, capsize=5, color='coral', alpha=0.8) | |
| axs[1].set_title('Bar Chart with Error Bars') | |
| for i, v in enumerate(values): | |
|     axs[1].text(i, v + 0.5, str(v), ha='center') | |
| plt.tight_layout() | |
| plt.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 7. SUBPLOTS ================== --> | |
| <section id="subplots" class="topic-section"> | |
| <h2>Subplots & Multi-panel Layouts</h2> | |
| <p>Combine multiple visualizations into a single figure for comprehensive analysis.</p> | |
| <div class="form-group"> | |
| <button id="btn-grid-2x2" class="btn btn--primary">2×2 Grid</button> | |
| <button id="btn-grid-uneven" class="btn btn--primary">Uneven Grid</button> | |
| <button id="btn-gridspec" class="btn btn--primary">GridSpec</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-subplots" width="800" height="500"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Methods:</strong><br> | |
| <code>fig, axes = plt.subplots(2, 2, figsize=(12, 10))</code><br> | |
| <code>fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)</code><br> | |
| <code>gs = fig.add_gridspec(3, 3); ax = fig.add_subplot(gs[0, :])</code> | |
| </div> | |
| <div class="callout callout--tip">✅ Use <code>plt.tight_layout()</code> or <code>plt.subplots(layout='constrained')</code> to prevent | |
| overlaps.</div> | |
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>subplots.py - Complex Layouts with GridSpec</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import matplotlib.pyplot as plt | |
| import matplotlib.gridspec as gridspec | |
| fig = plt.figure(figsize=(10, 8)) | |
| gs = gridspec.GridSpec(3, 3, figure=fig) | |
| # 1. Main large plot (spans 2x2 grid) | |
| ax_main = fig.add_subplot(gs[0:2, 0:2]) | |
| ax_main.set_title('Main View') | |
| # 2. Side plots (Top right, Bottom right) | |
| ax_side1 = fig.add_subplot(gs[0, 2]) | |
| ax_side2 = fig.add_subplot(gs[1, 2]) | |
| # 3. Bottom wide plot (spans 1x3 grid) | |
| ax_bottom = fig.add_subplot(gs[2, :]) | |
| ax_bottom.set_title('Timeline View') | |
| plt.tight_layout() | |
| plt.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 8. STYLING ================== --> | |
| <section id="styling" class="topic-section"> | |
| <h2>Styling & Professional Themes</h2> | |
| <p>Transform basic plots into publication-quality visualizations.</p> | |
| <div class="form-group"> | |
| <button id="btn-style-default" class="btn btn--primary">Default</button> | |
| <button id="btn-style-seaborn" class="btn btn--primary">Seaborn</button> | |
| <button id="btn-style-ggplot" class="btn btn--primary">ggplot</button> | |
| <button id="btn-style-dark" class="btn btn--primary">Dark Background</button> | |
| <button id="btn-style-538" class="btn btn--primary">FiveThirtyEight</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-styling" width="800" height="400"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Available Styles:</strong><br> | |
| <code>plt.style.available</code> → Lists all built-in styles<br> | |
| <code>plt.style.use('seaborn-v0_8-whitegrid')</code><br> | |
| <code>with plt.style.context('dark_background'):</code> | |
| </div> | |
| <h3>Color Palettes</h3> | |
| <div class="info-card"> | |
| <strong>Perceptually Uniform:</strong> viridis, plasma, inferno, magma, cividis<br> | |
| <strong>Sequential:</strong> Blues, Greens, Oranges (for magnitude)<br> | |
| <strong>Diverging:</strong> coolwarm, RdBu (for +/- deviations)<br> | |
| <strong>Categorical:</strong> tab10, Set2, Paired (discrete groups) | |
| </div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Perceptually Uniform Colors (CIELAB)</h3> | |
| <p>Why do we use "viridis" instead of "rainbow" colormaps? A color map is a mathematical function mapping data | |
| $f(x) \rightarrow (R, G, B)$. However, standard RGB math doesn't match human perception (Euclidean distance | |
| in RGB $\neq$ perceived color distance).</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; font-size: 1.1em; color: #e4e6eb;"> | |
| $$ \Delta E^* = \sqrt{(\Delta L^*)^2 + (\Delta a^*)^2 + (\Delta b^*)^2} $$ | |
| </div> | |
| <p style="margin-bottom: 0;">Perceptually uniform colormaps like <em>viridis</em> are designed in a color space built to be | |
| approximately perceptually uniform. The <strong>CIELAB ($L^*a^*b^*$) color space</strong> is the classic example (viridis itself was designed in the related CAM02-UCS space). | |
| In such a space, a Euclidean distance such as $\Delta E^*$ approximates perceived color difference, so equal steps in the data | |
| look like roughly equal steps in color.</p> | |
| </div> | |
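| <p>The $\Delta E^*$ formula above is a plain Euclidean distance, sketched here with the standard library. The $L^*a^*b^*$ triples are hypothetical values chosen to show the arithmetic; converting real RGB colors to CIELAB needs a color library and is not shown.</p> | |

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two (L*, a*, b*) triples."""
    return math.dist(lab1, lab2)


# Hypothetical L*a*b* triples that differ only in lightness.
dark = (30.0, 10.0, -20.0)
mid = (50.0, 10.0, -20.0)
light = (70.0, 10.0, -20.0)

# Equal lightness steps give equal Delta E, which is the property a
# perceptually uniform colormap aims for along its whole length.
print(delta_e_cie76(dark, mid), delta_e_cie76(mid, light))
```
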
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>styling.py - Applying Professional Aesthetics</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import matplotlib.pyplot as plt | |
| import seaborn as sns | |
| # 1. Apply a global Seaborn theme | |
| sns.set_theme(style="whitegrid", palette="muted") | |
| # 2. Customize fonts globally | |
| plt.rcParams.update({ | |
| 'font.family': 'sans-serif', | |
| 'font.sans-serif': ['Helvetica', 'Arial'], | |
| 'axes.titleweight': 'bold', | |
| 'axes.titlesize': 16, | |
| 'axes.labelsize': 12, | |
| 'lines.linewidth': 2 | |
| }) | |
| # 3. Plotting with the new theme | |
| fig, ax = plt.subplots(figsize=(8, 5)) | |
| ax.plot([1, 2, 3], [4, 5, 2], label='Data') | |
| ax.legend() | |
| # 4. Remove top and right spines (cleaner look) | |
| sns.despine(ax=ax) | |
| plt.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 9. SEABORN INTRO ================== --> | |
| <section id="seaborn-intro" class="topic-section"> | |
| <h2>Seaborn: Statistical Visualization</h2> | |
| <p>Seaborn is a high-level library built on Matplotlib that makes statistical graphics beautiful and easy.</p> | |
| <div class="info-card"> | |
| <strong>Why Seaborn?</strong> | |
| <ul> | |
| <li>Beautiful default styles and color palettes</li> | |
| <li>Works seamlessly with Pandas DataFrames</li> | |
| <li>Statistical estimation built-in (confidence intervals, regression)</li> | |
| <li>Faceting for multi-panel figures</li> | |
| <li>Functions organized by plot purpose</li> | |
| </ul> | |
| </div> | |
| <h3>Seaborn Function Categories</h3> | |
| <div class="info-card"> | |
| <strong>Figure-level:</strong> Create entire figures (displot, relplot, catplot)<br> | |
| <strong>Axes-level:</strong> Draw on specific axes (histplot, scatterplot, boxplot)<br><br> | |
| <strong>By Purpose:</strong><br> | |
| • <strong>Distribution:</strong> histplot, kdeplot, ecdfplot, rugplot<br> | |
| • <strong>Relationship:</strong> scatterplot, lineplot, regplot<br> | |
| • <strong>Categorical:</strong> stripplot, swarmplot, boxplot, violinplot, barplot<br> | |
| • <strong>Matrix:</strong> heatmap, clustermap | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-seaborn-intro" width="800" height="400"></canvas> | |
| </div> | |
| </section> | |
| <!-- ====================== 10. DISTRIBUTIONS ================== --> | |
| <section id="distributions" class="topic-section"> | |
| <h2>Distribution Plots</h2> | |
| <p>Visualize the distribution of a single variable or compare distributions across groups.</p> | |
| <div class="form-group"> | |
| <button id="btn-histplot" class="btn btn--primary">Histogram</button> | |
| <button id="btn-kdeplot" class="btn btn--primary">KDE Plot</button> | |
| <button id="btn-ecdfplot" class="btn btn--primary">ECDF</button> | |
| <button id="btn-rugplot" class="btn btn--primary">Rug Plot</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-distributions" width="800" height="400"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Histogram vs KDE:</strong><br> | |
| • Histogram: Discrete bins, shows raw counts<br> | |
| • KDE: Smooth curve, estimates probability density<br> | |
| • Use both together: <code>sns.histplot(data, kde=True)</code> | |
| </div> | |
| <div class="callout callout--insight">💡 ECDF (Empirical Cumulative Distribution Function) avoids binning issues | |
| entirely.</div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Kernel Density Estimation (KDE)</h3> | |
| <p>A KDE plot is not just a smoothed line; it's a mathematical sum of continuous probability distributions | |
| (kernels) placed at every single data point $x_i$:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; font-size: 1.1em; color: #e4e6eb;"> | |
| $$ \hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) $$ | |
| </div> | |
| <p style="margin-bottom: 0;">Here, $K$ is the kernel, typically the standard normal (Gaussian) density, and $h$ is | |
| the bandwidth parameter. If $h$ is too small, the curve is jagged and chases noise (undersmoothed); if $h$ is too large, it smooths away | |
| real features such as multiple modes (oversmoothed).</p> | |
| </div> | |
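| <p>The estimator above can be written directly from the formula. This is a from-scratch sketch with a Gaussian kernel, not Seaborn's implementation; the sample values, grid, and bandwidths are assumptions chosen to show the effect of $h$.</p> | |

```python
import math


def gaussian_kernel(u):
    """Standard normal density K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(x, data, h):
    """Evaluate f_hat_h(x) = 1 / (n * h) * sum of K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


data = [1.0, 1.2, 1.9, 4.8, 5.0, 5.3]    # hypothetical sample with two clusters
grid = [i * 0.5 for i in range(-4, 21)]  # evaluation points from -2.0 to 10.0

for h in (0.2, 0.8, 3.0):
    density = [kde(x, data, h) for x in grid]
    # Count interior local maxima as a crude measure of how bumpy the curve is.
    peaks = sum(density[i - 1] < density[i] > density[i + 1] for i in range(1, len(grid) - 1))
    print(f"h={h}: {peaks} peak(s) on the grid")
```

| <p>Smaller bandwidths produce bumpier curves and larger ones merge neighboring clusters, which is the trade-off the bandwidth parameter controls.</p> | |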
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>distributions.py - Visualizing Distributions</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import seaborn as sns | |
| import matplotlib.pyplot as plt | |
| penguins = sns.load_dataset("penguins") | |
| fig, axes = plt.subplots(1, 2, figsize=(12, 5)) | |
| # 1. Step histogram, normalized separately per species | |
| sns.histplot( | |
| data=penguins, x="flipper_length_mm", hue="species", | |
| element="step", stat="density", common_norm=False, | |
| ax=axes[0] | |
| ) | |
| axes[0].set_title("Histogram with Step Fill") | |
| # 2. KDE Plot with Rug Plot | |
| sns.kdeplot( | |
| data=penguins, x="body_mass_g", hue="species", | |
| fill=True, common_norm=False, palette="crest", | |
| alpha=0.5, linewidth=1.5, ax=axes[1] | |
| ) | |
| sns.rugplot( | |
| data=penguins, x="body_mass_g", hue="species", | |
| height=0.05, ax=axes[1] | |
| ) | |
| axes[1].set_title("KDE Density + Rug Plot") | |
| sns.despine() | |
| plt.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 11. RELATIONSHIPS ================== --> | |
| <section id="relationships" class="topic-section"> | |
| <h2>Relationship Plots</h2> | |
| <p>Explore relationships between two or more continuous variables.</p> | |
| <div class="form-group"> | |
| <button id="btn-scatter-hue" class="btn btn--primary">Scatter + Hue</button> | |
| <button id="btn-regplot" class="btn btn--primary">Regression Plot</button> | |
| <button id="btn-residplot" class="btn btn--primary">Residual Plot</button> | |
| <button id="btn-pairplot" class="btn btn--primary">Pair Plot</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-relationships" width="800" height="500"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Key Functions:</strong><br> | |
| <code>sns.scatterplot(data=df, x='x', y='y', hue='category', size='magnitude')</code><br> | |
| <code>sns.regplot(data=df, x='x', y='y', scatter_kws={'alpha':0.5})</code><br> | |
| <code>sns.pairplot(df, hue='species', diag_kind='kde')</code> | |
| </div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Ordinary Least Squares (OLS)</h3> | |
| <p>When you use <code>sns.regplot</code>, Seaborn calculates the line of best fit by minimizing the sum of the | |
| squared residuals ($\sum_i e_i^2$). With $\mathbf{X}$ as the design matrix (a column of ones plus the predictor), the closed-form solution for the coefficients | |
| $\hat{\boldsymbol{\beta}}$ is:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; font-size: 1.1em; color: #e4e6eb;"> | |
| $$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} $$ | |
| </div> | |
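| <p style="margin-bottom: 0;">The shaded region around the line is a 95% confidence interval for the fitted regression. | |
| Seaborn computes it by bootstrapping: it refits the line on many resamples of the data and shades the central 95% of the fitted values at each $x$. | |
| The interpretation is that if the whole study were repeated many times, about 95% of bands built this way would cover the true regression mean at a given $x$.</p> | |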
| </div> | |
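<p>As a rough illustration of both ideas, the sketch below solves $(\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$ for a single predictor using the explicit 2x2 inverse, then builds a percentile bootstrap interval for the fitted value at one point. The function names, the synthetic data, and the choice of 1000 resamples are assumptions for this example. This is not Seaborn's internal code.</p>

```python
import random

def ols_fit(xs, ys):
    """Solve beta = (X^T X)^{-1} X^T y for X = [1, x] via the explicit 2x2 inverse."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular (all x values are identical)")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

def bootstrap_interval(xs, ys, x0, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the fitted value at x0."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) != n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        try:
            b0, b1 = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])
        except ValueError:
            continue  # degenerate resample, draw again
        preds.append(b0 + b1 * x0)
    preds.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return preds[lo_idx], preds[hi_idx]

# Synthetic data: y = 1 + 2x plus Gaussian noise (illustrative only).
data_rng = random.Random(42)
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

b0, b1 = ols_fit(xs, ys)
lo, hi = bootstrap_interval(xs, ys, x0=10.0)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"95% bootstrap interval for the fit at x=10: [{lo:.3f}, {hi:.3f}]")
```

<p>Repeating <code>bootstrap_interval</code> over a grid of <code>x0</code> values would trace out a band like the shaded region in a regression plot.</p>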
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>relationships.py - Scatter and Regression</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import seaborn as sns | |
| import matplotlib.pyplot as plt | |
| tips = sns.load_dataset("tips") | |
| # 1. Advanced Scatter (4 dimensions: x, y, color, size) | |
| plt.figure(figsize=(8, 6)) | |
| sns.scatterplot( | |
| data=tips, x="total_bill", y="tip", | |
| hue="time", size="size", sizes=(20, 200), | |
| palette="deep", alpha=0.8 | |
| ) | |
| plt.title("4D Scatter Plot (Total Bill vs Tip)") | |
| plt.show() | |
| # 2. Regression Plot with Subplots (Using lmplot) | |
| # lmplot is a figure-level function that creates multiple subplots automatically | |
| sns.lmplot( | |
| data=tips, x="total_bill", y="tip", col="time", hue="smoker", | |
| height=5, aspect=1.2, scatter_kws={'alpha':0.5} | |
| ) | |
| plt.show() | |
| # 3. Pairplot (Explore all pairwise relationships) | |
| sns.pairplot( | |
| data=tips, hue="smoker", | |
| diag_kind="kde", markers=["o", "s"] | |
| ) | |
| plt.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 12. CATEGORICAL ================== --> | |
| <section id="categorical" class="topic-section"> | |
| <h2>📦 Categorical Plots</h2> | |
| <p>Visualize distributions and comparisons across categorical groups.</p> | |
| <div class="form-group"> | |
| <button id="btn-stripplot" class="btn btn--primary">Strip Plot</button> | |
| <button id="btn-swarmplot" class="btn btn--primary">Swarm Plot</button> | |
| <button id="btn-boxplot" class="btn btn--primary">Box Plot</button> | |
| <button id="btn-violinplot" class="btn btn--primary">Violin Plot</button> | |
| <button id="btn-barplot" class="btn btn--primary">Bar Plot</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-categorical" width="800" height="400"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>When to Use:</strong><br> | |
| • <strong>Strip/Swarm:</strong> Show all data points (small datasets)<br> | |
| • <strong>Box:</strong> Summary statistics (median, quartiles, outliers)<br> | |
| • <strong>Violin:</strong> Full distribution shape + summary<br> | |
| • <strong>Bar:</strong> Mean (or another estimate) with error bars; use <code>sns.countplot</code> for counts | |
| </div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: The IQR Outlier Rule</h3> | |
| <p>Box plots identify "outliers" (the individual dots beyond the whiskers) purely mathematically, not | |
| visually. They use John Tukey's Interquartile Range (IQR) method:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; color: #e4e6eb;"> | |
| $$ \text{IQR} = Q_3 - Q_1 $$ | |
| $$ \text{Lower Fence} = Q_1 - 1.5 \times \text{IQR} $$ | |
| $$ \text{Upper Fence} = Q_3 + 1.5 \times \text{IQR} $$ | |
| </div> | |
| <p style="margin-bottom: 0;">Any point outside $[\text{Lower Fence}, \text{Upper Fence}]$ is plotted as an outlier, and each whisker stops at the most extreme point inside its fence. <strong>Fun | |
| Fact:</strong> For a perfectly normal distribution $\mathcal{N}(\mu, \sigma^2)$, the fences sit at about $\mu \pm 2.7\sigma$, so roughly 0.70% of | |
| the data is flagged as outliers by this fixed rule even though nothing is wrong with it!</p> | |
| </div> | |
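<p>The fences and the 0.70% figure can be reproduced with the standard library. In this sketch the helper names and the toy data are hypothetical. The <code>method="inclusive"</code> quantile rule is chosen on the assumption that it matches the linear interpolation used by NumPy's default percentile.</p>

```python
from statistics import NormalDist, quantiles

def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) computed from Q1, Q3 and k * IQR."""
    # "inclusive" interpolates linearly between order statistics.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Values lying outside the closed interval [lower fence, upper fence]."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if not lower <= v <= upper]

# Toy data with one obvious extreme value (illustrative only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print("fences:", tukey_fences(data))
print("outliers:", find_outliers(data))

# Share of a standard normal distribution that falls outside the fences.
z = NormalDist()
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
upper = q3 + 1.5 * (q3 - q1)
flag_rate = 2 * (1 - z.cdf(upper))
print(f"upper fence = {upper:.3f} sigma, flagged share = {flag_rate:.4%}")
```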
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>categorical.py - Categories and Factor Variables</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import seaborn as sns | |
| import matplotlib.pyplot as plt | |
| tips = sns.load_dataset("tips") | |
| fig, axes = plt.subplots(1, 2, figsize=(14, 6)) | |
| # 1. Violin Plot (Distribution density across categories) | |
| sns.violinplot( | |
| data=tips, x="day", y="total_bill", hue="sex", | |
| split=True, inner="quart", palette="muted", | |
| ax=axes[0] | |
| ) | |
| axes[0].set_title("Violin Plot (Split by Sex)") | |
| # 2. Boxplot + Swarmplot Overlay | |
| # Good for showing summary stats PLUS underlying data points | |
| sns.boxplot( | |
| data=tips, x="day", y="total_bill", color="white", | |
| width=.5, showfliers=False, ax=axes[1] # hide boxplot outliers to avoid overlap | |
| ) | |
| sns.swarmplot( | |
| data=tips, x="day", y="total_bill", hue="time", | |
| size=6, alpha=0.7, ax=axes[1] | |
| ) | |
| axes[1].set_title("Boxplot + Swarmplot Overlay") | |
| plt.tight_layout() | |
| plt.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 13. HEATMAPS ================== --> | |
| <section id="heatmaps" class="topic-section"> | |
| <h2>🔥 Heatmaps &amp; Correlation Matrices</h2> | |
| <p>Visualize matrices of values using color intensity. Essential for EDA correlation analysis.</p> | |
| <div class="form-group"> | |
| <button id="btn-heatmap-basic" class="btn btn--primary">Basic Heatmap</button> | |
| <button id="btn-corr-matrix" class="btn btn--primary">Correlation Matrix</button> | |
| <button id="btn-clustermap" class="btn btn--primary">Clustermap</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-heatmaps" width="800" height="500"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Best Practices:</strong><br> | |
| • Annotate cells with values when the matrix is small enough to read: <code>annot=True</code><br> | |
| • Use a diverging colormap for correlation: <code>cmap='coolwarm', center=0</code><br> | |
| • Mask the redundant upper triangle: <code>mask=np.triu(np.ones_like(corr, dtype=bool))</code><br> | |
| • Square cells: <code>square=True</code> | |
| </div> | |
| <div class="callout callout--insight">💡 Clustermap reorders rows and columns with hierarchical clustering so that similar ones sit next to each other.</div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Correlation Coefficients</h3> | |
| <p>Correlation heatmaps display the strength of linear relationships between variables, typically mapping the | |
| <strong>Pearson Correlation Coefficient ($r$)</strong> of each pair of variables to a continuous, diverging color scale: | |
| </p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; font-size: 1.1em; color: #e4e6eb;"> | |
| $$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} $$ | |
| </div> | |
| <p style="margin-bottom: 0;">For non-linear but monotonic relationships, switch pandas to | |
| <strong>Spearman's Rank Correlation ($\rho$)</strong> with <code>df.corr(method='spearman')</code>. It converts raw values to ranks | |
| $R(x_i)$ before applying the same formula. Both coefficients are bounded to $[-1, 1]$. | |
| </p> | |
| </div> | |
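<p>The difference between the two coefficients shows up on a monotonic but curved relationship. The sketch below implements both from the formula above in plain Python. The function names and the cubic toy data are assumptions for illustration.</p>

```python
import math

def pearson(xs, ys):
    """Pearson r: covariance divided by the product of the spreads."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman rho: Pearson r applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Monotonic but strongly curved toy relationship: y = x^3.
xs = [float(i) for i in range(1, 11)]
ys = [x ** 3 for x in xs]
print(f"Pearson r    = {pearson(xs, ys):.4f}")
print(f"Spearman rho = {spearman(xs, ys):.4f}")
```

<p>Pearson stays below 1 because the points do not lie on a straight line, while Spearman reaches 1 because the ordering of $x$ and $y$ agrees perfectly.</p>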
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>heatmaps.py - Correlation Matrix</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import seaborn as sns | |
| import matplotlib.pyplot as plt | |
| import numpy as np | |
| # Load data and calculate correlation matrix | |
| penguins = sns.load_dataset("penguins") | |
| # Select only numerical columns for correlation | |
| numerical_df = penguins.select_dtypes(include=[np.number]) | |
| corr = numerical_df.corr() | |
| # Create a mask for the upper triangle | |
| mask = np.triu(np.ones_like(corr, dtype=bool)) | |
| plt.figure(figsize=(8, 6)) | |
| # Draw the heatmap with the mask and correct aspect ratio | |
| sns.heatmap( | |
| corr, | |
| mask=mask, | |
| cmap='coolwarm', | |
| vmax=1, vmin=-1, | |
| center=0, | |
| square=True, | |
| linewidths=.5, | |
| annot=True, | |
| fmt=".2f", | |
| cbar_kws={"shrink": .8} | |
| ) | |
| plt.title("Penguin Feature Correlation") | |
| plt.tight_layout() | |
| plt.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 14. PLOTLY ================== --> | |
| <section id="plotly" class="topic-section"> | |
| <h2>Plotly Express: Interactive Visualization</h2> | |
| <p>Plotly creates interactive, web-based visualizations with zoom, pan, hover tooltips, and more.</p> | |
| <div class="info-card"> | |
| <strong>Why Plotly?</strong> | |
| <ul> | |
| <li>Interactive out of the box (zoom, pan, select)</li> | |
| <li>Hover tooltips with data details</li> | |
| <li>Export as HTML, PNG, or embed in dashboards</li> | |
| <li>Works in Jupyter, Streamlit, Dash</li> | |
| <li>plotly.express is the high-level API (like Seaborn for Matplotlib)</li> | |
| </ul> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-plotly" width="800" height="400"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Common Functions:</strong><br> | |
| <code>px.scatter(df, x='x', y='y', color='category', size='value', hover_data=['name'])</code><br> | |
| <code>px.line(df, x='date', y='price', color='stock')</code><br> | |
| <code>px.bar(df, x='category', y='count', color='group', barmode='group')</code><br> | |
| <code>px.histogram(df, x='value', nbins=50, marginal='box')</code> | |
| </div> | |
| </section> | |
| <!-- ====================== 15. ANIMATIONS ================== --> | |
| <section id="animations" class="topic-section"> | |
| <h2>🎬 Animated Visualizations</h2> | |
| <p>Add a time dimension to your visualizations with animations.</p> | |
| <div class="form-group"> | |
| <button id="btn-animate" class="btn btn--primary">▶ Play Animation</button> | |
| <button id="btn-stop" class="btn btn--primary">⏹ Stop</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-animation" width="800" height="400"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Plotly Animation:</strong><br> | |
| <code>px.scatter(df, x='gdp', y='life_exp', animation_frame='year', animation_group='country', size='pop', color='continent')</code><br><br> | |
| <strong>Matplotlib Animation:</strong><br> | |
| <code>from matplotlib.animation import FuncAnimation</code><br> | |
| <code>ani = FuncAnimation(fig, update_func, frames=100, interval=50)</code> | |
| </div> | |
| <div class="callout callout--tip">Hans Rosling's Gapminder is the classic example of animated scatter plots! | |
| </div> | |
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>animation_example.py - Gapminder Scatter</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import plotly.express as px | |
| df = px.data.gapminder() | |
| # Plotly makes animations incredibly easy with two arguments: | |
| # 'animation_frame' (the time dimension) and 'animation_group' (the entity) | |
| fig = px.scatter( | |
| df, | |
| x="gdpPercap", y="lifeExp", | |
| animation_frame="year", animation_group="country", | |
| size="pop", color="continent", | |
| hover_name="country", | |
| log_x=True, size_max=55, | |
| range_x=[100,100000], range_y=[25,90], | |
| title="Global Development 1952 - 2007" | |
| ) | |
| fig.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 16. DASHBOARDS ================== --> | |
| <section id="dashboards" class="topic-section"> | |
| <h2>📱 Interactive Dashboards with Streamlit</h2> | |
| <p>Build interactive web apps for data exploration without web development experience.</p> | |
| <div class="info-card"> | |
| <strong>Streamlit Basics:</strong><br> | |
| <code>streamlit run app.py</code><br><br> | |
| <code>import streamlit as st</code><br> | |
| <code>st.title("My Dashboard")</code><br> | |
| <code>st.slider("Select value", 0, 100, 50)</code><br> | |
| <code>st.selectbox("Choose", ["A", "B", "C"])</code><br> | |
| <code>st.plotly_chart(fig)</code> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-dashboard" width="800" height="400"></canvas> | |
| </div> | |
| <div class="callout callout--insight">💡 Streamlit reruns the whole script from top to bottom whenever a widget value changes, so no callbacks are needed!</div> | |
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>app.py - Minimal Streamlit Dashboard</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import streamlit as st | |
| import pandas as pd | |
| import plotly.express as px | |
| # 1. Page Configuration | |
| st.set_page_config(page_title="Sales Dashboard", layout="wide") | |
| st.title("Interactive Sales Dashboard π") | |
| # 2. Sidebar Filters | |
| st.sidebar.header("Filters") | |
| category = st.sidebar.selectbox("Select Category", ["Electronics", "Clothing", "Home"]) | |
| min_sales = st.sidebar.slider("Minimum Sales", 0, 1000, 200) | |
| # Mock Data Generation | |
| df = pd.DataFrame({ | |
| 'Date': pd.date_range(start='2023-01-01', periods=30), | |
| 'Sales': [x * 10 for x in range(30)], | |
| 'Category': [category] * 30 | |
| }) | |
| filtered_df = df[df['Sales'] >= min_sales] | |
| # 3. Layout with Columns | |
| col1, col2 = st.columns(2) | |
| # KPI Metric | |
| col1.metric("Total Filtered Sales", f"${filtered_df['Sales'].sum()}") | |
| # 4. Insert Plotly Chart | |
| fig = px.line(filtered_df, x='Date', y='Sales', title=f"{category} Sales Trend") | |
| col2.plotly_chart(fig, use_container_width=True)</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 17. GEOSPATIAL ================== --> | |
| <section id="geospatial" class="topic-section"> | |
| <h2>🗺️ Geospatial Visualization</h2> | |
| <p>Visualize geographic data with choropleth maps, point maps, and density maps.</p> | |
| <div class="form-group"> | |
| <button id="btn-choropleth" class="btn btn--primary">Choropleth Map</button> | |
| <button id="btn-scatter-geo" class="btn btn--primary">Scatter on Map</button> | |
| <button id="btn-heatmap-geo" class="btn btn--primary">Density Map</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-geo" width="800" height="500"></canvas> | |
| </div> | |
| <div class="info-card"> | |
| <strong>Libraries:</strong><br> | |
| • <strong>Plotly:</strong> <code>px.choropleth(df, locations='iso_alpha', color='value')</code> (ISO-3 country codes by default)<br> | |
| • <strong>Folium:</strong> Interactive Leaflet maps<br> | |
| • <strong>Geopandas + Matplotlib:</strong> Static maps with shapefiles<br> | |
| • <strong>Kepler.gl:</strong> Large-scale geospatial visualization | |
| </div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: Geospatial Math</h3> | |
| <p>Visualizing data on a map requires mathematically converting the curved surface of the Earth into 2D screen pixels. | |
| The <strong>Web Mercator Projection</strong> (used by Google Maps and by Plotly's tile-based maps) does this with the spherical Mercator formulas, which preserve | |
| angles (conformal) but heavily distort sizes near the poles. With longitude $\lambda$ and latitude $\varphi$ in radians and Earth radius $R$:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; font-size: 1.1em; color: #e4e6eb;"> | |
| $$ x = R \cdot \lambda \qquad y = R \ln\left[\tan\left(\frac{\pi}{4} + \frac{\varphi}{2}\right)\right] $$ | |
| </div> | |
| <p style="margin-bottom: 0;">Furthermore, when calculating distances between two GPS coordinates (e.g., to | |
| weight a density heatmap), you cannot apply the straight-line Euclidean distance $d = \sqrt{(\Delta x)^2+(\Delta y)^2}$ to raw longitude and latitude. Geospatial | |
| libraries typically use the <strong>Haversine formula</strong> to find the great-circle distance over the | |
| sphere.</p> | |
| </div> | |
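<p>Both formulas fit in a few lines. This is a minimal sketch that assumes a perfectly spherical Earth with a mean radius of 6371 km. The function names are hypothetical, and real mapping libraries add details such as tile scaling and latitude clipping.</p>

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def mercator(lon_deg, lat_deg, radius=EARTH_RADIUS_KM):
    """Spherical Mercator: x = R * lambda, y = R * ln(tan(pi/4 + phi/2))."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def haversine_km(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

# Equal 30-degree steps in latitude are stretched more and more toward the poles.
for lat in (0, 30, 60, 80):
    print(f"lat {lat:>2} deg: mercator y = {mercator(0, lat)[1]:9.1f} km")

# A quarter of the way around the equator.
print(f"(0E, 0N) to (90E, 0N): {haversine_km(0, 0, 90, 0):.1f} km")
```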
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>geospatial.py - Plotly Choropleth</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import plotly.express as px | |
| # Plotly includes built-in geospatial data | |
| df = px.data.gapminder().query("year==2007") | |
| # Create a choropleth map | |
| # 'locations' takes ISO-3 country codes by default | |
| fig = px.choropleth( | |
| df, | |
| locations="iso_alpha", # Column holding ISO-3 country codes | |
| color="lifeExp", # Data to map to color | |
| hover_name="country", # Tooltip label | |
| color_continuous_scale=px.colors.sequential.Plasma, | |
| title="Global Life Expectancy (2007)" | |
| ) | |
| # Customize the map projection type | |
| fig.update_geos( | |
| projection_type="orthographic", # "natural earth", "mercator", etc. | |
| showcoastlines=True, | |
| coastlinecolor="DarkBlue" | |
| ) | |
| fig.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 18. 3D PLOTS ================== --> | |
| <section id="3d-plots" class="topic-section"> | |
| <h2>🎲 3D Visualization</h2> | |
| <p>Visualize three-dimensional relationships with surface plots, scatter plots, and more.</p> | |
| <div class="form-group"> | |
| <button id="btn-3d-scatter" class="btn btn--primary">3D Scatter</button> | |
| <button id="btn-3d-surface" class="btn btn--primary">Surface Plot</button> | |
| <button id="btn-3d-wireframe" class="btn btn--primary">Wireframe</button> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-3d" width="800" height="500"></canvas> | |
| </div> | |
| <div class="callout callout--mistake">⚠️ 3D plots can obscure data. Often, multiple 2D views are more effective. | |
| </div> | |
| <div class="callout callout--tip">Use Plotly for interactive 3D (rotate, zoom) instead of static Matplotlib | |
| 3D.</div> | |
| <div class="info-card" style="margin-top: 20px; border-left-color: #9900ff;"> | |
| <h3 style="margin-top: 0; color: #9900ff;">🧠 Under the Hood: 3D Perspective Projection Matrix</h3> | |
| <p>To render 3D data $(x, y, z)$ on a 2D browser screen, libraries like Plotly.js apply a <strong>Perspective | |
| Projection Matrix</strong>. This creates the illusion of depth by scaling $x$ and $y$ inversely | |
| with distance $z$. With field of view $fov$, aspect ratio, and near and far clipping planes $n$ and $f$:</p> | |
| <div | |
| style="background: rgba(0,0,0,0.2); padding: 15px; border-radius: 8px; text-align: center; margin: 15px 0; overflow-x: auto; color: #e4e6eb;"> | |
| $$ \begin{bmatrix} x' \\ y' \\ z' \\ w \end{bmatrix} = \begin{bmatrix} \frac{1}{\text{aspect} \cdot | |
| \tan(\frac{fov}{2})} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\frac{fov}{2})} & 0 & 0 \\ 0 & 0 & \frac{f+n}{f-n} & | |
| \frac{-2fn}{f-n} \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} $$ | |
| </div> | |
| <p style="margin-bottom: 0;">Because the last row copies $z$ into $w$, the final screen coordinates after the perspective divide are $(x'/w, y'/w)$. When you | |
| drag to rotate a 3D Plotly graph, the camera transform is updated once per frame and your browser's WebGL pipeline applies it to every | |
| vertex on the GPU, which is how the view stays responsive in real time.</p> | |
| </div> | |
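<p>To see the perspective divide at work, the sketch below builds the matrix exactly as written above and projects two points that differ only in depth. The function names and the camera parameters (90 degree field of view, clipping planes at 0.1 and 100) are assumptions for illustration. This is not Plotly.js code.</p>

```python
import math

def perspective_matrix(fov_deg, aspect, near, far):
    """4x4 perspective matrix in the same layout as the formula above."""
    t = math.tan(math.radians(fov_deg) / 2)
    return [
        [1 / (aspect * t), 0, 0, 0],
        [0, 1 / t, 0, 0],
        [0, 0, (far + near) / (far - near), -2 * far * near / (far - near)],
        [0, 0, 1, 0],  # copies z into w
    ]

def project(point, matrix):
    """Multiply [x, y, z, 1] by the matrix, then divide x' and y' by w."""
    x, y, z = point
    vec = [x, y, z, 1.0]
    xp, yp, _zp, w = (sum(m * v for m, v in zip(row, vec)) for row in matrix)
    return xp / w, yp / w

M = perspective_matrix(fov_deg=90, aspect=1.0, near=0.1, far=100.0)

# Same x and y, but the second point is twice as far from the camera.
near_point = project((1.0, 1.0, 2.0), M)
far_point = project((1.0, 1.0, 4.0), M)
print("z=2:", near_point)
print("z=4:", far_point)
```

<p>Doubling $z$ halves the projected coordinates, which is the "farther objects look smaller" effect that the matrix encodes.</p>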
| <div class="code-block" style="margin-top: 20px;"> | |
| <div class="code-header"> | |
| <span>3d_plots.py - Interactive 3D Scatter</span> | |
| <button class="copy-btn" onclick="copyCode(this)">Copy</button> | |
| </div> | |
| <pre><code>import plotly.express as px | |
| # Load Iris dataset | |
| df = px.data.iris() | |
| # Create interactive 3D scatter plot | |
| fig = px.scatter_3d( | |
| df, | |
| x='sepal_length', | |
| y='sepal_width', | |
| z='petal_width', | |
| color='species', | |
| size='petal_length', | |
| size_max=18, | |
| symbol='species', | |
| opacity=0.7, | |
| title="Iris 3D Feature Space" | |
| ) | |
| # Trim the margins so the 3D scene fills the figure | |
| fig.update_layout(margin=dict(l=0, r=0, b=0, t=40)) | |
| fig.show()</code></pre> | |
| </div> | |
| </section> | |
| <!-- ====================== 19. STORYTELLING ================== --> | |
| <section id="storytelling" class="topic-section"> | |
| <h2>Data Storytelling</h2> | |
| <p>Transform visualizations into compelling narratives that drive action.</p> | |
| <div class="info-card"> | |
| <strong>The Data Storytelling Framework:</strong> | |
| <ol> | |
| <li><strong>Context:</strong> Why does this matter? Who is the audience?</li> | |
| <li><strong>Data:</strong> What insights did you discover?</li> | |
| <li><strong>Narrative:</strong> What's the storyline (beginning, middle, end)?</li> | |
| <li><strong>Visual:</strong> Which chart best supports the story?</li> | |
| <li><strong>Call to Action:</strong> What should the audience do?</li> | |
| </ol> | |
| </div> | |
| <div class="canvas-wrapper"> | |
| <canvas id="canvas-storytelling" width="800" height="400"></canvas> | |
| </div> | |
| <h3>Design Principles</h3> | |
| <div class="info-card"> | |
| <strong>Remove Clutter:</strong> Eliminate chartjunk, gridlines, borders<br> | |
| <strong>Focus Attention:</strong> Use color strategically (grey + accent)<br> | |
| <strong>Think Like a Designer:</strong> Alignment, white space, hierarchy<br> | |
| <strong>Tell a Story:</strong> Title = conclusion, not description<br> | |
| <strong>Bad:</strong> "Sales by Region"<br> | |
| <strong>Good:</strong> "West Region Sales Dropped 23% in Q4" | |
| </div> | |
| <div class="callout callout--insight">💡 "If you can't explain it simply, you don't understand it well enough." | |
| (often attributed to Einstein)</div> | |
| <div class="callout callout--tip">Read "Storytelling with Data" by Cole Nussbaumer Knaflic</div> | |
| </section> | |
| <!-- Footer --> | |
| <footer | |
| style="text-align: center; padding: 40px 20px; border-top: 1px solid var(--border-color); margin-top: 60px;"> | |
| <p style="color: var(--text-secondary);">Data Visualization Masterclass | Part of the Data Science &amp; AI | |
| Curriculum</p> | |
| <a href="../index.html" class="btn btn--primary" style="margin-top: 16px;">← Back to Home</a> | |
| </footer> | |
| </main> | |
| </div> | |
| <script src="app.js"></script> | |
| </body> | |
| </html> |