{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Practice Series: Module 22 - SQL & Databases for Data Science\n", "\n", "In the real world, data lives in databases, not just CSVs. This module teaches you how to bridge the gap between **SQL (Structured Query Language)** and **Python/Pandas**.\n", "\n", "### Objectives:\n", "1. **Connecting to Databases**: Using `sqlite3` (built into Python).\n", "2. **Basic Queries**: SELECT, WHERE, and JOIN in Python.\n", "3. **SQL to Pandas**: Loading query results directly into a DataFrame.\n", "4. **Database Design**: Understanding primary keys and foreign keys.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setting up a Virtual Database\n", "We will create an in-memory database and populate it with some sample Data Science job data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sqlite3\n", "import pandas as pd\n", "\n", "# Create a connection to an in-memory database\n", "conn = sqlite3.connect(':memory:')\n", "cursor = conn.cursor()\n", "\n", "# Create a sample table\n", "cursor.execute('''\n", " CREATE TABLE jobs (\n", " id INTEGER PRIMARY KEY,\n", " title TEXT,\n", " company TEXT,\n", " salary INTEGER\n", " )\n", "''')\n", "\n", "# Insert sample records\n", "jobs = [\n", " (1, 'Data Scientist', 'Google', 150000),\n", " (2, 'ML Engineer', 'Tesla', 160000),\n", " (3, 'Data Analyst', 'Netflix', 120000),\n", " (4, 'AI Research', 'OpenAI', 200000)\n", "]\n", "cursor.executemany('INSERT INTO jobs VALUES (?,?,?,?)', jobs)\n", "conn.commit()\n", "\n", "print(\"Database created and table populated!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Basic SQL Queries in Python\n", "\n", "### Task 1: Fetching Data\n", "Use standard SQL to fetch all jobs where the salary is greater than 140,000." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "query = \"SELECT * FROM jobs WHERE salary > 140000\"\n", "cursor.execute(query)\n", "results = cursor.fetchall()\n", "for row in results:\n", " print(row)\n", "```\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. SQL to Pandas: The Professional Way\n", "\n", "### Task 2: pd.read_sql_query\n", "Professionals use `pd.read_sql_query()` to pull data directly into a DataFrame. Try it now." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "df_sql = pd.read_sql_query(\"SELECT * FROM jobs\", conn)\n", "print(df_sql.head())\n", "```\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Bridge Completed! \n", "You now know how to pull data from any standard relational database.\n", "Next: **Model Explainability (SHAP)**." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }