{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ML Practice Series: Module 22 - SQL & Databases for Data Science\n",
"\n",
"In the real world, data lives in databases, not just CSVs. This module teaches you how to bridge the gap between **SQL (Structured Query Language)** and **Python/Pandas**.\n",
"\n",
"### Objectives:\n",
"1. **Connecting to Databases**: Using `sqlite3` (built into Python).\n",
"2. **Basic Queries**: SELECT, WHERE, and JOIN in Python.\n",
"3. **SQL to Pandas**: Loading query results directly into a DataFrame.\n",
"4. **Database Design**: Understanding primary keys and foreign keys.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Setting up a Virtual Database\n",
"We will create an in-memory database and populate it with some sample Data Science job data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sqlite3\n",
"import pandas as pd\n",
"\n",
"# Create a connection to an in-memory database\n",
"conn = sqlite3.connect(':memory:')\n",
"cursor = conn.cursor()\n",
"\n",
"# Create a sample table\n",
"cursor.execute('''\n",
" CREATE TABLE jobs (\n",
" id INTEGER PRIMARY KEY,\n",
" title TEXT,\n",
" company TEXT,\n",
" salary INTEGER\n",
" )\n",
"''')\n",
"\n",
"# Insert sample records\n",
"jobs = [\n",
" (1, 'Data Scientist', 'Google', 150000),\n",
" (2, 'ML Engineer', 'Tesla', 160000),\n",
" (3, 'Data Analyst', 'Netflix', 120000),\n",
" (4, 'AI Research', 'OpenAI', 200000)\n",
"]\n",
"cursor.executemany('INSERT INTO jobs VALUES (?,?,?,?)', jobs)\n",
"conn.commit()\n",
"\n",
"print(\"Database created and table populated!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Basic SQL Queries in Python\n",
"\n",
"### Task 1: Fetching Data\n",
"Use standard SQL to fetch all jobs where the salary is greater than 140,000."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"query = \"SELECT * FROM jobs WHERE salary > 140000\"\n",
"cursor.execute(query)\n",
"results = cursor.fetchall()\n",
"for row in results:\n",
" print(row)\n",
"```\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. SQL to Pandas: The Professional Way\n",
"\n",
"### Task 2: pd.read_sql_query\n",
"Professionals use `pd.read_sql_query()` to pull data directly into a DataFrame. Try it now."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"df_sql = pd.read_sql_query(\"SELECT * FROM jobs\", conn)\n",
"print(df_sql.head())\n",
"```\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--- \n",
"### Bridge Completed! \n",
"You now know how to pull data from any standard relational database.\n",
"Next: **Model Explainability (SHAP)**."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}