Here's a Python cheat sheet for data analysis, covering essential libraries, functions, and techniques.
1. General Python Basics
Theory:
Python is an interpreted, high-level, general-purpose programming language that supports object-oriented, procedural, and functional programming paradigms.
Code:
# Data Types
x = 10 # int
y = 10.5 # float
z = "Hello" # str
# Lists
my_list = [1, 2, 3, 4]
my_list.append(5)
# Dictionaries
my_dict = {"name": "Alice", "age": 25}
my_dict["city"] = "New York"
# Loops
for i in range(5):
    print(i)
# Functions
def square(x):
    return x ** 2
# Lambda Functions
add = lambda a, b: a + b
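To make the three paradigms mentioned in the theory concrete, here is a small sketch that solves one task (squaring a list of numbers) procedurally, functionally, and with a class. The list values and the `NumberList` class are hypothetical, chosen only for illustration.

```python
# The same task, squaring a list of numbers, written in three paradigms
numbers = [1, 2, 3, 4]

# Procedural: an explicit loop that appends to an accumulator list
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# Functional: map a lambda over the list (a comprehension is the idiomatic equivalent)
squares_map = list(map(lambda n: n ** 2, numbers))
squares_comp = [n ** 2 for n in numbers]

# Object-oriented: bundle the data with the behavior that operates on it
class NumberList:  # hypothetical helper class
    def __init__(self, values):
        self.values = list(values)

    def squared(self):
        return [v ** 2 for v in self.values]

squares_oop = NumberList(numbers).squared()
```

All four variables end up holding `[1, 4, 9, 16]`; in everyday data-analysis code the comprehension is usually the most readable choice.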
2. NumPy
Theory:
NumPy is a library for numerical computing in Python. It provides an efficient n-dimensional array type for vectors and matrices, along with a large collection of mathematical operations on those arrays.
Code:
import numpy as np
# Arrays
arr = np.array([1, 2, 3, 4])
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))
# Operations
arr_sum = np.sum(arr)
arr_mean = np.mean(arr)
arr_std = np.std(arr)
# Indexing
slice_arr = arr[1:3]
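NumPy's main advantage is that operations apply to whole arrays at once. The sketch below, using small made-up arrays, shows element-wise arithmetic, boolean masking, broadcasting, axis-wise reductions, and reshaping; the commented results follow directly from the inputs.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
# Element-wise arithmetic runs on every element without a Python loop
doubled = arr * 2                      # array([2, 4, 6, 8])
# Boolean masks keep only the elements where the condition is True
evens = arr[arr % 2 == 0]              # array([2, 4])
# Broadcasting: a length-3 row is added to each row of a (2, 3) matrix
matrix = np.ones((2, 3))
shifted = matrix + np.array([10, 20, 30])
# Reductions along an axis: axis=0 collapses the rows, giving one value per column
col_sums = shifted.sum(axis=0)         # array([22., 42., 62.])
# reshape changes the shape of the data (returning a view when possible)
grid = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
```

Note that `arr[1:3]` in the slicing example above selects indices 1 and 2, i.e. `[2, 3]`; the end index is excluded.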
3. Pandas
Theory:
Pandas is a library for data manipulation and analysis. It provides data structures such as Series and DataFrame to handle structured data efficiently.
Code:
import pandas as pd
# Create DataFrame
data = {"Name": ["Alice", "Bob"], "Age": [25, 30]}
df = pd.DataFrame(data)
# Read/Write Data
csv_data = pd.read_csv("data.csv")
df.to_csv("output.csv", index=False)
# Analyze Data
df.info()
df.describe()
# Filter Rows
filtered_df = df[df['Age'] > 25]
# Grouping
grouped = df.groupby("Name").mean(numeric_only=True)  # average only the numeric columns
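Filtering and grouping become more useful once conditions are combined and several statistics are computed per group. The sketch below uses a small hypothetical sales table; the column names `Region` and `Sales` are invented for illustration.

```python
import pandas as pd

# Hypothetical sales data used only for illustration
df = pd.DataFrame({
    "Region": ["East", "West", "East", "West", "East"],
    "Sales": [100, 200, 150, 50, 250],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
big_east = df[(df["Region"] == "East") & (df["Sales"] > 120)]

# Compute several aggregations per group in one call
summary = df.groupby("Region")["Sales"].agg(["count", "sum", "mean"])
```

Here `big_east` keeps the two East rows with sales of 150 and 250, and `summary` has one row per region (East: count 3, sum 500; West: count 2, sum 250, mean 125).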
4. Matplotlib & Seaborn
Theory:
Matplotlib and Seaborn are libraries for data visualization. Matplotlib provides low-level plotting tools, while Seaborn, which is built on top of Matplotlib, offers high-level statistical graphics.
Code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Line Plot
plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Line Plot")
plt.show()
# Seaborn Heatmap
data = np.random.rand(4, 4)
sns.heatmap(data, annot=True)
plt.show()
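For anything beyond a quick plot, Matplotlib's object-oriented interface (an explicit `Figure` and `Axes`) keeps multi-panel figures manageable. This is a minimal sketch with synthetic data; the output file name `plots.png` is an arbitrary choice, and the non-interactive `Agg` backend is selected so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
# One figure with two side-by-side axes (panels)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: labeled line plot with a legend
ax1.plot(x, np.sin(x), label="sin(x)")
ax1.set_xlabel("x")
ax1.set_ylabel("value")
ax1.set_title("Line Plot")
ax1.legend()

# Right panel: histogram of 500 normally distributed samples (seeded for repeatability)
ax2.hist(np.random.default_rng(0).normal(size=500), bins=20)
ax2.set_title("Histogram")

fig.tight_layout()         # avoid overlapping labels between panels
fig.savefig("plots.png")   # save to disk instead of calling plt.show()
```

In an interactive session you can drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving.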
5. Scikit-learn
Theory:
Scikit-learn is a library for machine learning. It provides tools for data preprocessing and model selection, along with algorithms for regression (such as linear regression), classification, and clustering.
Code:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Split Data (X is the feature matrix and y the target, assumed to be already loaded)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Model
model = LinearRegression()
model.fit(X_train, y_train)
# Evaluate
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
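The workflow above leaves `X` and `y` undefined. Below is a self-contained sketch that generates synthetic data (an assumption for illustration: `y = 3x + 2` plus small Gaussian noise) so the split, fit, and evaluation steps can be run end to end. It also reports R², which is easier to interpret than MSE on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus noise with standard deviation 0.5
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))   # scikit-learn expects 2-D X: (n_samples, n_features)
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=200)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)   # fit() returns the fitted model
predictions = model.predict(X_test)

mse = mean_squared_error(y_test, predictions)      # average squared error
r2 = r2_score(y_test, predictions)                 # 1.0 would be a perfect fit
```

The learned `model.coef_[0]` and `model.intercept_` should land close to 3 and 2, and the MSE should be near the noise variance (about 0.25).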
6. Data Cleaning & Preprocessing
Theory:
Data cleaning involves handling missing data, removing duplicates, and ensuring data quality. Preprocessing prepares data for machine learning by scaling, encoding, and transforming it.
Code:
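# Handling Missing Data (pick one approach; dropna after fillna(0) would find nothing to drop)
df = df.fillna(0)          # replace missing values with 0
# df = df.dropna()         # or: drop rows that contain any missing value
# Removing Duplicates
df = df.drop_duplicates()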
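# Encoding (assumes df has a categorical 'Category' column)
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()  # maps each distinct category to an integer 0..n_classes-1
df['Category'] = encoder.fit_transform(df['Category'])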
# Scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df.select_dtypes("number"))  # scale numeric columns only
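Here is a small end-to-end cleaning sketch on a hypothetical four-row table containing a missing value, a duplicate row, and a text column. It fills the gap with the column median (often more robust than 0) and one-hot encodes the text column with `pd.get_dummies`, since scikit-learn documents `LabelEncoder` as intended for target labels rather than input features.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: row 1 has a missing Age, row 3 duplicates row 2
df = pd.DataFrame({
    "Age": [25, np.nan, 30, 30],
    "Category": ["a", "b", "a", "a"],
})

df = df.drop_duplicates()                          # removes the repeated last row
df["Age"] = df["Age"].fillna(df["Age"].median())   # median of 25 and 30 is 27.5
df = pd.get_dummies(df, columns=["Category"])      # adds Category_a and Category_b

# Standardize Age to mean 0 and unit variance
scaler = StandardScaler()
df[["Age"]] = scaler.fit_transform(df[["Age"]])
```

After these steps `df` has three rows and the columns `Age`, `Category_a`, and `Category_b`, with the filled-in row sitting exactly at the (standardized) mean of 0.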
7. Working with APIs
Theory:
APIs allow programs to interact with web services. The third-party requests library simplifies sending HTTP requests such as GET and POST.
Code:
import requests
# GET Request
response = requests.get("https://api.example.com/data", timeout=10)  # timeout in seconds
if response.status_code == 200:
    data = response.json()
# POST Request
payload = {"key": "value"}
response = requests.post("https://api.example.com/data", json=payload, timeout=10)  # json= sends payload as a JSON body
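The sketch below wraps a GET call in a small reusable helper and then inspects, without any network access, what `requests` actually builds for query parameters and JSON bodies. The helper name `fetch_json` is hypothetical, and `https://api.example.com/data` remains a placeholder URL.

```python
import json
import requests

def fetch_json(url, **params):
    """Hypothetical helper: GET a URL and return the parsed JSON, raising on HTTP errors."""
    response = requests.get(url, params=params, timeout=10)  # never wait forever
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx status codes
    return response.json()

# Build, but do not send, requests so we can see what the library produces offline
get_req = requests.Request(
    "GET", "https://api.example.com/data", params={"page": 2, "limit": 50}
).prepare()
post_req = requests.Request(
    "POST", "https://api.example.com/data", json={"key": "value"}
).prepare()

sent_body = json.loads(post_req.body)  # json= serialized the dict into the request body
```

`params=` is URL-encoded into the query string (`get_req.url` ends in `?page=2&limit=50`), while `json=` serializes the body and sets the `Content-Type: application/json` header automatically. Against a real endpoint you would call something like `fetch_json(url, page=2)`.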
8. SQL with Python (SQLite)
Theory:
SQLite is a lightweight, file-based database engine. Python’s built-in sqlite3 module allows you to perform SQL operations programmatically.
Code:
import sqlite3
# Connect to DB
conn = sqlite3.connect("database.db")
cursor = conn.cursor()
# Execute Queries
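cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")
# Use ? placeholders instead of building SQL strings by hand (prevents SQL injection)
cursor.execute("INSERT INTO users VALUES (?, ?)", (1, "Alice"))
conn.commit()  # save the insert to disk
# Fetch Data
cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()  # list of tuples, one per row
conn.close()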
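For experiments and tests, an in-memory database avoids leaving files behind. The sketch below, using a hypothetical `users` table with an `age` column, inserts several rows with one parameterized statement, lets the connection's context manager handle the commit, and runs a filtered query.

```python
import sqlite3

# ":memory:" creates a throwaway database that lives only in RAM
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")

# executemany runs one parameterized INSERT for each tuple
users = [(1, "Alice", 25), (2, "Bob", 30), (3, "Cara", 35)]
with conn:  # commits on success, rolls back if an exception is raised (does not close)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", users)

# Parameterized SELECT with a filter and ordering
rows = conn.execute(
    "SELECT name FROM users WHERE age > ? ORDER BY age", (26,)
).fetchall()

conn.close()
```

`rows` comes back as `[("Bob",), ("Cara",)]`: each row is a tuple, even when only one column is selected.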
9. Regular Expressions
Theory:
Regular expressions (regex) are patterns used for matching and manipulating strings. Python’s built-in re module provides regex support.
Code:
import re
# Match Pattern (\d+ means "one or more digits")
pattern = r"\d+"
result = re.findall(pattern, "123 Main Street")  # ['123']
# Replace Pattern
new_text = re.sub(r"\d+", "#", "123 Main Street")  # '# Main Street'
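Capturing groups are what make regex useful for extracting structured fields from text. This sketch works on a made-up order string: it compiles a reusable date pattern with named groups, extracts IDs with a single capturing group, and rearranges dates in a substitution using group references.

```python
import re

text = "Orders: 2024-01-15 (id=A17), 2024-02-03 (id=B42)"  # hypothetical input

# Compile once when a pattern is reused; (?P<name>...) labels each captured part
date_re = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
dates = [m.groupdict() for m in date_re.finditer(text)]

# With exactly one capturing group, findall returns just that group's text
ids = re.findall(r"id=(\w+)", text)

# \g<name> in the replacement refers back to a named group
us_style = date_re.sub(r"\g<month>/\g<day>/\g<year>", text)
```

Here `ids` is `['A17', 'B42']` and `us_style` is `"Orders: 01/15/2024 (id=A17), 02/03/2024 (id=B42)"`.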
10. File Handling
Theory:
File handling allows you to read, write, and manipulate files. Python’s built-in open() function, used inside a with statement that closes the file automatically, makes file operations simple.
Code:
# Read File
with open("data.txt", "r") as file:
    content = file.read()
# Write File
with open("output.txt", "w") as file:
    file.write("Hello, World!")
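Two patterns not covered above come up constantly: appending to an existing file, and reading a large file line by line instead of all at once. This sketch uses a hypothetical file name `example.txt` and the standard-library `pathlib` module, and passes an explicit encoding so behavior does not depend on the platform default.

```python
from pathlib import Path

path = Path("example.txt")  # hypothetical file name for this sketch

# write_text opens, writes, and closes in one call ("w" semantics: overwrites)
path.write_text("alpha\nbeta\n", encoding="utf-8")

# Mode "a" appends to the end of the file instead of overwriting it
with open(path, "a", encoding="utf-8") as f:
    f.write("gamma\n")

# Iterating over a file object yields one line at a time (memory-friendly for big files)
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]
```

`lines` ends up as `["alpha", "beta", "gamma"]`.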
This cheat sheet serves as a quick reference for common Python tasks and libraries used by data analysts and developers.