CSDT Community

Posted By USER

02/27/2025

Here's a Python cheat sheet for data analysis, covering essential libraries, functions, and techniques.

1. General Python Basics

Theory:

Python is an interpreted, high-level, general-purpose programming language that supports object-oriented, procedural, and functional programming paradigms.

Code:

# Data Types
x = 10            # int
y = 10.5          # float
z = "Hello"       # str

# Lists
my_list = [1, 2, 3, 4]
my_list.append(5)

# Dictionaries
my_dict = {"name": "Alice", "age": 25}
my_dict["city"] = "New York"

# Loops
for i in range(5):
    print(i)

# Functions
def square(x):
    return x ** 2

# Lambda Functions
add = lambda a, b: a + b
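
The basics above combine naturally in everyday code. The following is a minimal sketch, not part of the original cheat sheet; the names scores, passed, describe, and ranking are hypothetical and exist only for illustration.

```python
# Hypothetical data for illustration only
scores = {"Alice": 82, "Bob": 67, "Cara": 91}

# List comprehension: build a list from a loop in one expression
squares = [n ** 2 for n in range(5)]

# Iterate over dictionary key/value pairs and filter them
passed = [name for name, score in scores.items() if score >= 70]

# Function with a default argument and an f-string
def describe(name, score, threshold=70):
    status = "pass" if score >= threshold else "fail"
    return f"{name}: {score} ({status})"

# sorted() with a lambda as the sort key (highest score first)
ranking = sorted(scores, key=lambda name: scores[name], reverse=True)

print(squares)
print(passed)
print(describe("Bob", scores["Bob"]))
print(ranking)
```

Comprehensions and sorted() with a key function tend to replace many explicit loops in data-analysis scripts.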

2. Numpy

Theory:

NumPy is a library for numerical computations in Python. It provides support for arrays, matrices, and many mathematical operations.

Code:

import numpy as np

# Arrays
arr = np.array([1, 2, 3, 4])
zeros = np.zeros((2, 3))
ones = np.ones((2, 3))

# Operations
arr_sum = np.sum(arr)
arr_mean = np.mean(arr)
arr_std = np.std(arr)

# Indexing
slice_arr = arr[1:3]
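
Much of NumPy's value comes from operating on whole arrays at once. The sketch below is an illustration with a hypothetical 2x3 array named matrix; it shows element-wise arithmetic, aggregation along an axis, and boolean masking.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Element-wise arithmetic needs no explicit loop
doubled = matrix * 2

# Aggregate along an axis: axis=0 collapses rows, axis=1 collapses columns
col_sums = matrix.sum(axis=0)         # one value per column
row_means = matrix.mean(axis=1)       # one value per row

# Boolean mask: keep only the elements that satisfy a condition
evens = matrix[matrix % 2 == 0]

print(doubled)
print(col_sums, row_means, evens)
```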

3. Pandas

Theory:

Pandas is a library for data manipulation and analysis. It provides data structures such as Series and DataFrame to handle structured data efficiently.

Code:

import pandas as pd

# Create DataFrame
data = {"Name": ["Alice", "Bob"], "Age":[25, 30]}
df = pd.DataFrame(data)

# Read/Write Data
csv_data = pd.read_csv("data.csv")
df.to_csv("output.csv", index=False)

# Analyze Data
df.info()
df.describe()

# Filter Rows
filtered_df = df[df['Age'] > 25]

# Grouping
grouped = df.groupby("Name").mean(numeric_only=True)  # average the numeric columns per group
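
Grouping is easier to see when a key repeats. The following sketch uses a hypothetical sales DataFrame (its column names and values are assumptions, not data from the original) to show a derived column, label-based selection with .loc, and a grouped aggregation.

```python
import pandas as pd

# Hypothetical data for illustration only
sales = pd.DataFrame({
    "Region": ["East", "West", "East", "West"],
    "Units": [10, 4, 6, 8],
    "Price": [2.5, 3.0, 2.5, 3.0],
})

# Derived column computed from existing columns
sales["Revenue"] = sales["Units"] * sales["Price"]

# Select rows and a column by label with .loc
east_units = sales.loc[sales["Region"] == "East", "Units"]

# Group by region and sum the selected columns
summary = sales.groupby("Region")[["Units", "Revenue"]].sum()

# Sort the aggregated result, largest revenue first
summary = summary.sort_values("Revenue", ascending=False)
print(summary)
```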

4. Matplotlib & Seaborn

Theory:

Matplotlib and Seaborn are libraries for data visualization. Matplotlib provides low-level plotting tools, while Seaborn offers high-level statistical graphics.

Code:

import matplotlib.pyplot as plt
import seaborn as sns

# Line Plot
plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Line Plot")
plt.show()

# Seaborn Heatmap
data = np.random.rand(4, 4)
sns.heatmap(data, annot=True)
plt.show()

5. Scikit-learn

Theory:

Scikit-learn is a library for machine learning. It provides tools for data preprocessing, model selection, and various algorithms like linear regression, classification, and clustering.

Code:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Split Data (X is the feature matrix and y the target; both must be defined beforehand)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
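
The snippet above leaves X and y undefined. As a hedged, end-to-end sketch, the example below generates synthetic data (an assumption made purely for illustration: y = 3x + 2 plus small noise) so the same split, fit, and evaluate steps can be run as-is.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))      # 100 samples, 1 feature
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Mean squared error on data the model has not seen
mse = mean_squared_error(y_test, model.predict(X_test))
print("slope:", model.coef_[0], "intercept:", model.intercept_, "MSE:", mse)
```

The fitted slope and intercept should land close to the values used to generate the data, which is a quick sanity check on the pipeline.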

6. Data Cleaning & Preprocessing

Theory:

Data cleaning involves handling missing data, removing duplicates, and ensuring data quality. Preprocessing prepares data for machine learning by scaling, encoding, and transforming it.

Code:

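# Handling Missing Data (choose one strategy)
df = df.fillna(0)    # replace missing values with 0
# df = df.dropna()   # or drop rows that contain missing values

# Encoding (assumes df has a categorical 'Category' column)
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
df['Category'] = encoder.fit_transform(df['Category'])

# Scaling (numeric columns only)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df.select_dtypes(include="number"))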
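
The theory mentions removing duplicates, which the snippet above does not show. The sketch below uses pandas only, on a hypothetical raw DataFrame (the data and the names raw, clean, and encoded are assumptions). It imputes with the column mean rather than 0, one-hot encodes with pd.get_dummies instead of LabelEncoder, and standardizes by hand using the population standard deviation (ddof=0), which is the same quantity StandardScaler uses.

```python
import pandas as pd

# Hypothetical raw data with a duplicate row and a missing value
raw = pd.DataFrame({
    "Category": ["a", "b", "b", "c"],
    "Value": [1.0, 2.0, 2.0, None],
})

# Remove exact duplicate rows
clean = raw.drop_duplicates().reset_index(drop=True)

# Fill missing numeric values with the column mean instead of 0
clean["Value"] = clean["Value"].fillna(clean["Value"].mean())

# One-hot encode the categorical column
encoded = pd.get_dummies(clean, columns=["Category"])

# Standardize the numeric column to zero mean and unit variance
value = encoded["Value"]
encoded["Value"] = (value - value.mean()) / value.std(ddof=0)
print(encoded)
```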

7. Working with APIs

Theory:

APIs allow programs to interact with web services. Python’s requests library simplifies HTTP methods like GET and POST.

Code:

import requests

# GET Request
response = requests.get("https://api.example.com/data")
if response.status_code == 200:
    data = response.json()

# POST Request
payload = {"key": "value"}
response = requests.post("https://api.example.com/data", json=payload)

8. SQL with Python (SQLite)

Theory:

SQLite is a lightweight database engine. Python’s sqlite3 library allows you to perform SQL operations programmatically.

Code:

import sqlite3

# Connect to DB
conn = sqlite3.connect("database.db")
cursor = conn.cursor()

# Execute Queries
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'Alice')")

# Fetch Data
cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()

conn.commit()
conn.close()
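
When values come from variables rather than literals, parameterized queries are the safer pattern. This is a minimal sketch that assumes an in-memory database and a hypothetical new_users list, so it leaves no file behind.

```python
import sqlite3

# ":memory:" creates a temporary database that vanishes when closed
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# "?" placeholders let sqlite3 bind values safely (avoids SQL injection)
new_users = [(1, "Alice"), (2, "Bob"), (3, "Cara")]
cursor.executemany("INSERT INTO users (id, name) VALUES (?, ?)", new_users)
conn.commit()

# Parameterized SELECT: the tuple supplies the value for "?"
cursor.execute("SELECT name FROM users WHERE id > ? ORDER BY id", (1,))
names = [row[0] for row in cursor.fetchall()]
print(names)

conn.close()
```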

9. Regular Expressions

Theory:

Regular expressions (regex) are patterns used for matching and manipulating strings. Python’s re library provides regex support.

Code:

import re

# Match Pattern
pattern = r"\d+"
result = re.findall(pattern, "123 Main Street")

# Replace Pattern
new_text = re.sub(r"\d+", "#", "123 Main Street")
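
Beyond findall and sub, capture groups and compiled patterns cover many common cases. The sketch below runs on a hypothetical sample string (text is an assumption chosen for illustration).

```python
import re

# Hypothetical sample string for illustration only
text = "Order 42 shipped on 2025-02-27 to 123 Main Street"

# \d+ matches one or more consecutive digits
numbers = re.findall(r"\d+", text)

# Capture groups pull out the individual parts of a match
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year, month, day = date_match.groups()

# A compiled pattern can be reused; \b marks a word boundary
word_pattern = re.compile(r"\b[A-Z][a-z]+\b")
capitalized = word_pattern.findall(text)

print(numbers, (year, month, day), capitalized)
```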

10. File Handling

Theory:

File handling allows you to read, write, and manipulate files. Python’s built-in functions like open() make file operations simple.

Code:

# Read File
with open("data.txt", "r") as file:
    content = file.read()

# Write File
with open("output.txt", "w") as file:
    file.write("Hello, World!")
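
Appending and reading line by line are the other modes that come up often. The following sketch works inside a temporary directory so that no real files are touched; the file name notes.txt is hypothetical.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")   # hypothetical file name

    # "w" creates the file or overwrites it
    with open(path, "w", encoding="utf-8") as file:
        file.write("first line\n")

    # "a" appends to the end of the file
    with open(path, "a", encoding="utf-8") as file:
        file.write("second line\n")

    # Iterating over the file object yields one line at a time
    with open(path, "r", encoding="utf-8") as file:
        lines = [line.rstrip("\n") for line in file]

print(lines)
```

Passing encoding="utf-8" explicitly avoids surprises when the platform default encoding differs.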

This cheat sheet serves as a quick reference for common Python tasks and libraries used by data analysts and developers.
