The Ultimate AI & LangChain Cheatsheet
Artificial Intelligence is one of the fastest-growing areas of software development. This cheatsheet covers the essentials, from calling the OpenAI API to building AI agent workflows with LangChain.
LangChain Basics
LangChain is a framework for building applications powered by Large Language Models (LLMs). It provides composable abstractions for chains, agents, memory, and tools.
Installation
pip install langchain langchain-openai langchain-community
pip install python-dotenv  # for environment variables
Core Concepts
| Concept      | Description                                         |
|--------------|-----------------------------------------------------|
| LLM          | The underlying language model (GPT-4, Claude, etc.) |
| Prompt       | The structured input sent to the LLM                |
| Chain        | A sequence of calls (prompt → LLM → output)         |
| Agent        | An LLM that decides which tools to call             |
| Tool         | A function an agent can invoke (search, calculator) |
| Memory       | Persisting conversation history across turns        |
| Retriever    | Fetches relevant documents for RAG                  |
| Vector Store | Database for storing and searching embeddings       |
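To make the Chain row concrete, here is a minimal offline sketch of how piping steps together can work. The `Step` class and the fake LLM are hypothetical stand-ins invented for illustration; they are not LangChain's own classes, which offer far more than this.

```python
from typing import Callable


class Step:
    """Hypothetical composable component (not LangChain's Runnable)."""

    def __init__(self, fn: Callable):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


# Prompt: fills a template from a dict of variables
prompt = Step(lambda variables: f"Summarize in one line: {variables['text']}")
# Fake LLM: upper-cases the prompt so the example runs without an API key
fake_llm = Step(lambda text: text.upper())
# Output parser: trims whitespace from the model reply
parser = Step(lambda reply: reply.strip())

chain = prompt | fake_llm | parser
result = chain.invoke({"text": "LangChain composes steps"})
print(result)
```

Each step only needs to accept the previous step's output, which is what makes the pieces interchangeable.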
Hello World with LangChain
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI(model="gpt-4o", temperature=0)
response = llm.invoke([HumanMessage(content="What is LangChain?")])
print(response.content)
Using Prompt Templates
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that summarizes articles."),
    ("human", "Summarize the following text:\n\n{text}"),
])
chain = prompt | llm
result = chain.invoke({"text": "LangChain is an open-source framework..."})
print(result.content)
OpenAI API Guide
The OpenAI API gives you direct access to GPT-4o, DALL-E, Whisper, and more.
Setup
pip install openai

import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
Chat Completion (GPT-4o)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain async/await in Python."},
    ],
    temperature=0.7,
    max_tokens=500,
)
print(response.choices[0].message.content)
Streaming Response
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem about code."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or an empty delta, so guard before printing
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
Key Parameters
| Parameter     | Description                                      | Range                       |
|---------------|--------------------------------------------------|-----------------------------|
| temperature   | Creativity/randomness of output                  | 0.0 – 2.0                   |
| max_tokens    | Maximum tokens to generate                       | 1 – the model's output limit |
| top_p         | Nucleus sampling (alternative to temperature)    | 0.0 – 1.0                   |
| n             | Number of completions to generate                | 1+                          |
| stop          | Stop sequence(s) to end generation               | string/list                 |
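The effect of temperature and top_p is easier to see on a toy distribution. The sketch below is a simplified model of sampling, not the API's actual implementation, and the three logits are made-up scores for three candidate tokens.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # APIs typically treat 0 as "always pick the top token"; this sketch just rejects it
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs: list[float], top_p: float) -> list[int]:
    """Return indices of the smallest token set whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.5)  # low temperature: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # high temperature: flatter distribution
kept = nucleus_filter(softmax_with_temperature(logits, 1.0), top_p=0.8)
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
print(kept)
```

Lowering the temperature concentrates probability on the top token, while top_p instead removes the unlikely tail before sampling.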
Available Models
gpt-4o → Most capable, multimodal (text + images)
gpt-4o-mini → Fast, cheap, great for simple tasks
gpt-4-turbo → Powerful, large context window
gpt-3.5-turbo → Budget option for simple tasks
text-embedding-3-small → Embeddings (cheap, fast)
text-embedding-3-large → Embeddings (high quality)
whisper-1 → Audio transcription
dall-e-3 → Image generation
Image Generation (DALL-E 3)
response = client.images.generate(
    model="dall-e-3",
    prompt="A futuristic city skyline at sunset, digital art",
    size="1024x1024",
    quality="standard",
    n=1,
)
print(response.data[0].url)
Prompt Engineering
Prompt engineering is the art of crafting inputs to get the best possible outputs from an LLM.
Core Techniques
1. Zero-Shot Prompting
Ask directly without examples:
Classify the sentiment of this review: "The product is amazing!"
Answer: Positive / Negative / Neutral
2. Few-Shot Prompting
Provide examples to guide the model:
Classify the sentiment:
Review: "This is terrible." → Negative
Review: "Absolutely love it!" → Positive
Review: "It's okay, nothing special." → Neutral
Review: "Best purchase I've ever made!" →
3. Chain-of-Thought (CoT)
Ask the model to reason step by step:
Q: If a store has 120 apples and sells 45, then gets a delivery of 60 more, how many are left?
Let's think step by step:
1. Start with 120 apples
2. Sell 45: 120 - 45 = 75
3. Add delivery: 75 + 60 = 135
Answer: 135
4. Role Prompting
Assign a persona for better context:
You are a senior Python engineer with 15 years of experience.
Review the following code and suggest improvements focusing on
performance and readability...
5. Structured Output Prompting
Force JSON or structured responses:
Extract the following fields from the text and return as JSON:
- name
- email
- company
Text: "Hi, I'm Alice from TechCorp. Reach me at alice@tech.com"
Return only valid JSON, no other text.
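Even with a "return only valid JSON" instruction, a reply can arrive wrapped in extra prose, so it helps to parse defensively. The following is a minimal sketch; `parse_json_reply` and the sample reply are hypothetical, written only to illustrate the idea.

```python
import json
import re


def parse_json_reply(reply: str, required: tuple[str, ...]) -> dict:
    """Extract the first JSON object from a model reply and check required keys."""
    # Take everything from the first "{" to the last "}" to skip surrounding chatter
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Hypothetical reply: the model added prose around the JSON despite the instruction
reply = (
    "Sure! Here is the JSON:\n"
    '{"name": "Alice", "email": "alice@tech.com", "company": "TechCorp"}\n'
    "Let me know if you need anything else."
)
fields = parse_json_reply(reply, required=("name", "email", "company"))
print(fields["email"])
```

Validating the keys up front means a malformed reply fails loudly instead of propagating missing data downstream.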
Prompt Best Practices
✅ Be specific and detailed
✅ Provide context and role ("You are a...")
✅ Use delimiters (```, ###, <tags>) to separate input from instructions
✅ Specify the output format explicitly
✅ Break complex tasks into steps
✅ Use examples (few-shot) for consistent formatting
❌ Don't be vague or ambiguous
❌ Don't rely on the model knowing the "right" format
❌ Don't ask multiple unrelated questions in one prompt
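Several of these practices (delimiters, an explicit format, few-shot examples) can be combined in a small prompt builder. This is one possible sketch; the function name and the `###` delimiter choice are assumptions, and the reviews are the ones from the few-shot example earlier.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Assemble an instruction, labelled examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        # Delimiters keep user-supplied text visibly separate from the instructions
        lines.append(f"Review: ###{text}### -> {label}")
    # Leave the final label blank so the model completes it
    lines.append(f"Review: ###{query}### ->")
    return "\n".join(lines)


examples = [
    ("This is terrible.", "Negative"),
    ("Absolutely love it!", "Positive"),
    ("It's okay, nothing special.", "Neutral"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment as Positive, Negative, or Neutral.",
    examples,
    "Best purchase I've ever made!",
)
print(prompt)
```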
RAG Architecture
RAG (Retrieval-Augmented Generation) combines a retrieval system with an LLM to answer questions based on your own data.
Why RAG?
❌ LLMs have a knowledge cutoff date
❌ LLMs hallucinate facts
❌ LLMs don't know your private documents
✅ RAG addresses all three by fetching real, relevant context at query time (it reduces hallucinations rather than eliminating them)
RAG Pipeline
┌─────────────┐
│  Your Docs  │
└──────┬──────┘
       │  1. Chunk & Embed
       ▼
┌─────────────┐
│  Vector DB  │  ← Stored embeddings
└──────┬──────┘
       │  2. Semantic Search (query)
       ▼
┌─────────────┐
│  Retrieved  │  ← Top-K relevant chunks
│   Context   │
└──────┬──────┘
       │  3. Inject into prompt
       ▼
┌─────────────┐       ┌───────────┐
│   Prompt    │──────▶│    LLM    │──▶ Answer
│  + Context  │       │ (GPT-4o)  │
└─────────────┘       └───────────┘
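The three numbered steps in the diagram can be sketched without any external services. The version below swaps a real embedding model for a toy bag-of-words vector, so it only matches on shared words; every function name here is hypothetical and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 8) -> list[str]:
    """Step 1a: split text into fixed-size word windows (real splitters usually overlap)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Step 1b: toy embedding, a word-count vector standing in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Step 2: rank stored chunks by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 3: inject the retrieved chunks into the prompt that would go to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Use the context to answer.\nContext:\n{joined}\nQuestion: {question}"


docs = [
    "LangChain is a framework for building LLM apps.",
    "RAG combines retrieval with generation for accuracy.",
    "Vector databases store numerical embeddings of text.",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]
question = "How does RAG improve accuracy?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

A production pipeline replaces `embed` with a learned embedding model and the list scan with a vector database, but the data flow stays the same.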
Implementing RAG with LangChain
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS  # also requires: pip install faiss-cpu
from langchain_core.documents import Document
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
# 1. Create documents
docs = [
    Document(page_content="LangChain is a framework for building LLM apps."),
    Document(page_content="RAG combines retrieval with generation for accuracy."),
    Document(page_content="Vector databases store numerical embeddings of text."),
]
# 2. Create vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(docs, embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 2})
# 3. Create RAG chain
llm = ChatOpenAI(model="gpt-4o")
system_prompt = """Use the provided context to answer the question.
If you don't know, say so. Be concise.
Context: {context}"""
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
# 4. Query
result = rag_chain.invoke({"input": "What is RAG?"})
print(result["answer"])
AI Agent Workflow
Agents use LLMs as a reasoning engine to decide which tools to call and in what order to solve a task.
ReAct Agent Pattern
Loop:
    Thought      → LLM reasons about what to do next
    Action       → LLM picks a tool and input
    Observation  → Tool returns result
    ... repeat until ...
Final Answer     → LLM gives the user a response
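The loop above can be sketched offline by replacing the model with a scripted function. Everything in this example is hypothetical: `scripted_llm` stands in for a real LLM call, and the calculator tool walks the syntax tree of the expression rather than calling `eval()`, which is a design choice of this sketch.

```python
import ast
import operator

# Arithmetic operators the calculator tool accepts (anything else is rejected)
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculate(expression: str) -> str:
    """Tool: evaluate +, -, *, / on numbers by walking the AST instead of calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return str(walk(ast.parse(expression, mode="eval").body))


TOOLS = {"calculate": calculate}


def scripted_llm(history: list[str]) -> dict:
    """Hypothetical stand-in for an LLM: acts once, then answers from the observation."""
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return {"thought": "I need to compute this.",
                "action": "calculate", "input": "120 - 45 + 60"}
    return {"thought": "I have the result.",
            "final": observations[-1].removeprefix("Observation: ")}


def run_agent(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = scripted_llm(history)                    # Thought
        history.append(f"Thought: {step['thought']}")
        if "final" in step:                             # Final Answer
            return step["final"]
        result = TOOLS[step["action"]](step["input"])   # Action
        history.append(f"Observation: {result}")        # Observation
    return "Stopped: step limit reached"


answer = run_agent("How many apples are left?")
print(answer)
```

The `max_steps` cap matters in practice: without it, a model that never produces a final answer would loop indefinitely.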
Building an Agent with LangChain
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
# 1. Define tools
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # In production, call a real weather API
    return f"The weather in {city} is 22°C and sunny."
@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # Caution: eval() runs arbitrary code. Removing builtins narrows the risk but
    # does not remove it; use a dedicated expression parser in production.
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"
tools = [get_weather, calculate]
# 2. Create the agent
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with access to tools."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# 3. Run the agent
result = agent_executor.invoke({"input": "What's the weather in Kathmandu?"})
print(result["output"])
Embeddings Explained
Embeddings convert text into dense numerical vectors that capture semantic meaning.
Why Embeddings?
"cat" → [0.21, -0.54, 0.87, ...] (1536 dimensions)
"kitten" → [0.19, -0.51, 0.84, ...] (very similar!)
"car" → [-0.63, 0.11, -0.22, ...] (very different)
Similar meaning → Similar vectors → Close in vector space
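"Close in vector space" is usually measured with cosine similarity. As a small worked sketch, the code below treats the three visible components of each illustrative vector above as if they were the whole vector; real embeddings have many more dimensions, so the exact scores here are only indicative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); values near 1.0 mean nearly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Truncated, illustrative vectors taken from the example above
cat = [0.21, -0.54, 0.87]
kitten = [0.19, -0.51, 0.84]
car = [-0.63, 0.11, -0.22]

print(round(cosine_similarity(cat, kitten), 3))  # close to 1: similar meaning
print(round(cosine_similarity(cat, car), 3))     # negative: pointing away from "cat"
```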
Creating Embeddings with OpenAI
from openai import OpenAI
client = OpenAI()
def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding
# Embed a single text
vector = embed("What is machine learning?")
print(f"Dimensions: {len(vector)}")  # 1536
# Batch embed multiple texts
texts = ["Hello world", "Machine learning is fun", "Python is great"]
response = client.embeddings.create(input=texts, model="text-embedding-3-small")
vectors = [item.embedding for item in response.data]
Similarity Search (Cosine Similarity)
import numpy as np
def cosine_similarity(a: list, b: list) -> float:
    a, b = np.array(a), np.array(b)
    # Wrap in float() so the return value matches the annotation (not np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
query = embed("What is AI?")
doc1 = embed("Artificial intelligence is the simulation of human intelligence.")
doc2 = embed("The stock market closed higher today.")
# Exact scores depend on the embedding model; expect doc1 to score well above doc2
print(cosine_similarity(query, doc1))  # higher (related topic)
print(cosine_similarity(query, doc2))  # lower (unrelated topic)
Vector Databases
Vector databases are specialized databases for storing, indexing, and searching high-dimensional embedding vectors at scale.
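At its core, a vector database stores (text, vector) pairs and returns the nearest ones for a query vector. The class below is a hypothetical brute-force sketch of that idea; real systems add approximate indexes, persistence, and metadata filtering on top.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store: brute-force search over normalized vectors."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    @staticmethod
    def _normalize(vector: list[float]) -> list[float]:
        length = math.sqrt(sum(x * x for x in vector))
        if length == 0:
            raise ValueError("cannot index a zero vector")
        return [x / length for x in vector]

    def add(self, text: str, vector: list[float]) -> None:
        # Normalizing once at insert time makes cosine similarity a plain dot product
        self._rows.append((text, self._normalize(vector)))

    def search(self, query: list[float], k: int = 2) -> list[tuple[str, float]]:
        q = self._normalize(query)
        scored = [(text, sum(a * b for a, b in zip(q, vec))) for text, vec in self._rows]
        # Linear scan over every row; dedicated databases use indexes to avoid this
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = TinyVectorStore()
store.add("cat", [0.21, -0.54, 0.87])       # illustrative 3-dimensional vectors
store.add("kitten", [0.19, -0.51, 0.84])
store.add("car", [-0.63, 0.11, -0.22])
hits = store.search([0.20, -0.50, 0.85], k=2)
print([text for text, _ in hits])
```

The linear scan is fine for a few thousand vectors, which is why in-memory options suit prototyping while larger collections need a real index.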
Popular Options
| Database | Type         | Best For                    | Hosted?  |
|----------|--------------|-----------------------------|----------|
| FAISS    | In-memory    | Local dev, prototyping      | ❌ Self  |
| Chroma   | Local/Server | Small-medium RAG apps       | ❌ Self  |
| Pinecone | Managed      | Production, large scale     | ✅ Cloud |
| Weaviate | Hybrid       | Full-text + vector search   | Both     |
| Qdrant   | Managed/Self | High performance, filtering | Both     |
| pgvector | PostgreSQL   | Already using Postgres      | Both     |
FAISS (Local, great for prototyping)
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create index
db = FAISS.from_texts(
    ["LangChain is a framework", "FAISS is fast", "RAG improves accuracy"],
    embeddings
)
# Similarity search
results = db.similarity_search("What is FAISS?", k=2)
for doc in results:
    print(doc.page_content)
# Save & load
db.save_local("faiss_index")
# The saved index uses pickle, so only load files you created yourself
db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
Chroma (Local, persistent)
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
# Create persistent collection
db = Chroma(
    collection_name="my_docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)
db.add_texts(["Document 1 content", "Document 2 content"])
# Query
results = db.similarity_search("search query", k=3)
Pinecone (Production)
from pinecone import Pinecone
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("my-index")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = PineconeVectorStore(index=index, embedding=embeddings)
# Add documents
vector_store.add_texts(["Your document text here"])
# Search
results = vector_store.similarity_search("query", k=5)
Quick Reference
LangChain LCEL (LangChain Expression Language)
# The snippets below assume prompt, llm, output_parser, backup_llm,
# summary_chain, and keyword_chain are already defined
# Pipe operator chains components together
chain = prompt | llm | output_parser
# With retry and fallbacks
chain = (prompt | llm.with_retry(stop_after_attempt=3)).with_fallbacks([backup_llm])
# Parallel execution
from langchain_core.runnables import RunnableParallel
chain = RunnableParallel(
    summary=summary_chain,
    keywords=keyword_chain,
)
Environment Setup
# .env file
OPENAI_API_KEY=sk-...
LANGCHAIN_API_KEY=ls__...  # for LangSmith tracing
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=my-project

from dotenv import load_dotenv
load_dotenv()  # loads variables from .env into os.environ
Useful AI Terminology
LLM              → Large Language Model (GPT-4, Claude, Gemini)
Token            → Roughly 4 characters of English text; billing unit for APIs
Context Window   → Max tokens an LLM can process at once
Temperature      → Controls output randomness (0 = most predictable, 2 = most random)
Hallucination    → LLM confidently stating false information
Embedding        → Dense vector representation of text
RAG              → Retrieval-Augmented Generation
Fine-tuning      → Training an LLM further on your specific data
Prompt           → The input given to an LLM
System Prompt    → Instructions that define the LLM's behavior/persona
Few-shot         → Providing examples in the prompt
Zero-shot        → No examples in the prompt
Chain-of-Thought → Asking the LLM to reason step-by-step
Agent            → LLM + tools that takes autonomous actions
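The Token and Context Window entries combine into a quick budget check. This sketch uses the rough 4-characters-per-token rule from the list, which is only a heuristic and not a tokenizer; the 8,000-token window is a made-up figure, so substitute the limit of the model in use.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count from character length (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """Check that the prompt plus the reserved reply budget stays inside the window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window


prompt = "Explain async/await in Python."
estimated = estimate_tokens(prompt)
# context_window=8000 is a hypothetical limit chosen for illustration
ok = fits_context(prompt, max_output_tokens=500, context_window=8000)
print(estimated, ok)
```

For exact counts, use the tokenizer that matches the model, since token boundaries vary by language and content.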