Database Queries
Last month, our production dashboard was showing 15,000 database queries per minute during peak hours. Our response times were creeping up, and the database CPU was constantly hovering around 75%. We knew we had a problem, but here’s the thing—our schema was actually fine. We didn’t need more indexes, we didn’t need to restructure our tables, and we definitely didn’t need to spin up a bigger database instance.
We needed to stop hitting the database so damn much.
Three weeks later, we were down to 3,000 queries per minute with the same traffic load. Response times dropped by 60%, and our database was practically yawning at 20% CPU. I want to share exactly how we did it, because chances are, you’re facing the same issues we were.
The Wake-Up Call
It started with a Slack message from our DevOps lead: “DB connection pool is maxing out again. Third time this week.”
I pulled up our APM tool and started tracing through a typical user session. What I found was embarrassing. A single page load was triggering 47 separate database queries. Not because we were doing anything particularly complex—just because we were lazy.
Every time we needed a user’s name, we queried the database. Every time we needed to check a permission, another query. Loading a list of 20 items? That’s 20 queries for the items, plus one query per item to get the author details, plus another round to check user permissions. You see where this is going.
Strategy 1: Implement Smart Caching (The Biggest Win)
This was our first and most impactful move. We implemented a two-tier caching strategy using Redis.
Application-Level Caching
We started by identifying our most frequently accessed, rarely changing data:
- User profiles and permissions
- Configuration settings
- Lookup tables (countries, categories, statuses)
- Relatively static content
For these, we implemented a cache-aside pattern with generous TTLs:
def get_user_profile(user_id):
    cache_key = f"user_profile:{user_id}"

    # Try cache first
    cached_data = redis_client.get(cache_key)
    if cached_data:
        return json.loads(cached_data)

    # Cache miss - hit the database
    user_data = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Store in cache for 1 hour
    redis_client.setex(cache_key, 3600, json.dumps(user_data))
    return user_data
Result: This alone cut our user-related queries by 70%. User profiles were being fetched dozens of times per request—now it’s once per hour per user.
Query Result Caching
For more complex queries, especially those used in dashboards and reports, we cached the entire result set:
def get_monthly_statistics(month, year):
    cache_key = f"stats:{year}:{month}"

    cached_result = redis_client.get(cache_key)
    if cached_result:
        return json.loads(cached_result)

    # Expensive aggregation query
    results = db.query("""
        SELECT date, COUNT(*), SUM(revenue)
        FROM transactions
        WHERE YEAR(created_at) = %s AND MONTH(created_at) = %s
        GROUP BY date
    """, year, month)

    # Cache for 5 minutes
    redis_client.setex(cache_key, 300, json.dumps(results))
    return results
The rule we followed: If a query takes more than 100ms or is called more than 10 times per minute, cache it.
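To apply that rule without copy-pasting the same lookup code everywhere, the cache-aside pattern is easy to wrap in a decorator. Here is a rough sketch (the cached decorator and its key format are illustrative, not lifted from our actual codebase), reusing the same redis_client and db helpers as above:

import functools
import json

def cached(ttl_seconds, key_prefix):
    # Wrap any function with the cache-aside lookup shown earlier
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Build the cache key from the prefix and the positional arguments
            cache_key = f"{key_prefix}:" + ":".join(str(a) for a in args)
            cached_value = redis_client.get(cache_key)
            if cached_value:
                return json.loads(cached_value)
            result = func(*args)
            redis_client.setex(cache_key, ttl_seconds, json.dumps(result))
            return result
        return wrapper
    return decorator

# Same behavior as get_user_profile above, expressed declaratively
@cached(ttl_seconds=3600, key_prefix="user_profile")
def get_user_profile(user_id):
    return db.query("SELECT * FROM users WHERE id = %s", user_id)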
Strategy 2: Batch Your Database Operations
This is where we found our second biggest win. We had code like this everywhere:
# Bad: N+1 query problem
posts = db.query("SELECT * FROM posts WHERE status = 'published' LIMIT 20")
for post in posts:
    author = db.query("SELECT * FROM users WHERE id = %s", post.author_id)
    post.author_name = author.name
That’s 21 queries for what should be one. We refactored to use eager loading:
# Good: 2 queries total
posts = db.query("SELECT * FROM posts WHERE status = 'published' LIMIT 20")
author_ids = [post.author_id for post in posts]
authors = db.query("SELECT * FROM users WHERE id IN %s", tuple(author_ids))
author_map = {author.id: author for author in authors}

for post in posts:
    post.author_name = author_map[post.author_id].name
We created a simple batching utility that became our best friend:
class QueryBatcher:
    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.queries = []

    def add(self, query, params):
        self.queries.append((query, params))
        if len(self.queries) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.queries:
            return
        # Execute all queries in a single transaction
        with db.transaction():
            for query, params in self.queries:
                db.execute(query, params)
        self.queries = []
We used this for bulk inserts, updates, and deletes. What used to be 1,000 individual INSERT statements became a single batch operation.
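Usage looks something like this (the events table and its columns are made up for illustration):

batcher = QueryBatcher(batch_size=100)

for event in incoming_events:  # e.g. rows pulled off a worker queue
    batcher.add(
        "INSERT INTO events (user_id, event_type, payload) VALUES (%s, %s, %s)",
        (event['user_id'], event['type'], json.dumps(event['payload'])),
    )

# Flush whatever is left in the final, partially full batch
batcher.flush()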
Strategy 3: Leverage Application-Level Joins
Sometimes you don’t need the database to do all the work. We had several reports that joined 5-6 tables, and their query execution times were painful.
We broke these apart:
# Instead of one massive JOIN
# We do separate cached queries and join in application code
users = get_cached_users() # Cached for 1 hour
orders = get_user_orders(user_id) # Smaller, faster query
products = get_cached_products() # Cached for 24 hours
# Join in memory
report_data = []
for order in orders:
user = users[order.user_id]
product = products[order.product_id]
report_data.append({
'user_name': user.name,
'product_name': product.name,
'order_total': order.total
})
Yes, this uses more application memory. But memory is cheap, and it scales horizontally. Database connections don’t.
Strategy 4: Implement Request-Level Caching
This one’s subtle but powerful. We added a request-scoped cache that lives only for the duration of a single HTTP request:
from werkzeug.local import LocalProxy
from flask import g

def get_request_cache():
    if 'request_cache' not in g:
        g.request_cache = {}
    return g.request_cache

request_cache = LocalProxy(get_request_cache)

def get_user(user_id):
    cache_key = f"user:{user_id}"

    # Check request cache first
    if cache_key in request_cache:
        return request_cache[cache_key]

    # Then check Redis, then the database
    # (the same cache-aside lookup as get_user_profile above)
    user_data = get_user_profile(user_id)

    # Store in request cache
    request_cache[cache_key] = user_data
    return user_data
This eliminated duplicate queries within the same request. If we call get_user(123) fifty times during a single page render, we hit the database once.
Strategy 5: Lazy Loading Where Appropriate
Not everything needs to be loaded upfront. We implemented lazy loading for expensive relationships:
class Post:
    def __init__(self, data):
        self.id = data['id']
        self.title = data['title']
        self._comments = None  # Not loaded yet

    @property
    def comments(self):
        if self._comments is None:
            self._comments = get_comments_for_post(self.id)
        return self._comments
Now comments are only loaded if the code actually accesses them. For list views where we don’t show comments, we save those queries entirely.
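To make the difference concrete, here is an illustrative before-and-after of how the property behaves (assuming db.query returns dict-like rows):

# List view: titles only, no comment queries fired
posts = [Post(row) for row in db.query("SELECT id, title FROM posts LIMIT 20")]
titles = [post.title for post in posts]

# Detail view: the first access triggers exactly one comments query
detail = posts[0]
first_access = detail.comments   # query runs here
second_access = detail.comments  # already loaded on the instance, no query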
Strategy 6: Denormalize Strategically (Without Changing Schema)
Wait, didn’t I say we didn’t touch the schema? We didn’t add columns or change relationships. But we did add a small denormalization layer using Redis.
We stored computed values that were expensive to calculate:
# Instead of counting on every page load
def get_user_post_count(user_id):
    cache_key = f"user_post_count:{user_id}"

    count = redis_client.get(cache_key)
    if count:
        return int(count)

    count = db.query("SELECT COUNT(*) FROM posts WHERE author_id = %s", user_id)
    redis_client.setex(cache_key, 86400, count)  # 24 hours
    return count

# Invalidate on new post
def create_post(author_id, title, content):
    post_id = db.insert("INSERT INTO posts ...", author_id, title, content)

    # Invalidate the count cache
    redis_client.delete(f"user_post_count:{author_id}")
    return post_id
The Monitoring That Saved Us
None of this would have worked without proper monitoring. We set up:
- Query frequency tracking: Which queries run most often?
- Slow query logging: Which queries take the longest?
- Cache hit rates: Are our caches actually working?
- N+1 detection: APM tools that flag potential N+1 problems
We used a combination of New Relic, our database’s slow query log, and custom metrics in Prometheus.
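For the custom Prometheus metrics, a couple of counters and a histogram cover most of it. This is a simplified sketch of that kind of instrumentation (metric and label names are illustrative), using the standard prometheus_client library:

from prometheus_client import Counter, Histogram

QUERY_COUNT = Counter(
    'app_db_queries_total', 'Database queries executed', ['query_name'])
QUERY_LATENCY = Histogram(
    'app_db_query_seconds', 'Database query latency', ['query_name'])
CACHE_REQUESTS = Counter(
    'app_cache_requests_total', 'Cache lookups', ['cache', 'result'])

def timed_query(query_name, sql, *params):
    # Count every execution and record how long it took
    QUERY_COUNT.labels(query_name=query_name).inc()
    with QUERY_LATENCY.labels(query_name=query_name).time():
        return db.query(sql, *params)

def record_cache_lookup(cache_name, hit):
    # Hit rate = hits / (hits + misses), graphed per cache
    result = 'hit' if hit else 'miss'
    CACHE_REQUESTS.labels(cache=cache_name, result=result).inc()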
The Results
After three weeks of implementation:
- Database queries: 15,000/min → 3,000/min (80% reduction)
- Average response time: 850ms → 340ms (60% improvement)
- Database CPU: 75% → 20% (73% reduction)
- Peak connection pool usage: 95% → 45%
- 95th percentile response time: 2.1s → 680ms
But here’s what really mattered: we stopped getting paged at 3 AM. Our infrastructure costs dropped by $800/month because we could downsize our database instance. Our product team stopped complaining about the admin dashboard being slow.
Lessons Learned
1. Profile before optimizing: We wasted a week optimizing queries that ran 10 times per day. Focus on what actually matters.
2. Cache hit rates matter more than cache size: A 1 GB cache with a 95% hit rate beats a 10 GB cache with a 60% hit rate.
3. Invalidation is harder than caching: We spent more time on cache invalidation logic than on the caching itself. Plan for this.
4. Not everything should be cached: Real-time data, user-specific data that changes frequently, and small lookup queries sometimes aren’t worth caching.
5. Monitor, measure, iterate: We didn’t get it right the first time. Our initial caching strategy actually made some things slower.
Common Pitfalls to Avoid
Over-caching: We initially cached everything and ran into memory issues. Be selective.
Stale data: We forgot to invalidate caches on updates and showed users outdated information for hours. Always plan your invalidation strategy.
Cache stampede: When a popular cache entry expires, hundreds of requests hit the database simultaneously. Use cache warming or stale-while-revalidate patterns.
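A simple version of stampede protection, sketched below under the assumption of a redis-py style client (the lock key naming is ours), lets one request recompute the value while everyone else briefly retries the cache instead of piling onto the database:

import json
import time

def get_with_stampede_protection(cache_key, ttl_seconds, compute):
    cached_value = redis_client.get(cache_key)
    if cached_value:
        return json.loads(cached_value)

    lock_key = f"lock:{cache_key}"
    # Only one request wins the lock; nx=True means "set only if not exists"
    if redis_client.set(lock_key, "1", nx=True, ex=30):
        try:
            result = compute()
            redis_client.setex(cache_key, ttl_seconds, json.dumps(result))
            return result
        finally:
            redis_client.delete(lock_key)

    # Everyone else waits briefly and retries the cache
    time.sleep(0.1)
    cached_value = redis_client.get(cache_key)
    if cached_value:
        return json.loads(cached_value)
    return compute()  # last resort: fall through to the database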
Ignoring cache misses: A cache with a 30% hit rate is worse than no cache at all—you’ve added latency without benefits.
Your Turn
You don’t need to redesign your entire database to see massive performance improvements. Start small:
- Add caching to your most frequently called queries
- Fix your N+1 problems (they’re everywhere, trust me)
- Implement request-level caching to eliminate duplicate queries
- Monitor and measure everything
The schema changes can wait. Your users will thank you for the faster app today, not for the perfectly normalized database next quarter.


