Building a Commenting System [Part 2]: Designing the Database for Nested Comments
Nested comments sound simple. Store a parentId, query by it, done. Then you try to paginate a threaded discussion and realize it's not that straightforward.
The Challenge
I wanted comments that support:
- Nested replies (Reddit-style threading)
- Configurable depth limits (no infinitely nested threads)
- Pagination for posts with many comments
- Soft deletes that preserve thread context
- Reactions with one-per-user limits
The tricky part isn't storing the data. It's querying it efficiently while keeping the schema simple.
Choosing a Tree Structure
There are three common ways to model hierarchical data in SQL (I only found out about these 2 weeks ago when I started this blog):
Adjacency List (what I chose): Each row stores a parentId pointing to its parent. Simple to write, but you need multiple queries or recursion to get the full tree.
Nested Sets: Each row stores left and right values that define its position in a preorder traversal. Fast reads, but writes are expensive because inserting a node requires updating many rows.
Materialized Path: Each row stores its full path as a string like /1/5/12/. Easy to query descendants, but paths can get long and updates require string manipulation.
I went with adjacency list because:
- Writes are simple (just insert with a parentId)
- I'm not building Reddit-scale nesting
- I can control depth on the server to avoid deep recursion
- It's the most intuitive to work with
- Last and most honestly, this is the only pattern I've known and used in the past 😅
The downside is that I can't get the entire tree in a single query. But since I'm limiting nesting depth anyway, this hasn't been a problem.
The Schema
Here's the core of what I ended up with, using Drizzle ORM:
// server/database/schema.ts
export const comments = sqliteTable(
"comments",
{
id: text("id")
.primaryKey()
.$defaultFn(() => randomId("comments")),
postSlug: text("post_slug").notNull(),
userId: text("user_id")
.notNull()
.references(() => users.id),
parentId: text("parent_id"),
content: text("content").notNull(),
isEdited: int("is_edited").default(0).notNull(),
isDeleted: int("is_deleted").default(0).notNull(),
isHidden: int("is_hidden").default(0).notNull(),
...timestamps,
},
(table) => [
index("idx_comments_post_slug").on(table.postSlug),
index("idx_comments_parent_id").on(table.parentId),
index("idx_comments_user_id").on(table.userId),
],
);
A few things worth noting:
postSlug instead of postId: I don't have a posts table. Blog posts are markdown files (via Nuxt Content), so I use the slug as the identifier. The index on postSlug makes filtering by post fast.
parentId is nullable: Top-level comments have null for parentId. Replies point to their parent comment.
Three boolean flags: isEdited, isDeleted, and isHidden track different states. More on these later.
Three indexes: One for filtering by post, one for finding replies to a comment, one for user lookups (useful for profile pages or when blocking users).
The Self-Referential Relationship
Drizzle needs to know about the parent-child relationship for its query builder:
export const commentsRelations = relations(comments, ({ one, many }) => ({
user: one(users, {
fields: [comments.userId],
references: [users.id],
}),
parent: one(comments, {
fields: [comments.parentId],
references: [comments.id],
relationName: "commentReplies",
}),
replies: many(comments, {
relationName: "commentReplies",
}),
}));
The relationName ties the two sides together. Now I can query a comment and include its replies, or query a comment and include its parent.
Reactions: One Per User
Reactions are stored in a separate table with a unique constraint:
export const reactions = sqliteTable(
"reactions",
{
id: text("id")
.primaryKey()
.$defaultFn(() => randomId("reactions")),
commentId: text("comment_id")
.notNull()
.references(() => comments.id, { onDelete: "cascade" }),
userId: text("user_id")
.notNull()
.references(() => users.id),
type: text("type").default("thumbs_up").notNull(),
createdAt,
},
(table) => [
index("idx_reactions_comment_id").on(table.commentId),
uniqueIndex("idx_reactions_unique").on(table.commentId, table.userId, table.type),
],
);
The unique index on (commentId, userId, type) prevents duplicate reactions. If a user has already reacted to a comment, reacting again will either update or remove their existing reaction.
I included type in the constraint because I might add more reaction types later (sad, angry, etc.). For now it's always "thumbs_up", but the schema is ready to expand.
The onDelete: "cascade" means when a comment is hard-deleted, its reactions are automatically cleaned up.
Soft Deletes vs. Hard Deletes
This was a design decision I went back and forth on. When a user deletes their comment, should it:
- Hard delete: Remove the row entirely
- Soft delete: Keep the row but mark it as deleted
I went with soft delete for user-initiated deletions. Here's why:
If someone replies to a comment and then the original gets deleted, hard deletion would orphan those replies. The thread structure breaks down.
Instead, soft deletion replaces the content with a placeholder:
async softDelete(id: string): Promise<Comment | undefined> {
const [deleted] = await this.db
.update(comments)
.set({
isDeleted: 1,
content: "[Comment deleted]",
updatedAt: new Date(),
})
.where(eq(comments.id, id))
.returning();
return deleted;
}
The comment still exists in the tree, replies still have a valid parent, and users can see that something was there. This matches how Reddit and most forums handle it.
For admin actions (removing spam, for example), I do have a hard delete:
async hardDelete(id: string): Promise<void> {
await this.db.delete(comments).where(eq(comments.id, id));
}
This completely removes the comment and cascades to delete its reactions. I use this occassionaly, mainly for obvious spam that adds no value to the thread.
The Depth Problem
I wanted to limit how deep nesting can go. Infinite nesting leads to unreadable threads and UI headaches on mobile.
My first instinct was to store depth as a column. But that creates problems:
- You have to calculate depth on insert
- If you ever move a comment (unlikely, but possible), you'd need to update all descendants
- It's denormalized data that can get out of sync
Instead, I calculate depth on the fly by walking up the parent chain:
async getDepth(commentId: string): Promise<number> {
let depth = 0;
let currentId: string | null = commentId;
while (currentId) {
const comment = await this.findById(currentId);
if (!comment || !comment.parentId) {
break;
}
currentId = comment.parentId;
depth++;
}
return depth;
}
This makes multiple queries. I know. But with a max depth of 3 (my default), that's at most 3 queries. And since this only runs when creating a reply (to check if we've hit the limit), it's not on the hot path.
The depth limit is configurable per post via a post_settings table:
export const postSettings = sqliteTable("post_settings", {
id: text("id").primaryKey().$defaultFn(() => randomId("postSettings")),
postSlug: text("post_slug").notNull().unique(),
commentsEnabled: int("comments_enabled").default(1).notNull(),
reactionsEnabled: int("reactions_enabled").default(1).notNull(),
maxNestingDepth: int("max_nesting_depth").default(3).notNull(),
...timestamps,
});
This lets me disable comments on specific posts or change the depth limit for longer discussions.
Querying Comments
For a post's comments, I fetch top-level comments first:
SELECT * FROM comments
WHERE post_slug = ? AND parent_id IS NULL AND is_hidden = 0
ORDER BY created_at DESC
LIMIT 50
Then the client builds the tree structure. Replies are either:
- Fetched eagerly with the parent (for shallow nesting)
- Loaded on demand when expanding a thread (for many replies)
I went with eager loading since my depth limit is shallow. The query joins replies and I build the tree in the controller before sending to the client.
What I'd Do Differently
If I were building this for a high-traffic site, I'd consider:
Caching reply counts: Right now I count replies with a subquery. For posts with hundreds of comments, a denormalized replyCount column would be faster.
Materialized path for deep threads: If I needed unlimited depth, the adjacency list queries would get expensive. A materialized path like /cmt_abc/cmt_def/cmt_ghi would let me query all descendants with a single LIKE clause.
Separate table for deleted content: Storing [Comment deleted] in the content column feels a bit hacky. A separate deleted_comments table with the original content could support features like "view deleted" for admins. But again, I don't need this at the moment.
But for a personal blog with moderate traffic? The current schema works fine.
Up Next
The schema handles storage, but how do comments update in real-time across multiple browser tabs? In next part, I'll cover Server-Sent Events and why they were the right choice over WebSockets.
All Comments 0