HumanOnlyWeb

🍁 writer of code. drinker of coffee. human on the web.

Building a Commenting System [Part 2]: Designing the Database for Nested Comments

By HumanOnlyWeb

Nested comments sound simple. Store a parentId, query by it, done. Then you try to paginate a threaded discussion and realize it's not that straightforward.

The Challenge

I wanted comments that support:

  • Nested replies (Reddit-style threading)
  • Configurable depth limits (no infinitely nested threads)
  • Pagination for posts with many comments
  • Soft deletes that preserve thread context
  • Reactions with one-per-user limits

The tricky part isn't storing the data. It's querying it efficiently while keeping the schema simple.

Choosing a Tree Structure

There are three common ways to model hierarchical data in SQL (I only found out about these 2 weeks ago when I started this blog):

Adjacency List (what I chose): Each row stores a parentId pointing to its parent. Simple to write, but you need multiple queries or recursion to get the full tree.

Nested Sets: Each row stores left and right values that define its position in a preorder traversal. Fast reads, but writes are expensive because inserting a node requires updating many rows.

Materialized Path: Each row stores its full path as a string like /1/5/12/. Easy to query descendants, but paths can get long and updates require string manipulation.

I went with adjacency list because:

  1. Writes are simple (just insert with a parentId)
  2. I'm not building Reddit-scale nesting
  3. I can control depth on the server to avoid deep recursion
  4. It's the most intuitive to work with
  5. Last and most honestly, this is the only pattern I've known and used in the past 😅

The downside is that I can't get the entire tree in a single query. But since I'm limiting nesting depth anyway, this hasn't been a problem.

The Schema

Here's the core of what I ended up with, using Drizzle ORM:

// server/database/schema.ts
export const comments = sqliteTable(
  "comments",
  {
    id: text("id")
      .primaryKey()
      .$defaultFn(() => randomId("comments")),
    postSlug: text("post_slug").notNull(),
    userId: text("user_id")
      .notNull()
      .references(() => users.id),
    parentId: text("parent_id"),
    content: text("content").notNull(),
    isEdited: int("is_edited").default(0).notNull(),
    isDeleted: int("is_deleted").default(0).notNull(),
    isHidden: int("is_hidden").default(0).notNull(),
    ...timestamps,
  },
  (table) => [
    index("idx_comments_post_slug").on(table.postSlug),
    index("idx_comments_parent_id").on(table.parentId),
    index("idx_comments_user_id").on(table.userId),
  ],
);

A few things worth noting:

postSlug instead of postId: I don't have a posts table. Blog posts are markdown files (via Nuxt Content), so I use the slug as the identifier. The index on postSlug makes filtering by post fast.

parentId is nullable: Top-level comments have null for parentId. Replies point to their parent comment.

Three boolean flags: isEdited, isDeleted, and isHidden track different states. More on these later.

Three indexes: One for filtering by post, one for finding replies to a comment, one for user lookups (useful for profile pages or when blocking users).

The Self-Referential Relationship

Drizzle needs to know about the parent-child relationship for its query builder:

export const commentsRelations = relations(comments, ({ one, many }) => ({
  user: one(users, {
    fields: [comments.userId],
    references: [users.id],
  }),
  parent: one(comments, {
    fields: [comments.parentId],
    references: [comments.id],
    relationName: "commentReplies",
  }),
  replies: many(comments, {
    relationName: "commentReplies",
  }),
}));

The relationName ties the two sides together. Now I can query a comment and include its replies, or query a comment and include its parent.

Reactions: One Per User

Reactions are stored in a separate table with a unique constraint:

export const reactions = sqliteTable(
  "reactions",
  {
    id: text("id")
      .primaryKey()
      .$defaultFn(() => randomId("reactions")),
    commentId: text("comment_id")
      .notNull()
      .references(() => comments.id, { onDelete: "cascade" }),
    userId: text("user_id")
      .notNull()
      .references(() => users.id),
    type: text("type").default("thumbs_up").notNull(),
    createdAt,
  },
  (table) => [
    index("idx_reactions_comment_id").on(table.commentId),
    uniqueIndex("idx_reactions_unique").on(table.commentId, table.userId, table.type),
  ],
);

The unique index on (commentId, userId, type) prevents duplicate reactions. If a user has already reacted to a comment, reacting again will either update or remove their existing reaction.

I included type in the constraint because I might add more reaction types later (sad, angry, etc.). For now it's always "thumbs_up", but the schema is ready to expand.

The onDelete: "cascade" means when a comment is hard-deleted, its reactions are automatically cleaned up.

Soft Deletes vs. Hard Deletes

This was a design decision I went back and forth on. When a user deletes their comment, should it:

  1. Hard delete: Remove the row entirely
  2. Soft delete: Keep the row but mark it as deleted

I went with soft delete for user-initiated deletions. Here's why:

If someone replies to a comment and then the original gets deleted, hard deletion would orphan those replies. The thread structure breaks down.

Instead, soft deletion replaces the content with a placeholder:

async softDelete(id: string): Promise<Comment | undefined> {
  const [deleted] = await this.db
    .update(comments)
    .set({
      isDeleted: 1,
      content: "[Comment deleted]",
      updatedAt: new Date(),
    })
    .where(eq(comments.id, id))
    .returning();

  return deleted;
}

The comment still exists in the tree, replies still have a valid parent, and users can see that something was there. This matches how Reddit and most forums handle it.

For admin actions (removing spam, for example), I do have a hard delete:

async hardDelete(id: string): Promise<void> {
  await this.db.delete(comments).where(eq(comments.id, id));
}

This completely removes the comment and cascades to delete its reactions. I use this occassionaly, mainly for obvious spam that adds no value to the thread.

The Depth Problem

I wanted to limit how deep nesting can go. Infinite nesting leads to unreadable threads and UI headaches on mobile.

My first instinct was to store depth as a column. But that creates problems:

  • You have to calculate depth on insert
  • If you ever move a comment (unlikely, but possible), you'd need to update all descendants
  • It's denormalized data that can get out of sync

Instead, I calculate depth on the fly by walking up the parent chain:

async getDepth(commentId: string): Promise<number> {
  let depth = 0;
  let currentId: string | null = commentId;

  while (currentId) {
    const comment = await this.findById(currentId);

    if (!comment || !comment.parentId) {
      break;
    }
    currentId = comment.parentId;
    depth++;
  }

  return depth;
}

This makes multiple queries. I know. But with a max depth of 3 (my default), that's at most 3 queries. And since this only runs when creating a reply (to check if we've hit the limit), it's not on the hot path.

The depth limit is configurable per post via a post_settings table:

export const postSettings = sqliteTable("post_settings", {
  id: text("id").primaryKey().$defaultFn(() => randomId("postSettings")),
  postSlug: text("post_slug").notNull().unique(),
  commentsEnabled: int("comments_enabled").default(1).notNull(),
  reactionsEnabled: int("reactions_enabled").default(1).notNull(),
  maxNestingDepth: int("max_nesting_depth").default(3).notNull(),
  ...timestamps,
});

This lets me disable comments on specific posts or change the depth limit for longer discussions.

Querying Comments

For a post's comments, I fetch top-level comments first:

SELECT * FROM comments
WHERE post_slug = ? AND parent_id IS NULL AND is_hidden = 0
ORDER BY created_at DESC
LIMIT 50

Then the client builds the tree structure. Replies are either:

  1. Fetched eagerly with the parent (for shallow nesting)
  2. Loaded on demand when expanding a thread (for many replies)

I went with eager loading since my depth limit is shallow. The query joins replies and I build the tree in the controller before sending to the client.

What I'd Do Differently

If I were building this for a high-traffic site, I'd consider:

Caching reply counts: Right now I count replies with a subquery. For posts with hundreds of comments, a denormalized replyCount column would be faster.

Materialized path for deep threads: If I needed unlimited depth, the adjacency list queries would get expensive. A materialized path like /cmt_abc/cmt_def/cmt_ghi would let me query all descendants with a single LIKE clause.

Separate table for deleted content: Storing [Comment deleted] in the content column feels a bit hacky. A separate deleted_comments table with the original content could support features like "view deleted" for admins. But again, I don't need this at the moment.

But for a personal blog with moderate traffic? The current schema works fine.

Up Next

The schema handles storage, but how do comments update in real-time across multiple browser tabs? In next part, I'll cover Server-Sent Events and why they were the right choice over WebSockets.

All Comments 0

Be the first to comment.