Full Stack AI — Integrating LLMs into React + Node.js Applications

01-Mar-2026 | By Mamina Suman


The Full Stack AI Stack

Building AI-powered applications is no longer just about calling an API. Modern full-stack AI applications require thoughtful architecture across the frontend, backend, and infrastructure layers. Here is a practical guide to integrating LLMs into a React + Node.js application.

Backend: Node.js + AI SDK


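// Using Vercel AI SDK with Express
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import express from 'express';

const app = express();

// Parse JSON request bodies; without this, req.body is undefined
app.use(express.json());

app.post('/api/chat', async (req, res) => {
  const { messages } = req.body;

  // The OpenAI provider reads OPENAI_API_KEY from the environment by default
  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    system: 'You are a helpful assistant.',
  });

  // Stream the response to the client as it is generated
  result.pipeDataStreamToResponse(res);
});

// Port 3001 is an example value; use whatever your deployment expects
app.listen(3001);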

The Vercel AI SDK provides a unified interface for streaming LLM responses to the frontend. It supports OpenAI, Anthropic, Google, and other providers through a consistent API.

Frontend: React + Streaming UI


// React component with streaming AI responses
'use client';
import { useChat } from 'ai/react';

export function ChatInterface() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id} className={m.role === 'user' ? 'user-msg' : 'ai-msg'}>
          {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask anything..." />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

Key Architecture Decisions

  • Streaming vs. Batch — Use streaming for chat interfaces (better UX), batch for background processing
  • Function Calling — Let the LLM invoke backend functions (search, database queries, API calls) through structured tool use
  • Context Management — Implement conversation history with token-aware truncation
  • Error Handling — Gracefully handle rate limits, timeouts, and model errors with retry logic
  • Cost Control — Track token usage per user, implement budgets, and cache common responses
  • Security — Never expose API keys to the frontend. Sanitize user inputs. Implement output filtering.
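To make the Context Management point concrete, here is a minimal sketch of token-aware truncation. It assumes each message has a string `content` field, and it estimates tokens at roughly four characters each, which is only a heuristic; a real tokenizer for your model would be more accurate. The names `estimateTokens` and `truncateHistory` are hypothetical helpers, not part of the AI SDK.

```javascript
// Rough token estimate: about 4 characters per token for English text.
// This is a heuristic assumption, not a real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages whose estimated total fits within maxTokens.
// Hypothetical helper for illustration only.
function truncateHistory(messages, maxTokens) {
  const kept = [];
  let used = 0;

  // Walk from newest to oldest so the latest context survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]); // preserve chronological order
    used += cost;
  }

  return kept;
}
```

On the backend, something like `truncateHistory(messages, 3000)` could run before the messages are handed to the model. The budget of 3000 is an arbitrary example value; pick one that leaves room for the system prompt and the model's reply.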

Production Checklist

Before launching an AI feature:

  • Implement rate limiting per user
  • Add content moderation
  • Set up cost monitoring dashboards
  • Create fallback responses for when the LLM is unavailable
  • Write integration tests with recorded LLM responses
  • Establish a prompt management system for versioning and A/B testing
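As one illustration of the fallback item, the sketch below retries a failing model call with exponential backoff and returns a canned reply if every attempt fails. It assumes `callModel` is any async function that resolves to the reply text. `withRetryAndFallback`, `FALLBACK_REPLY`, and the default retry settings are hypothetical names and values chosen for this example. For simplicity it treats every error as retryable, which a production version would likely narrow to rate limits, timeouts, and server errors.

```javascript
// Canned reply used when the model cannot be reached (example wording).
const FALLBACK_REPLY =
  'The assistant is temporarily unavailable. Please try again shortly.';

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run callModel, retrying on failure with
// exponential backoff, and fall back to a canned reply at the end.
async function withRetryAndFallback(
  callModel,
  { retries = 3, baseDelayMs = 500 } = {}
) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await callModel();
    } catch (err) {
      // Out of attempts: stop retrying and use the fallback below.
      if (attempt === retries) break;
      // Wait baseDelayMs, then 2x, then 4x, and so on.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  return FALLBACK_REPLY;
}
```

This shape fits non-streaming calls best. With a streamed response, a failure can surface after part of the output has already reached the client, so the fallback would need to be handled in the stream's error path instead.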
