How to Connect My App to an AI API

Step-by-step guide to connecting your application to OpenAI, Claude, or Gemini APIs, covering SDK setup, streaming responses, error handling, and cost management.

The Short Answer

Install the provider's official SDK, store your API key securely as an environment variable, make API calls from your server (never the client), and handle streaming responses for real-time user experiences. The setup is straightforward -- most developers can make their first successful API call within 30 minutes. The real work is in prompt engineering, error handling, and cost management for production use.

SDK Setup for OpenAI, Claude, and Gemini

Each major AI provider offers official SDKs that handle authentication, request formatting, and response parsing.

OpenAI SDK setup:

Install openai via npm. Initialize the client with your API key from an environment variable. The Chat Completions endpoint is your primary interface -- pass a system message to set behavior and user messages for the actual query. The SDK handles retries, timeouts, and response typing automatically.
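As a rough sketch, the request body passed to the Chat Completions endpoint looks like the following. The model name and prompts are placeholders; in a real app you would hand this object to `openai.chat.completions.create(...)` after initializing the client with `new OpenAI({ apiKey: process.env.OPENAI_API_KEY })`.

```javascript
// Builds the request shape the Chat Completions endpoint expects.
function buildChatRequest(systemPrompt, userMessage, model = "gpt-4o") {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt }, // sets behavior
      { role: "user", content: userMessage },    // the actual query
    ],
  };
}

const req = buildChatRequest("You are a helpful assistant.", "Summarize this article.");
```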

OpenAI's function calling feature is essential for structured output. Define functions with JSON Schema parameters, and the model returns structured JSON instead of free-form text. This is how you extract data, trigger actions, or build tool-using agents reliably.
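A hypothetical tool definition illustrates the shape. The function name and parameters here are made up for the example; the structure (a `function` entry with JSON Schema `parameters`) is what the Chat Completions `tools` array expects.

```javascript
// Example tool: extract a contact from free-form text. The model returns a
// tool_calls entry with the arguments as a JSON string you parse yourself.
const extractContactTool = {
  type: "function",
  function: {
    name: "extract_contact",
    description: "Extract a contact's name and email from free-form text",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string", description: "Full name" },
        email: { type: "string", description: "Email address" },
      },
      required: ["name", "email"],
    },
  },
};
// Passed to the API as: tools: [extractContactTool]
```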

Claude (Anthropic) SDK setup:

Install @anthropic-ai/sdk. The Messages API is similar in structure to OpenAI but supports longer context windows (up to 200K tokens) and excels at following detailed instructions. Claude uses a system parameter separate from the messages array, which is a minor but important API difference.
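A side-by-side sketch of the same prompt in both request shapes makes the difference concrete. Model names are placeholders; the point is where the system prompt lives.

```javascript
// OpenAI: the system prompt is a message inside the messages array.
const openaiStyle = {
  model: "gpt-4o",
  messages: [
    { role: "system", content: "Answer in one sentence." },
    { role: "user", content: "What is streaming?" },
  ],
};

// Anthropic: the system prompt is a top-level field, and max_tokens is
// required by the Messages API.
const anthropicStyle = {
  model: "claude-sonnet-4",
  max_tokens: 1024,
  system: "Answer in one sentence.",
  messages: [{ role: "user", content: "What is streaming?" }],
};
```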

Claude's extended thinking feature lets you enable chain-of-thought reasoning for complex problems, with the model showing its work before providing the final answer.

Gemini (Google) SDK setup:

Install @google/generative-ai. Gemini offers strong multimodal capabilities -- you can pass images, audio, and video alongside text in a single request. Gemini 2.5 Pro is competitive with GPT-4o and Claude Sonnet on most benchmarks and offers generous free-tier pricing.
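As a sketch, a multimodal request mixes text and media in one `parts` array. The `base64Image` value below is a placeholder; with the @google/generative-ai SDK you would pass these parts to the model's generate call.

```javascript
// Text and an inline image in a single Gemini-style parts array.
function buildMultimodalParts(prompt, base64Image) {
  return [
    { text: prompt },
    { inlineData: { mimeType: "image/png", data: base64Image } },
  ];
}

const parts = buildMultimodalParts("Describe this chart.", "aGVsbG8=");
```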

Streaming Responses and Real-Time UX

Streaming is non-negotiable for any user-facing AI feature. Without streaming, users stare at a loading spinner for 3-10 seconds. With streaming, they see text appear word by word within milliseconds.

Server-side streaming: All three SDKs support streaming. Instead of awaiting the full response, you iterate over chunks as they arrive. Each chunk contains a small piece of the response (typically a few tokens).
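The accumulation pattern can be shown without an API key by standing in a fake async iterator for the SDK's stream object. The chunk shape below mirrors OpenAI's streaming format; with the real SDK you would get chunks from a create call with `stream: true`.

```javascript
// Fake stream: yields OpenAI-style chunks so the pattern is runnable offline.
async function* fakeStream() {
  for (const piece of ["Hel", "lo, ", "world", "!"]) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

// Iterate over chunks as they arrive, flushing each token to the client.
async function collectStream(stream, onToken) {
  let full = "";
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    full += token;
    onToken(token); // e.g. write to the HTTP response here
  }
  return full;
}
```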

Client-side rendering: Use Server-Sent Events (SSE) or the Vercel AI SDK's useChat hook to stream from your API route to the browser. The Vercel AI SDK provides framework-level abstractions that handle streaming, message history, loading states, and error recovery across React, Next.js, and other frameworks.
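If you skip the framework abstractions, raw SSE framing is simple: each token goes out as a `data:` line followed by a blank line, which the browser's EventSource API parses natively. A minimal sketch:

```javascript
// One SSE frame per token. On the server, write each frame to the response
// with Content-Type: text/event-stream, e.g. res.write(sseFrame(token)).
function sseFrame(token) {
  return `data: ${JSON.stringify({ token })}\n\n`;
}
```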

Handling MCP (Model Context Protocol): MCP is an emerging standard that allows AI models to interact with external tools and data sources through a unified protocol. Instead of building custom function calling logic for each integration, MCP provides a standardized way for your app to expose capabilities (database queries, API calls, file operations) that AI models can invoke. This is particularly powerful for building AI agents that need to interact with multiple systems.
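The pattern MCP standardizes can be sketched in a few lines: capabilities registered once behind a uniform invoke interface, rather than bespoke glue per integration. This is not the actual MCP SDK, just an illustration of the idea.

```javascript
// Capabilities registered by name, invoked through one interface.
const capabilities = new Map();

function registerCapability(name, description, handler) {
  capabilities.set(name, { description, handler });
}

async function invokeCapability(name, args) {
  const cap = capabilities.get(name);
  if (!cap) throw new Error(`Unknown capability: ${name}`);
  return cap.handler(args);
}

// Hypothetical capability backed by a placeholder data source.
registerCapability("lookup_user", "Fetch a user record by id", async ({ id }) => {
  return { id, plan: "pro" };
});
```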

Error Handling and Cost Management

Production AI integrations need robust error handling that tutorials rarely cover.

Common failure modes:

  • Rate limiting (429 errors): Implement exponential backoff with jitter. The OpenAI SDK handles this automatically; other SDKs may need manual implementation.
  • Context length exceeded: Track token count before sending requests. Use tiktoken for OpenAI or estimate at 4 characters per token. Truncate or summarize older messages to stay within limits.
  • Timeouts: Set reasonable timeouts (30-60 seconds for non-streaming, longer for complex reasoning). Provide user feedback during long-running requests.
  • Content filtering: Both OpenAI and Anthropic may refuse requests that trigger safety filters. Handle these gracefully with user-facing messages.
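For SDKs that don't retry for you, exponential backoff with jitter looks like this. `callApi` is a placeholder for the actual request function; the error shape (a `status` property) is an assumption that matches how most HTTP SDKs surface failures.

```javascript
// Retry 429s and 5xx errors with full jitter: sleep a random amount up to
// an exponentially growing cap, then rethrow once attempts are exhausted.
async function withRetries(callApi, maxAttempts = 5, baseMs = 500) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await callApi();
    } catch (err) {
      const retryable = err.status === 429 || err.status >= 500;
      if (!retryable || attempt === maxAttempts - 1) throw err;
      const delay = Math.random() * baseMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```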

Cost management strategies:

  • Route simple tasks to cheaper models (GPT-4o-mini at $0.15/1M input tokens vs GPT-4o at $2.50/1M)
  • Cache identical or near-identical queries with a TTL
  • Set per-user rate limits to prevent abuse
  • Use max_tokens to cap response length
  • Monitor daily and monthly spend through provider dashboards and set alerts
  • Consider prompt caching features (available on both Anthropic and OpenAI) for repeated system prompts
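The caching strategy above can be as small as a map with timestamps. Production systems usually normalize or hash the prompt before using it as a key; this sketch skips that.

```javascript
// Minimal TTL cache keyed by the full prompt/request string.
const cache = new Map();

function cacheGet(key, ttlMs = 60_000) {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() - entry.at > ttlMs) {
    cache.delete(key); // expired
    return undefined;
  }
  return entry.value;
}

function cacheSet(key, value) {
  cache.set(key, { value, at: Date.now() });
}
```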

Tracking costs: Log every API call with model used, input tokens, output tokens, and total cost. Build a simple dashboard or use a service like Helicone or LangSmith to monitor spend per feature, per user, and per model.
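A per-call log needs nothing more than a price table and a multiplication. The prices below are illustrative (USD per 1M tokens) and should be kept in sync with the provider's current pricing page.

```javascript
// Illustrative prices in USD per 1M tokens; update from the pricing page.
const PRICES = {
  "gpt-4o":      { input: 2.5,  output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

const usageLog = [];

// Records one API call and returns its cost in dollars.
function logCall(model, inputTokens, outputTokens, feature, userId) {
  const p = PRICES[model];
  const cost = (inputTokens / 1e6) * p.input + (outputTokens / 1e6) * p.output;
  usageLog.push({ model, inputTokens, outputTokens, cost, feature, userId });
  return cost;
}
```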

How UniqueSide Can Help

UniqueSide has connected 40+ products to AI APIs including OpenAI, Anthropic, Gemini, and custom MCP integrations. We build production-ready AI features with proper streaming, error handling, cost controls, and fallback strategies -- not just a working API call that breaks under real traffic.

Our MVP development services at $8,000 with 15-day delivery include full AI API integration with prompt engineering, model selection guidance, and monitoring setup.

Frequently Asked Questions

Which AI API should I choose for my project?

For most text-based features, start with OpenAI's GPT-4o or Anthropic's Claude Sonnet -- both deliver excellent quality. Use Gemini if you need strong multimodal capabilities (image/video understanding) or want lower pricing. Test your specific use case with 2-3 providers before committing. Switching later is straightforward since the APIs are structurally similar.

Can I call AI APIs directly from the browser?

Never expose your API key in client-side code. Always route AI API calls through your own server or serverless function. This protects your API key and lets you enforce rate limits, log usage, and add caching. The only exception is if the provider offers a client-side SDK with restricted keys, which is rare.

How do I handle AI API latency for good UX?

Stream responses to show output incrementally, display a typing indicator or skeleton UI during generation, use optimistic UI patterns where possible, and consider pre-generating common responses. For non-interactive features (email drafts, summarization), process in the background and notify the user when complete.

Trusted by founders at

Scarlett Panda, PeerThrough, Screenplayer, AskDocs, ValidateMySaaS, CraftMyPDF, MyZone AI, Acme Studio, Vaga AI

Awesome work on the JSONmode.com landing page! Very responsive and professional — looking forward to collaborating again in the future.

Adams Briscoe

Founder

Need help building your product?

We ship MVPs in 15 days. Tell us what you're building.

Start Your Project