Tekimax SDK

The Tekimax SDK organizes AI capabilities into four distinct modalities. Each modality maps to a namespace on the client, so IDE auto-complete always shows the right methods. Strict capability interfaces (VisionCapability, ImageGenerationCapability, etc.) tell the SDK at compile time exactly what your provider supports, so unsupported calls surface immediately as TypeScript errors.
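
To make that concrete, here is a rough sketch of the capability pattern. The interface names come from the SDK, but the member shapes below are assumptions for illustration, not the SDK's actual definitions:

```typescript
// Sketch only: these member shapes are assumptions to illustrate the idea,
// not the SDK's actual type definitions.
interface VisionCapability {
  analyze(opts: { model: string; image: string; prompt: string }): Promise<{ content: string }>;
}

interface ImageGenerationCapability {
  generate(opts: { model: string; prompt: string; size?: string }): Promise<{ data: { url: string }[] }>;
}

// A provider exposes only the capabilities it implements.
declare const visionOnlyProvider: VisionCapability;

// OK: the capability exists, so this type-checks.
await visionOnlyProvider.analyze({
  model: 'gpt-4o',
  image: 'https://example.com/photo.png',
  prompt: 'Describe this image.',
});

// Error: Property 'generate' does not exist on type 'VisionCapability'.
// visionOnlyProvider.generate({ model: 'dall-e-3', prompt: '...' });
```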


Text (Chat)

The client.text namespace handles all Language Model interactions.

```typescript
import { Tekimax, OpenAIProvider } from 'tekimax-ts';

const client = new Tekimax({
  provider: new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY! }),
});

const response = await client.text.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
});

// ChatResult has a flat shape; access .message directly.
console.log(response.message.content);
```

Images

The client.images namespace unifies image generation (e.g., DALL-E) and editing.

Image Generation

```typescript
// Generate an image
const result = await client.images.generate({
  prompt: 'A futuristic city with flying cars, cyberpunk style',
  model: 'dall-e-3',
  size: '1024x1024',
});

console.log(result.data[0].url);
```
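
The editing half of this namespace isn't shown above, so the following is a hypothetical sketch only: the edit method name and its parameters are assumed to mirror generate, not taken from a confirmed signature.

```typescript
// Hypothetical sketch: client.images.edit and its parameter names are
// assumptions, not a confirmed SDK signature.
const edited = await client.images.edit({
  image: 'https://example.com/city.png', // source image to modify
  prompt: 'Make it nighttime with neon signs',
  model: 'dall-e-2', // assumed model id for an edit-capable model
});

console.log(edited.data[0].url);
```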

Image Analysis (Vision)

Analyze images using multi-modal models like GPT-4o or Claude 3.5 Sonnet. The SDK normalizes the request format — OpenAI uses image_url content parts, Anthropic uses image source blocks, and Gemini uses inlineData — but you always call the same method.

```typescript
const analysis = await client.images.analyze({
  model: 'gpt-4o',
  image: 'https://example.com/chart.png', // URL or Base64
  prompt: 'Extract the data from this chart.',
});

// ImageAnalysisResult returns .content directly (not wrapped in .message).
console.log(analysis.content);
```
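
Because the request format is normalized, switching providers should only change the constructor and the model id. Here is a sketch of the same call against Claude, assuming the package exports an AnthropicProvider (not shown above):

```typescript
// AnthropicProvider is assumed here; only OpenAIProvider and GeminiProvider
// appear elsewhere in these docs.
import { Tekimax, AnthropicProvider } from 'tekimax-ts';

const claudeClient = new Tekimax({
  provider: new AnthropicProvider({ apiKey: process.env.ANTHROPIC_API_KEY! }),
});

// Same method and arguments; the SDK translates the request into
// Anthropic's image source blocks under the hood.
const sameAnalysis = await claudeClient.images.analyze({
  model: 'claude-3-5-sonnet-latest', // model id assumed for illustration
  image: 'https://example.com/chart.png',
  prompt: 'Extract the data from this chart.',
});

console.log(sameAnalysis.content);
```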

Audio

The client.audio namespace provides Text-to-Speech (TTS) and Transcription (STT) capabilities.

Text-to-Speech

```typescript
import { Tekimax, OpenAIProvider } from 'tekimax-ts';

const client = new Tekimax({
  provider: new OpenAIProvider({ apiKey: process.env.OPENAI_API_KEY! }),
});

// Convert text to speech
const audio = await client.audio.speak({
  model: 'tts-1',
  input: 'Hello, this is a generated voice.',
  // "alloy" is a neutral, balanced voice; good for demos.
  // Other options: echo, fable, onyx, nova, shimmer.
  voice: 'alloy',
});

// Returns an ArrayBuffer; write to a file or pipe to an audio player.
console.log(`Generated ${audio.buffer.byteLength} bytes of audio`);
```
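
Since speak returns raw bytes, saving the result in Node is a one-liner: wrap the ArrayBuffer in a Buffer and write it out (the speech.mp3 filename is arbitrary):

```typescript
import fs from 'node:fs';

// Wrap the ArrayBuffer in a Node Buffer and persist it to disk.
fs.writeFileSync('speech.mp3', Buffer.from(audio.buffer));
```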

Audio Transcription (Speech-to-Text)

```typescript
import fs from 'node:fs';

// Transcribe audio to text using Whisper.
// TranscriptionOptions.file accepts File, Blob, or Buffer.
const transcription = await client.audio.transcribe({
  file: fs.readFileSync('recording.mp3'),
  model: 'whisper-1',
  // 'verbose_json' returns timestamps and segment-level data,
  // which is useful for subtitle generation or word-level alignment.
  response_format: 'verbose_json',
});

console.log(transcription.text);
console.log(transcription.segments); // [{ start, end, text }, ...]
```
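
To illustrate the subtitle use case mentioned in the comments, the segments array can be folded into SRT cues. This sketch assumes each segment carries numeric start/end seconds and a text field, as shown above:

```typescript
// Format seconds as an SRT timestamp (HH:MM:SS,mmm).
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// One numbered cue per segment, separated by blank lines.
const srt = transcription.segments
  .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
  .join('\n');

console.log(srt);
```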

Video

The client.videos namespace handles video generation and analysis.

Video Generation

```typescript
// Generate a video from a prompt
const video = await client.videos.generate({
  prompt: 'A cat running in a field of sunflowers',
  model: 'luma-dream-machine',
});

console.log(video.data[0].url);
```

Video Analysis (Gemini)

We support native video analysis using Gemini's multi-modal capabilities. Gemini is currently the only provider with built-in video understanding — the SDK downloads the video and converts it to inlineData automatically.

```typescript
import { Tekimax, GeminiProvider } from 'tekimax-ts';

const client = new Tekimax({
  provider: new GeminiProvider({ apiKey: process.env.GOOGLE_API_KEY! }),
});

const analysis = await client.videos.analyze({
  video: 'https://cdn.example.com/beach_sunset.mp4',
  // gemini-1.5-flash is preferred for video analysis because it has
  // a 1M token context window at lower cost than gemini-1.5-pro.
  model: 'gemini-1.5-flash',
  prompt: 'Describe the scene and the lighting conditions.',
});

console.log(analysis.content);
```
