ai-multimodal by mrgoonie

Process and generate multimedia content using the Google Gemini API. Capabilities include analyzing audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis, up to 9.5 hours), understanding images (captioning, object detection, OCR, visual Q&A, segmentation), processing videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extracting data from documents (PDF tables, forms, charts, diagrams, multi-page files), and generating images (text-to-image, editing, composition, refinement). Use it when working with audio or video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows of up to 2M tokens.
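As a rough illustration of what a multimodal request to the Gemini API can look like, here is a minimal sketch that only builds the URL and JSON body for a prompt plus one inline media file. It does not send anything. The endpoint pattern, the model name gemini-2.5-flash, and the helper name build_request are assumptions made for this example and are not taken from the skill's SKILL.md.

```python
import base64
import json

# Assumed REST endpoint pattern for the Gemini generateContent method.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/{model}:generateContent"
)


def build_request(prompt, media, mime_type, model="gemini-2.5-flash"):
    """Hypothetical helper: return (url, json_body) for one text prompt
    plus one media file embedded inline as base64."""
    payload = {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # Raw bytes must be base64-encoded to travel in JSON.
                            "data": base64.b64encode(media).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return ENDPOINT.format(model=model), json.dumps(payload)


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file.
    url, body = build_request("Describe this image.", b"placeholder", "image/png")
    print(url)
    print(len(body), "bytes of JSON")
```

To send the request, the body would be POSTed to the URL with an API key. Inline data suits small files; larger audio or video generally needs a separate upload step, which is not shown here.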

Coding

1.3K Stars

249 Forks

Updated Dec 30, 2025, 02:08 PM

Why Use This

This skill provides specialized capabilities for mrgoonie's codebase.

Use Cases

Developing new features in the mrgoonie repository
Refactoring existing code to follow mrgoonie standards
Understanding and working with mrgoonie's codebase structure

Install Guide

2 steps

1. Download Ananke

Skip this step if Ananke is already installed.

2. Install inside Ananke

Click Install Skill, paste the link below, then press Install.

https://github.com/mrgoonie/claudekit-skills/tree/main/.claude/skills/ai-multimodal

Skill Snapshot

Auto scan of skill assets. Informational only.

Valid SKILL.md

Checks against SKILL.md specification

Source & Community

Repository claudekit-skills

Skill Version

main

Community

1.3K Stars, 249 Forks

Updated At Dec 30, 2025, 02:08 PM

Skill Stats

SKILL.md 358 Lines

Total Files 1

Total Size 0 B

License MIT
