
TL;DR

This article presents a practical example of extracting the required information from a pile of documents scattered across multiple formats (docx, pdf, ppt, xlsx, images, audio/video recordings) and generating a structured report that follows a given template. It uses a custom Claude Skill with an MCP server as a tool. You will learn how to set up this Skill locally and integrate it into your routine workflows.

TOC:

· Creating a Transcription MCP Server · Creating the Claude Skill for Documenting Meetings · Running the Skill on Sample Data · Modifying the Skill for Other Use Cases · Potential Extension

Organizations and knowledge workers frequently face the task of writing structured reports by analyzing information scattered across multiple documents and formats. These documents often include PowerPoint presentations, PDFs, Excel sheets, meeting recordings, emails, web links, and other unstructured or semi-structured sources.

My AI consultancy experience shows an increasing demand for AI-driven report generation through information extraction from multiple documents.

For instance:

  • A legal team may need to review documents from multiple stakeholders to prepare a contract or compliance report.
  • A project manager may need to consolidate workshop materials, notes, and recordings into a structured summary containing decisions, action items, and follow-ups.
  • Procurement teams often compile supplier documents into standardized evaluation reports.
  • HR teams synthesize CVs, interview notes, and feedback into candidate assessments.

In all these use cases, the task is the same: extract relevant information from diverse sources and present it in a clear structure.

The task becomes more laborious and time-consuming when the output must follow a strict template, requiring the reports to adhere to predefined layouts, section names, ordering, and terminology.

The diversity of input data further compounds the effort. Text documents, spreadsheets, scanned files, hand-written notes, sketches/charts, and audio/video recordings each require different handling. As a result, even though this process is routine and repetitive, compiling a single report can take several hours, or even days, of manual work.

These are the kinds of repeatable tasks where AI can deliver significant value, improving productivity and saving a substantial amount of time.

Claude Skills hold strong potential for automating routine and repetitive tasks. In my previous articles, I demonstrated a step-by-step approach to creating custom Claude Skills and using them in routine workflows.

In this article, I will walk you through an end-to-end example:

Converting a heterogeneous collection of documents across multiple formats into a structured report that follows a given template and/or sample.

We will build a custom Claude Skill that uses both built-in and custom tools to read, analyze, and fuse information from diverse inputs, including Word documents, PowerPoint slides, PDFs, Excel sheets, images, and audio/video files. The skill aggregates the full context, extracts the information relevant to the user's requirements, and generates a structured report, either using a default layout or a user-specified template or reference document.

Because Claude models do not natively transcribe audio or video, we will build a custom Model Context Protocol (MCP) server that the Claude Skill can invoke as a tool. This MCP server will handle transcription for audio/video inputs. For this purpose, I will use our open-source GAIK toolkit's transcriber package, exposed as the gaik-transcriber MCP server.

The complete workflow is shown in the following diagram.

[Figure: workflow diagram]

The example skill built in this article is use-case agnostic and can be easily adapted for other use cases (no programming required).

The complete code is available in the GitHub repository.

Let's dive in.

Creating a Transcription MCP Server

We will create a transcription MCP server to transcribe audio/video files. This MCP server will be used as a tool in the Claude Skill. We will use the transcriber module of our open-source GAIK toolkit (under development) to set up the MCP server.

If your data does not contain audio/video files, you do not need to set up this MCP server on your PC. In that case, proceed to the section "Creating the Claude Skill for Documenting Meetings".

Follow these steps to create the transcription MCP server using the gaik[transcriber] package.

Phase 1: Project Setup

First, install Claude Desktop and sign in with your Anthropic credentials. Claude Desktop is required to connect to the transcription MCP server.

Install Node.js (needed for npx, which runs the filesystem server). Download it from the official website and run the installer.

Create Project Directory:

mkdir transcription-mcp
cd transcription-mcp

Install dependencies. You can install the whole gaik package or just its transcriber module.

pip install fastmcp gaik[transcriber]

  • fastmcp: Python MCP server framework (FastMCP)
  • gaik[transcriber]: GAIK Transcriber library for audio/video transcription

Phase 2: Write the MCP Server Code

Create server.py, which contains the main MCP server implementation. Here, we are using our GAIK toolkit's Transcriber package. The name given to the MCP server through FastMCP is gaik-transcriber.

Note that the docstring of transcribe_audio is exposed as the tool's description, so it serves as instructions to Claude on how to handle the tool's output.

"""MCP Server for GAIK Transcriber"""
import os
import sys
from pathlib import Path
from mcp.server.fastmcp import FastMCP

from gaik.building_blocks.transcriber import Transcriber, get_openai_config

# Initialize MCP server
mcp = FastMCP("gaik-transcriber")

@mcp.tool()
def transcribe_audio(file_path: str, enhanced: bool = False) -> str:
    """
    Transcribe audio/video file using GAIK Transcriber.

    ==== CRITICAL OUTPUT INSTRUCTIONS ====
    You MUST return the transcription EXACTLY as provided by this tool.

    DO NOT:
    - Add any formatting (headers, bullets, bold, markdown)
    - Restructure or reorganize the text
    - Summarize or paraphrase any part
    - Add section labels or titles
    - Add any commentary before or after
    - Change any words or punctuation

    DO:
    - Output the text exactly as returned
    - Preserve the original flow and structure
    =====================================

    Args:
        file_path: Full Windows path to audio/video file
        enhanced: If True, return enhanced transcript (default: False)

    Returns:
        The exact transcription text - output this verbatim with no changes.
    """
    try:
        config = get_openai_config(use_azure=False)

        transcriber = Transcriber(
            api_config=config,
            enhanced_transcript=enhanced,
        )

        result = transcriber.transcribe(
            file_path=Path(file_path),
            custom_context="",
        )

        if enhanced and result.enhanced_transcript:
            return result.enhanced_transcript

        return result.raw_transcript

    except Exception as e:
        import traceback
        error_msg = f"Error: {str(e)}\n\nTraceback:\n{traceback.format_exc()}"
        print(error_msg, file=sys.stderr)
        return error_msg

if __name__ == "__main__":
    mcp.run(transport="stdio")

server.py:

  • Uses GAIK's Transcriber class
  • Uses @mcp.tool() decorator to expose the function as an MCP tool
  • Accepts file_path (string) and enhanced (boolean) parameters
  • Uses stdio transport (standard input/output for communication)

Add a .env file. You will need an OpenAI API key for audio/video transcription.

# OpenAI API key (required if using OpenAI)
OPENAI_API_KEY=your_api_key

# Azure API key (required if using Azure)
AZURE_API_KEY=your_api_key

# Provider type: openai or azure
OPENAI_API_TYPE=openai

Note: You may set up a transcription MCP server with a local Whisper model. In that case, you will not need OpenAI's API key. However, this will require a GPU for faster processing.
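Before launching the server, it can help to confirm that the key is actually visible to the Python process. The following is a minimal, stdlib-only sketch. `missing_env_vars` is a hypothetical helper (not part of GAIK), and it assumes the transcriber reads the variable names shown in the .env file above from the process environment.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Illustrative check: warn if the OpenAI key is not visible to this process.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If the check reports a missing variable, the .env file is probably not being picked up by the process that starts the server.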

At this point, the transcription MCP server is pretty much ready. However, you may want to install ffmpeg. The transcription MCP server is based on GAIK's transcriber package, which uses the Whisper model. For audio/video files exceeding 25MB, chunking will be required, which is implicitly handled by the transcriber package through ffmpeg.

Download FFmpeg from https://ffmpeg.org/download.html and extract it to a folder. Note the path to its binaries (e.g., C:/ffmpeg-8.0.1-essentials_build/ffmpeg-8.0.1-essentials_build/bin) and add it to Windows PATH.
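To verify the setup, a short check along these lines may be useful. It is only a sketch: the helper names are hypothetical, and the 25 MB threshold is interpreted here as 25 * 1024 * 1024 bytes, which is an assumption about how the limit is counted.

```python
import shutil
from pathlib import Path

# Assumption: the 25 MB threshold mentioned above, counted in binary megabytes.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def exceeds_size_limit(size_bytes, limit=SIZE_LIMIT_BYTES):
    """True if a file of this size is over the limit and would need chunking."""
    return size_bytes > limit


def check_media_file(file_path):
    """Report whether ffmpeg is on PATH and whether the file would need chunking."""
    path = Path(file_path)
    return {
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
        "needs_chunking": exceeds_size_limit(path.stat().st_size),
    }
```

If `needs_chunking` is true while `ffmpeg_found` is false, the PATH entry for FFmpeg is the first thing to re-check.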

Phase 3: Configure Claude Desktop

Open the Claude Desktop MCP configuration file at: %APPDATA%\Claude\claude_desktop_config.json

An example path could be C:\Users\<YOUR_USERNAME>\AppData\Roaming\Claude\claude_desktop_config.json

If the file doesn't exist, create it.

This file tells Claude Desktop which local MCP servers it is allowed to start and use, and how to start them.

We will configure two MCP servers in the config file: the transcription server (pointing to the absolute path of server.py, e.g., C:\\path\\to\\whisper-mcp\\server.py; use the folder where you created server.py) and the filesystem server, which allows Claude to read/write local files on your system.

{
  "mcpServers": {
    "gaik-transcriber": {
      "command": "python",
      "args": ["C:\\Users\\h02317\\whisper-mcp\\server.py"],

      "timeout": 600000
    },
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "C:\\"
      ]
    }
  }
}

Configuration Breakdown:

  • mcpServers: Declares which MCP servers Claude Desktop should launch (command/args/env per server).
  • gaik-transcriber.command: Runs the transcriber MCP server via Python (local process, typically STDIO transport).
  • gaik-transcriber.args: Absolute Windows path to server.py (the MCP server entrypoint).
  • gaik-transcriber.timeout: 600000 ms (10 minutes) to allow long-running transcription jobs.
  • filesystem.command: Uses npx to run the official filesystem MCP server.
  • filesystem.args: -y avoids prompts; @modelcontextprotocol/server-filesystem is the package; C:\\ is the allowed root path exposed to the tool.
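Hand-escaping backslashes in JSON is error-prone, so one option is to generate the file content from Python and let `json.dumps` do the escaping. The sketch below mirrors the configuration shown above. `build_claude_config` is a hypothetical helper and the paths are placeholders to replace with your own.

```python
import json


def build_claude_config(server_script, allowed_root, timeout_ms=600000):
    """Build the mcpServers dictionary; json.dumps takes care of backslash escaping."""
    return {
        "mcpServers": {
            "gaik-transcriber": {
                "command": "python",
                "args": [server_script],
                "timeout": timeout_ms,
            },
            "filesystem": {
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", allowed_root],
            },
        }
    }


# Placeholder paths for illustration only.
config = build_claude_config(r"C:\path\to\server.py", "C:\\")
print(json.dumps(config, indent=2))
```

The printed text can be pasted into claude_desktop_config.json as is.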

Phase 4: Testing and Debugging

Quit Claude Desktop completely (also exit it from the system tray or end it in Task Manager), then restart it to load the new MCP server configuration.

Now, you should be able to see the two MCP servers (filesystem and gaik-transcriber) in Claude Desktop.


You should also be able to see and configure these tools in Settings → Connectors in Claude Desktop.


If there is an error, check the logs at %APPDATA%\Claude\logs (%APPDATA% already expands to the AppData\Roaming folder). In particular, check the log files of the two MCP servers: mcp-server-filesystem.log and mcp-server-gaik-transcriber.log.
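When a log file is long, reading only its last lines is usually enough to spot the failure. The following is a small stdlib sketch. `tail_lines` is a hypothetical helper, and the path you pass in is whatever log file you find in the logs folder.

```python
from collections import deque
from pathlib import Path


def tail_lines(log_path, n=20):
    """Return the last `n` lines of a text file, or an empty list if it is missing."""
    path = Path(log_path)
    if not path.is_file():
        return []
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

For example, `tail_lines(r"C:\path\to\logfile", 50)` returns the last 50 lines of that file (the path is a placeholder).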

Test with an audio or video file. Claude's built-in code-execution environment is Linux-based, whereas the transcription MCP server runs natively on Windows, so we have to provide the complete Windows path to the audio/video file.
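If a path ever arrives in POSIX/WSL form (for example /mnt/c/...), it has to be turned into a drive-letter path before it reaches the transcriber. The sketch below shows one way to do that. `to_windows_path` is a hypothetical helper, not part of GAIK or the MCP server above, and it only handles the /mnt/<drive>/ pattern.

```python
from pathlib import PureWindowsPath


def to_windows_path(path):
    """Convert a /mnt/<drive>/... path to a Windows drive-letter path.

    Paths that do not match the /mnt/<single letter>/ pattern are returned unchanged.
    """
    parts = path.split("/")
    # Expected shape after splitting: ["", "mnt", "<drive letter>", ...rest]
    is_wsl_style = (
        len(parts) >= 3
        and parts[0] == ""
        and parts[1] == "mnt"
        and len(parts[2]) == 1
        and parts[2].isalpha()
    )
    if not is_wsl_style:
        return path
    drive = parts[2].upper() + ":\\"
    return str(PureWindowsPath(drive, *parts[3:]))
```

Paths that are already in Windows form pass through untouched, so the helper is safe to apply to every path before calling the tool.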

In Claude Desktop:

Transcribe the file in C:\Users\h02317\Downloads\video.mp4

GAIK's transcriber module also has an option to enhance the transcript by inserting line breaks at proper places, creating paragraphs and dialogues, and fixing spelling and grammatical errors. To use it, ask for an enhanced transcript as follows:

Transcribe the file in C:\Users\h02317\Downloads\video.mp4
enhanced: true

[Figure: Architecture Overview]

Creating the Claude Skill for Documenting Meetings

We now create a skill that takes multiple documents in diverse formats: docx, ppt, pdf, xlsx, images, and audio/video. It uses the gaik-transcriber MCP server to process audio/video files, and Claude's built-in skills and tools (pdf, docx, xlsx, pptx, view) to process the other document types.

The skill then fuses all the context and considers an optional template and an optional sample document to generate a structured report. The optional template is a blank document with the required structure (title, sub-title, sections, tables, etc.). The skill can directly edit this template to generate the report.

The sample document is a complete, filled report that is provided to follow a specific tone, style, format, and structure.

If no template and/or sample document is provided, the skill uses a predefined structure to generate the report.

As an example, we create a documenting-meetings skill that converts scattered meeting data (recordings, handwritten notes, digital notes, diagrams, sketches, supplementary documents) into a structured MS Word deliverable with a summary, decisions, action items, open questions, and a follow-up message that can be copy-pasted directly into an email to the meeting attendees.

The example skill is not limited to processing only meeting data. It can be easily modified for other use cases where the user has to generate a report from several documents of varying formats.

Here is the skill structure:

documenting-meetings/
├── SKILL.md              # Main skill definition & workflow
├── EVALUATION.md         # Test scenarios & evaluation criteria
└── reference/
    ├── INPUT_FORMATS.md      # Input type handling details
    └── OUTPUT_SECTIONS.md    # Output section guidance

See all the files in the skill package in the GitHub repo.

SKILL.md is the main file, which serves as the entry point, containing the "what to do" instructions Claude uses to decide when and how to run the skill.

In this package, I have kept the core workflow in SKILL.md and split the detailed "how-to" guidance into reference files so the skill stays readable while still being precise.

SKILL.md starts with YAML frontmatter that names the skill (documenting-meetings) and describes what it does and when to use it. Right after that, the skill explains where it runs best (Claude Desktop) and why: it assumes two external capabilities, an MCP filesystem server for safe folder/file operations and the gaik-transcriber MCP server for transcription.

---
name: documenting-meetings
description: Converts scattered meeting data (recordings, handwritten notes, diagrams, digital notes, supplementary documents) into a structured MS Word deliverable with summary, decisions, action items, open questions, and follow-up message. Use when the user mentions meeting notes, meeting summary, meeting minutes, action items from meeting, meeting documentation, or needs to consolidate meeting materials.
---

# Meeting Documentation

Converts scattered meeting materials—audio/video recordings, handwritten notes, diagrams, digital notes, and supplementary documents—into a single, well-formatted MS Word document containing a concise summary, decisions, action items with owners and due dates, open questions, and a ready-to-send follow-up message.

It is designed to run in **Claude Desktop** with:
- An MCP **filesystem** server (for listing/reading files and folders)
- An MCP **gaik-transcriber** server (for transcribing audio/video recordings)

## When to Use

Use this skill when:
- User mentions "meeting notes", "meeting summary", "meeting minutes", "meeting documentation"
- User needs to consolidate multiple meeting materials into one document
- User asks to extract action items, decisions, or follow-ups from a meeting
- User has meeting recordings, handwritten notes, or other meeting artifacts to process
- User wants a structured deliverable from meeting data

## Inputs

**Required (at least one):**
- Meeting recordings (audio/video files) – transcribed using `gaik-transcriber:transcribe_audio`
- Handwritten notes (scanned images) – interpreted visually
- Digital notes (text files, markdown, etc.)
- Diagrams/sketches/figures (images)
- Comments or notes from multiple people

**Optional:**
- Supplementary documents (PowerPoint slides, PDF guidelines, policy documents, Excel files)
- Output template (blank .docx with predefined headers, sections, logos, etc.)
- Sample output document (.docx or .pdf) defining style, format, tone, and length to follow

**Required parameter:**
- `input_folder`: Path to the main input folder containing the required subfolder structure. If not provided, ask the user to specify it.

**Required folder structure:**
```
<input_folder>/
├── input_documents/     # Required: recordings, photos, notes, presentations, PDFs, etc.
├── templates/           # Optional: blank template with predefined headers, sections, logos
└── sample_documents/    # Optional: sample document defining style, format, tone, length
```

## Tooling Rules (Windows vs Linux Path Safety)

### Why this matters
On Windows, Claude Desktop + toolchains sometimes behave like they are in a POSIX shell, producing paths like `/mnt/c/...`.
Meanwhile, your MCP servers may run **native Windows Python**, expecting `C:\...`.
This mismatch can cause "file not found" or failing shell commands.

### Strict rules
1) **Prefer MCP filesystem tools for file/folder operations**
Use the filesystem server for listing and reading files instead of shell commands.

2) **Avoid bash commands on Windows**
If you must run a command on Windows, prefer **PowerShell**.

3) **When calling gaik-transcriber, prefer Windows drive-letter paths on Windows**
Pass file paths like `C:\Users\...\recording.m4a`.
If you only have a POSIX/WSL path (e.g., `/mnt/c/...`), convert it to a Windows path before calling the transcriber, or rely on the transcriber server's internal normalization (recommended).

4) **Never assume the environment is Linux**
Treat the runtime as OS-ambiguous and enforce the above rules to stay stable.

5) **Never do the following:**
NEVER run pip install, python -c, pdfplumber, or any ad-hoc parsing code for .pdf/.pptx/.xlsx.

NEVER use /mnt/user-data/uploads/... paths; only use paths returned by the MCP filesystem listing or the user-provided Windows folder.

If you are about to do any of the above, STOP and switch to the built-in PDF/PPTX/XLSX skills.

## Workflow

### Step 0 — Collect context
Ask (only if not provided):
- Meeting title or purpose (optional)
- Desired output format (Markdown is default)
- Any special focus: "only action items", "only decisions", "customer-facing summary", etc.

## Step 1 — Validate input folder structure and capabilities

If the user has not specified an input folder path, ask for it and confirm it contains `input_documents/` (required).

### 1) Validate folder structure (MCP filesystem only)
Use the filesystem MCP tool to list:
- `<input_folder>`
- `<input_folder>\input_documents` (required)
- `<input_folder>\templates` (optional)
- `<input_folder>\sample_documents` (optional)

If `input_documents/` is missing or empty, stop and ask the user to add the meeting artifacts there.

### 2) Capability check (prevents Windows/Linux path loops for binaries)
Purpose: decide upfront whether this environment can process **binary** files from a Windows folder without requiring the user to upload them.

Inventory binary files found in:
- `<input_folder>\input_documents`
- `<input_folder>\templates`
- `<input_folder>\sample_documents`

Treat the following as **binary** (not safely readable via text tools):
- `.pdf`, `.pptx`, `.xlsx`, `.docx`

Decision:
- If ANY binary files exist and are ONLY in the Windows folder:
  - Assume you cannot process them directly unless you have a binary-capable tool.
  - The official Node filesystem MCP server supports reading text files and reading image/audio media, but does not guarantee generic binary reads for Office/PDF files.
  - Therefore:
    - If a dedicated MCP document-parser tool is available (recommended), use it for these files.
    - Otherwise, you MUST ask the user to upload/attach these binaries in Claude Desktop to process them with built-in PDF/PPTX/XLSX/DOCX skills.

If the user asks for "local-folder only" processing of PDF/PPTX/XLSX/DOCX:
- Explain that this requires either:
  - a binary-capable filesystem MCP server (supports base64/binary reads), or
  - a Windows-native document-parser MCP server.
  (Example of a filesystem MCP server that explicitly supports binary/base64 reads: `mark3labs/mcp-filesystem-server`.)

Continue with the core workflow (transcription + text notes + images) regardless of the binary handling outcome.

### Step 2 — Inventory input files
From `input_documents/`, create a quick inventory:
- Recordings (audio/video)
- Images (handwritten notes, diagrams)
- Text documents (agenda, minutes draft, emails, etc.)
- PDFs / slides (read text if possible; otherwise summarize)

### Step 3 — Transcribe recordings (gaik-transcriber MCP tool)
For each audio/video file, call:

- `gaik-transcriber:transcribe_audio`
  - `file_path`: full path to the recording
  - `enhanced`: false by default (true only if user asks for enhanced quality)

If transcription fails with "file not found":
- Re-check the path style and ensure Windows drive-letter paths on Windows.

### Step 4: Images (Handwritten Notes, Diagrams, Sketches, Figures)
For each image file (.jpg, .jpeg, .png, .gif, .webp, .bmp, .tiff):
1. Interpret the image content (handwritten notes, diagrams, figures)
2. Create a textual description capturing all relevant information

### Step 5: Notes
Read files directly (.txt, .md, .rtf).
Use /mnt/skills/public/docx/SKILL.md for reading/writing .docx files.

## Step 6 — Supplementary documents (.pdf, .pptx, .xlsx, .docx)

Goal: extract relevant information from supplementary documents WITHOUT ad-hoc parsing code and WITHOUT Windows/Linux path mismatches.

### Non-negotiables (hard rules)
- NEVER run `cp`, `pip install`, `python -c`, `pdfplumber`, `soffice`, `pandoc`, or any ad-hoc parsing commands to read `.pdf/.pptx/.xlsx/.docx` from Windows paths.
- NEVER assume `C:\...` or `/mnt/c/...` is accessible inside a Linux sandbox.
- NEVER invent upload paths (e.g., `/mnt/user-data/uploads/...`) unless the environment explicitly provides them.
- Do not use the filesystem MCP server to "load built-in skill files." The filesystem server is for user-allowed directories, not Claude's internal skill library. (Use built-in skills directly when attachments are available.)

### Decision tree

A) If the supplementary file is uploaded/attached in Claude Desktop
- Use the corresponding built-in skill:
  - `.pdf` → PDF skill
  - `.pptx` → PPTX skill
  - `.xlsx` → XLSX skill
  - `.docx` → DOCX skill
- Extract only relevant content for the meeting deliverable (decisions, timelines, roadmap items, action items, risks).
- Attribute extracted content by filename.

B) If the supplementary file is ONLY present in the Windows folder (discovered via `filesystem:list_directory`)
1) Text-like files (`.txt`, `.md`, `.csv`, `.json`)
- Read via filesystem `read_text_file` and extract relevant content.

2) Binary files (`.pdf`, `.pptx`, `.xlsx`, `.docx`) — IMPORTANT
- Do NOT attempt conversion or parsing via sandbox tools (pandoc/python/soffice/etc.).
- If a dedicated MCP document-parser tool is available:
  - Call the parser using the Windows path and use returned extracted text/tables in synthesis.
- Otherwise:
  - Ask the user to upload/attach the file(s) in Claude Desktop.
  - Continue processing what you can (transcripts, notes, images) and list the missing binaries under "Missing inputs".

Template + samples:
- If templates/sample documents are `.docx/.pdf/.pptx/.xlsx` and are only on Windows disk:
  - Ask the user to upload them.
  - If not provided, proceed with a clean default Markdown structure.

### Output handling
- If supplementary binaries are unavailable (not uploaded, and no parser tool), clearly list them:
  - Missing inputs: `<filename1>`, `<filename2>`, ...
- Produce the meeting deliverable using available evidence and a default format.
- Do not block the entire workflow just because supplementary binaries are missing.

### Step 7: Fuse Information

Combine all processed inputs into a single consolidated text block with clear separators:

```
=== TRANSCRIPTION: <filename> ===
<transcribed content>

=== HANDWRITTEN NOTES: <filename> ===
<interpreted content>

=== DIGITAL NOTES: <filename> ===
<note content>

=== DIAGRAM/FIGURE: <filename> ===
<description of diagram/figure>

=== SUPPLEMENTARY: <filename> ===
<extracted content>
```
### Step 8: Check for Template and Sample Documents

Check the dedicated subfolders for template and sample:

**Template (`<input_folder>/templates/`):**
- Look for a blank .docx file with predefined structure (headers, sections, logos)
- If multiple files exist, use the first .docx file found.

**Sample (`<input_folder>/sample_documents/`):**
- Look for a .docx or .pdf file defining the required style, format, tone, and length
- If multiple files exist, use the first document found

If found:
- For template: Copy it and fill in the content (do not modify structure)
- For sample: STRICTLY follow its format, style, tone, and length

### Step 9: Generate the Deliverable

Read the docx skill before creating the document:
```
view /mnt/skills/public/docx/SKILL.md
```

Then follow the docx skill's "Creating a new Word document" workflow to generate the output.

**If template provided:** Copy the template and fill in sections according to the template structure.

**If no template:** Create a new document using the output format below.

### Step 10: Save and Present

1. Save the document to the `input_documents` folder
2. Use `present_files` to share with the user

## Output Format (FLEXIBLE - adapt if template/sample provided)

When no template or sample is provided, use this structure:

```
MEETING SUMMARY
===============
Date: [extracted or inferred date]
Attendees: [if identifiable from inputs]
Duration: [if available]
---
EXECUTIVE SUMMARY
-----------------
[2-4 paragraph concise summary of the meeting covering main topics discussed, 
key points, and overall outcomes. Keep factual, based only on input content.]
---
DECISIONS MADE
--------------
1. [Decision text]
   - Context: [brief context if available]
   
2. [Decision text]
   - Context: [brief context if available]

[If no decisions found in inputs, OMIT this section entirely]
---
ACTION ITEMS
------------
| # | Action Item | Owner | Due Date | Priority |
|---|-------------|-------|----------|----------|
| 1 | [description] | [name] | [date] | [H/M/L] |
| 2 | [description] | [name] | [date] | [H/M/L] |

[If owner/due date not specified in inputs, mark as "TBD"]
[If no action items found, OMIT this section entirely]
---
OPEN QUESTIONS
--------------
1. [Question that was raised but not resolved]
2. [Question requiring follow-up]

[If no open questions found, OMIT this section entirely]
---
FOLLOW-UP MESSAGE
-----------------
[Ready-to-paste message for email or chat, summarizing key outcomes and next steps. 
Keep professional and concise. Format as:]

Subject: Meeting Follow-up - [Topic/Date]

Hi team,

[1-2 paragraphs summarizing the meeting, key decisions, and action items]

Next steps:
- [Action item 1] - [Owner] by [Date]
- [Action item 2] - [Owner] by [Date]

Please let me know if you have any questions.

Best regards,
[Sender placeholder]
```

## Guardrails

**Do:**
- Extract information faithfully from provided inputs
- Mark uncertain information as "TBD" or "unclear from recording"
- Preserve original terminology and names from the inputs
- STRICTLY follow template/sample format when provided
- Omit sections if no relevant information exists in inputs

**Do NOT:**
- Invent or hallucinate any information not present in inputs
- Add action items, decisions, or attendees not mentioned in source materials
- Make assumptions about dates, owners, or deadlines not explicitly stated
- Include sections in the deliverable if the information is not in the inputs

**If information is missing:**
- For required fields: Mark as "TBD" or "Not specified in meeting materials"
- For entire sections: Omit the section from the deliverable
- If critical inputs are missing: Inform the user what additional materials would help

**Error handling:**
- If transcription fails: Report the error and continue with other inputs
- If a file cannot be parsed: Log the issue and proceed with remaining files
- If no usable inputs found: Ask the user to verify the folder path and file formats

## Examples

### Example 1: Standard Meeting with Recording and Notes

**User prompt:**
"Process my meeting materials from /home/user/meetings/q4-planning and create a summary document"

**Expected folder structure:**
```
/home/user/meetings/q4-planning/
├── input_documents/
│   ├── meeting-recording.mp4
│   ├── whiteboard-photo.jpg
│   └── my-notes.txt
├── templates/           # (empty or absent)
└── sample_documents/    # (empty or absent)
```

**Expected behavior:**
1. Validates folder structure, finds input_documents/ with 3 files
2. Transcribes recording using gaik-transcriber
3. Interprets whiteboard photo
4. Reads digital notes
5. Fuses all content with separators
6. Generates Word document with all applicable sections (no template/sample)
7. Saves to outputs and presents to user

### Example 2: With Template and Sample

**User prompt:**
"Create meeting minutes from the files in /meetings/standup using our company template"

**Expected folder structure:**
```
/meetings/standup/
├── input_documents/
│   ├── recording.m4a
│   └── notes.txt
├── templates/
│   └── company-template.docx
└── sample_documents/
    └── sample-minutes.docx
```

**Expected behavior:**
1. Finds template in templates/ subfolder
2. Finds sample in sample_documents/ subfolder
3. Processes all materials in input_documents/
4. Copies template and fills in content following sample's style
5. Presents formatted document

### Example 3: Minimal Inputs

**User prompt:**
"I just have a voice memo from our call - can you turn it into meeting notes? The folder is /recordings/client-call"

**Expected folder structure:**
```
/recordings/client-call/
├── input_documents/
│   └── voice-memo.m4a
├── templates/           # (empty or absent)
└── sample_documents/    # (empty or absent)
```

**Expected behavior:**
1. Validates structure, finds single audio file in input_documents/
2. Transcribes the audio file
3. Generates deliverable with available sections only
4. Omits sections where no information exists
5. Notes in follow-up message that details may need verification

## References
Follow these reference documents for detailed handling of each input file type and for guidance on each deliverable section.
- `reference/INPUT_FORMATS.md` – Detailed handling for each input file type
- `reference/OUTPUT_SECTIONS.md` – Guidance on each deliverable section

Step 0 is a quick context pass to check the essentials. Step 1 is the reliability step to validate the required folder structure using MCP filesystem listing.
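The skill performs this validation through the filesystem MCP tool. Purely for illustration, the same check can be sketched locally in Python. `validate_input_folder` is a hypothetical helper, and the folder names are the ones from the required structure in SKILL.md.

```python
from pathlib import Path


def validate_input_folder(input_folder):
    """Check the folder layout the skill expects and report what is present."""
    root = Path(input_folder)
    docs = root / "input_documents"
    # Required: the folder must exist and contain at least one file.
    has_inputs = docs.is_dir() and any(p.is_file() for p in docs.iterdir())
    return {
        "input_documents": has_inputs,
        "templates": (root / "templates").is_dir(),  # optional
        "sample_documents": (root / "sample_documents").is_dir(),  # optional
    }
```

A result with `input_documents` set to False corresponds to the case where the skill stops and asks the user to add the meeting artifacts.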

Step 2 determines what is actually present (recordings, images, notes, supplementary docs). Step 3 transcribes audio/video files using gaik-transcriber:transcribe_audio, with a default of enhanced: false.
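The inventory step amounts to grouping files by extension. A minimal sketch of that grouping follows. The image, notes, and supplementary extensions come from SKILL.md, while the audio/video extensions and the helper names are assumptions made for this example.

```python
from pathlib import Path

# Assumption: the recordings list is illustrative; the other groups follow SKILL.md.
CATEGORIES = {
    "recordings": {".mp3", ".wav", ".m4a", ".mp4", ".mov"},
    "images": {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"},
    "notes": {".txt", ".md", ".rtf"},
    "supplementary": {".pdf", ".pptx", ".xlsx", ".docx"},
}


def categorize(filename):
    """Map a filename to an inventory category based on its extension."""
    ext = Path(filename).suffix.lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "other"


def inventory(filenames):
    """Group filenames by category, preserving their original order."""
    result = {}
    for name in filenames:
        result.setdefault(categorize(name), []).append(name)
    return result
```

Everything that lands under "recordings" is what Step 3 would pass to the transcription tool, one file at a time.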

Steps 4 and 5 handle images and text notes. Images are interpreted visually into structured text, while .txt/.md/.rtf notes are read directly.

Step 6 blocks unsafe approaches (no pip install, no python -c, no soffice/pandoc hacks, no guessing /mnt/... paths) and chooses between (A) using proper document handling when files are attached, (B) using MCP for text-like formats, or (C) listing missing binaries when they're not accessible.
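The Step 6 decision tree can be read as a small function. The sketch below restates it under that reading. `choose_handling` and its return labels are illustrative names, not something the skill or Claude exposes.

```python
from pathlib import Path

TEXT_LIKE = {".txt", ".md", ".csv", ".json"}
BINARY = {".pdf", ".pptx", ".xlsx", ".docx"}


def choose_handling(filename, attached=False, parser_available=False):
    """Pick a handling route for a supplementary file, following the Step 6 rules."""
    ext = Path(filename).suffix.lower()
    if ext in BINARY:
        if attached:
            return "built-in skill"  # branch A: file uploaded in Claude Desktop
        if parser_available:
            return "document-parser tool"  # branch B2 with a parser MCP tool
        return "ask user to upload"  # otherwise: list under "Missing inputs"
    if ext in TEXT_LIKE:
        return "filesystem read_text_file"  # branch B1
    return "unsupported"
```

Note that no branch runs ad-hoc parsing commands, which is the behavior the hard rules are meant to prevent.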

Once all usable inputs are processed, Step 7 fuses everything into one consolidated evidence block with clear separators by filename (transcripts, handwritten notes, digital notes, diagrams, supplementary extracts).
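As an illustration of the fused evidence block, the following sketch joins processed inputs with the same separator style that SKILL.md specifies. The `LABELS` keys and the `fuse` helper are hypothetical names chosen for this example.

```python
# Keys are illustrative; the label text matches the separators in SKILL.md.
LABELS = {
    "transcription": "TRANSCRIPTION",
    "handwritten_notes": "HANDWRITTEN NOTES",
    "digital_notes": "DIGITAL NOTES",
    "diagram": "DIAGRAM/FIGURE",
    "supplementary": "SUPPLEMENTARY",
}


def fuse(items):
    """Combine (kind, filename, content) tuples into one text block with separators."""
    blocks = [
        f"=== {LABELS[kind]}: {filename} ===\n{content.strip()}"
        for kind, filename, content in items
    ]
    return "\n\n".join(blocks)
```

Keeping the filename in each separator is what lets the generated report attribute content back to its source.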

Steps 8–9 then switch from extraction to generation. The skill checks for an optional template and sample document, follows them strictly if provided, and otherwise produces a clean default document structure. Finally, Step 10 presents the generated file.
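The template and sample lookup reduces to picking the first matching file in each optional subfolder. Here is a minimal sketch. The helper names are hypothetical, and "first" is taken to mean first in alphabetical order, which is an assumption since SKILL.md does not define the ordering.

```python
from pathlib import Path


def first_match(folder, extensions):
    """Return the alphabetically first file in `folder` with a matching extension."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    candidates = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in extensions
    )
    return candidates[0] if candidates else None


def find_template_and_sample(input_folder):
    """Locate the optional template (.docx) and sample (.docx or .pdf)."""
    root = Path(input_folder)
    return {
        "template": first_match(root / "templates", {".docx"}),
        "sample": first_match(root / "sample_documents", {".docx", ".pdf"}),
    }
```

When both values are None, the skill falls back to its default document structure.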

The skill has three more supporting files that act as "appendices" to keep SKILL.md lean while still making behavior consistent. reference/INPUT_FORMATS.md documents the handling rules per input type (supported audio/video extensions, the exact transcription call, image interpretation expectations, and how to treat supplementary docs and templates/samples).

reference/OUTPUT_SECTIONS.md defines what each output section means, when it should appear, and—crucially—when it must be omitted (e.g., if there are no explicit decisions, the "Decisions" section is removed rather than filled with placeholders).
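The omission rule can be sketched in a few lines. This is only an illustration of the principle, assuming sections are held as a mapping from title to a list of extracted items; the `render_sections` helper and the plain-text layout are hypothetical and not part of the skill.

```python
def render_sections(sections: dict[str, list[str]]) -> str:
    """Render only sections that have evidence; empty ones are dropped entirely."""
    lines = []
    for title, items in sections.items():
        if not items:
            continue  # no placeholder text such as "None" or "TBD"
        lines.append(title)
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip()
```

For example, passing `{"Summary": [...], "Decisions": [], "Action Items": [...]}` yields a document with no "Decisions" heading at all.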

And EVALUATION.md provides test prompts and pass/fail criteria that I can use to validate the skill in realistic scenarios (multi-input meetings, template+sample runs, audio-only meetings, error recovery, and "missing folder path" cases).

To use the skill in Claude Desktop, create a .zip file of the skill folder and upload it under Settings → Capabilities by clicking +Add.
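If you prefer to script the packaging step, the following sketch uses Python's standard `shutil.make_archive`. The helper name `zip_skill` is hypothetical, and keeping the skill folder as the top-level entry inside the archive is an assumption about what the upload expects.

```python
import shutil
from pathlib import Path


def zip_skill(skill_dir: Path, out_dir: Path) -> Path:
    """Create <out_dir>/<skill name>.zip with the skill folder as its top-level entry."""
    archive = shutil.make_archive(
        base_name=str(out_dir / skill_dir.name),  # ".zip" is appended automatically
        format="zip",
        root_dir=str(skill_dir.parent),
        base_dir=skill_dir.name,
    )
    return Path(archive)
```

Calling `zip_skill(Path("documenting-meetings"), Path("dist"))` would produce `dist/documenting-meetings.zip`, provided the `dist` folder already exists.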

Running the Skill on a Sample Data

I ran the skill on sample documents (generated by Claude Opus 4.5). These documents are in the same folder structure as required by the skill. The skill requires at least one input document to process.

input_folder/
├── input_documents/                      # Required: Meeting artifacts to process
│   ├── deployment-freeze-policy.pdf      # Holiday freeze policy (Dec 23 - Jan 2), on-call schedule, exception rules
│   ├── notes.txt                         # Attendee's handwritten notes with action items, decisions, key takeaways
│   ├── project-budget.xlsx               # Q3 budget allocation & spending across 5 project categories
│   ├── roadmap-presentation.pptx         # 2-slide deck: title slide + project status overview (Mobile, Dashboard, API)
│   └── sketch.png                        # Whiteboard timeline diagram showing Q3/Q4 milestones & key decisions
├── sample_documents/                     # Optional: Reference for output style/format
│   └── sample-meeting-minutes.docx       # Example meeting minutes (Q2 Planning) defining tone, structure, length
└── templates/                            # Optional: Blank template with predefined sections
    └── meeting-template.docx             # Template with placeholders for summary, decisions, actions, follow-up

Apart from these documents, a sample audio recording of the meeting (.mp3) is provided to the skill separately. This avoids a file path mismatch between the Windows and Linux environments: the uploaded files reside in Claude Desktop's Linux environment, whereas the transcription MCP server runs in the Windows environment and locates the audio/video files through the fileserver MCP.
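The mismatch is easy to demonstrate with Python's pure path classes, which parse a path string under Windows or POSIX rules regardless of the machine you run them on. The path and the `interpret` helper below are hypothetical and serve only as an illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def interpret(path_str: str) -> dict:
    """Show how the same string is parsed under Windows and POSIX (Linux) rules."""
    win = PureWindowsPath(path_str)
    posix = PurePosixPath(path_str)
    return {
        "windows": {"absolute": win.is_absolute(), "name": win.name},
        "posix": {"absolute": posix.is_absolute(), "name": posix.name},
    }


# Hypothetical Windows path to a recording.
result = interpret(r"C:\Users\me\Downloads\meeting_recording.mp3")
```

Under Windows rules the string is an absolute path whose file name is `meeting_recording.mp3`. Under POSIX rules it is a relative path consisting of one oddly named file, because backslashes are not separators there. A tool running on one side therefore cannot resolve a path that was produced on the other.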

The sample data can be found in the GitHub repo.

Create a .zip of the input documents and upload it to Claude Desktop.

The prompt in Claude Desktop to process the data:

Process the documents in the uploaded folder using documenting-meetings skill. 
In addition to these documents, read one more audio file from local drive C:\Users\h02317\Downloads\meeting_recording.mp3.

The skill processes the data as shown in the following snapshots, which demonstrate the whole agentic workflow.

[Screenshots: the skill's agentic workflow in Claude Desktop]

The skill follows the sample document sample-meeting-minutes.docx to generate the output document based on the given template meeting-template.docx.

Here is a snapshot of the generated report. Its length, style, format, and tone closely follow the given template and the sample document.

[Screenshot: the generated report]

Modifying the Skill for Other Use Cases

The example skill created in this article demonstrates how multiple documents of varying formats can be converted into a structured report in a desired format.

This skill can be easily adapted to other use cases.

In SKILL.md, you may need to change the name, the description, the input discovery and preprocessing steps (which folders to scan and which file types matter), and the output construction steps (the section structure, the tone, and where and how the final file is written). In addition, Steps 7 and 10 need to be modified.

Next, update the two reference files that encode most of the "business rules" so you don't have to rewrite the whole workflow. If your new use case accepts different media (e.g., webinar recordings, interviews, customer calls, classroom lectures), adjust INPUT_FORMATS.md. This is where you define supported formats, what to do for each type, and what to skip or flag.

If your new use case needs a different deliverable (e.g., incident report, sales call summary, compliance memo, project status update), edit OUTPUT_SECTIONS.md. This is where you redefine the section schema, what each section means, and what to omit when evidence is missing.

Finally, treat EVALUATION.md as your safety net: whenever you change inputs, tools, or the output schema, update the test cases so you can quickly verify the skill still behaves correctly (especially around missing files, partial inputs, or tool failures).

Potential Extension

The skill can be adapted to several use cases and improved in a number of ways:

  • Turn the skill into an application workflow with the Claude Agent SDK by running the same skill package through an SDK agent that can invoke Skill, filesystem tools, and your MCP tools programmatically.
  • Extend the skill for a generation-revision-correction workflow where the generated report is checked and corrected if required. This can repeat over multiple rounds until the report passes the validation criteria.
  • Explore other use cases, such as resume analysis and improvement against a template, sales proposals from customer + company documents, or compliance documents synthesized from internal policies and evidence files.
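The generation-revision-correction idea from the list above can be sketched as a simple loop. Everything here is hypothetical: `generate`, `validate`, and `revise` stand in for whatever model calls or checks you would plug in, and the round limit is an arbitrary safeguard.

```python
from typing import Callable


def revise_until_valid(
    generate: Callable[[], str],
    validate: Callable[[str], list[str]],
    revise: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Generate a report, then check and correct it until it passes or rounds run out."""
    draft = generate()
    for _ in range(max_rounds):
        problems = validate(draft)  # an empty list means the draft passes
        if not problems:
            return draft, True
        draft = revise(draft, problems)
    # Rounds exhausted: report whether the last revision happens to pass.
    return draft, not validate(draft)
```

Returning the pass/fail flag alongside the draft lets the caller decide whether to present the report as-is or flag it for manual review.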

The complete code is available in the GitHub repository.

Thanks for reading! If this article was helpful, I would appreciate a few claps 👏 and would love to hear your thoughts in the comments.

Please follow me on Medium and LinkedIn, and visit my website.