The world of AI is evolving fast, and I wanted a quick audio summary of the latest AI news that I could listen to in the shower, over breakfast, or while exercising. By combining the Gemini 2.5 Pro model with Cloud Text-to-Speech and Cloud Run, and with the precious help of Gemini CLI, I was able to build a system that transforms RSS feeds into podcast-like audio briefings while I sleep :-)

In this article, I'll walk you through how to create a podcast generator that uses AI for content summarization and natural language processing. We'll explore how Gemini creates conversational summaries, how Text-to-Speech produces broadcast-quality audio, and how Cloud Run handles the entire automated pipeline.

You can find the complete code for this project on GitHub: https://github.com/ggalloro/ai-news, including Terraform code and instructions to deploy it to your own Google Cloud project.

The Architecture: An AI-Driven Serverless Pipeline

The application is built around intelligent content processing and automated audio generation. Here's how the components work together:

AI News podcast generator architecture

Core AI Pipeline:

  1. Cloud Scheduler: Daily automated trigger for content generation
  2. Cloud Run Job: Serverless Python processing with AI integration
  3. Gemini 2.5 Pro API: Advanced AI for intelligent, conversational summarization
  4. Text-to-Speech API: High-quality neural voice synthesis
  5. Cloud Storage: Reliable storage for generated audio files
  6. Cloud Run Service: Lightweight web interface for accessing episodes
  7. Secret Manager: Secure storage for API keys

The AI Processing Flow:

  1. RSS Aggregation: Fetch latest articles from curated AI news sources
  2. Intelligent Filtering: Select balanced content from multiple sources
  3. AI Summarization: Gemini transforms technical articles into conversational summaries
  4. Audio Generation: Text-to-Speech creates natural, podcast-quality audio
  5. Audio Assembly: Programmatic stitching with intro/outro for polished result

How the Backend Works

The backend is a Python script designed to run as a Cloud Run Job. It performs a sequence of tasks to create the daily podcast.

Step 1: Fetching and Balancing the News

The first step is to gather the source material. The script fetches articles from a predefined list of RSS feeds. To ensure the podcast only contains new content, it first checks a last_processed_entries.json file in the Cloud Storage bucket. This file stores the timestamp of the last article processed for each feed. Only articles published after that timestamp will be considered.
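That state handling can be expressed with a couple of small helpers; here is a minimal sketch, assuming a google-cloud-storage `Bucket` object is passed in and that the helpers are named `load_last_times` / `save_last_times` (both names are mine, not the repo's):

```python
import json

STATE_BLOB = "last_processed_entries.json"  # state file named in the article

def load_last_times(bucket):
    """Return the per-feed timestamps, or an empty dict on the first run
    when the state file does not exist yet."""
    blob = bucket.blob(STATE_BLOB)
    if not blob.exists():
        return {}
    return json.loads(blob.download_as_text())

def save_last_times(bucket, last_times):
    """Persist the updated timestamps so no article is processed twice."""
    bucket.blob(STATE_BLOB).upload_from_string(
        json.dumps(last_times), content_type="application/json"
    )
```

Because the helpers only touch `blob.exists()`, `download_as_text()`, and `upload_from_string()`, they work against the real storage client or any test double with the same methods.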

This is the list of feeds currently in the code; you can change it to whatever you want, even non-AI-related content :-)

RSS_FEEDS = [
    "https://deepmind.google/blog/rss.xml",
    "https://raw.githubusercontent.com/Olshansk/rss-feeds/main/feeds/feed_anthropic_news.xml",
    "https://openai.com/blog/rss.xml",
    "https://simonwillison.net/atom/everything/"
]

To ensure the podcast is varied and doesn't just feature the most frequent publisher, the logic then intelligently selects the three most recent new articles from each feed and sorts the combined list chronologically. After processing, the job updates the timestamp file, ensuring no article is ever processed twice.

def get_new_rss_entries(feed_urls, last_times):
    """
    Fetches new entries from a list of RSS feeds, ensuring a balanced selection.
    """
    all_new_entries = []
    latest_times = last_times.copy()
    # ... (headers and other setup)
    
    for url in feed_urls:
        # ... (fetches and parses the feed)
        
        # Sort entries for this feed by date and take the most recent 3
        feed_entries.sort(key=lambda e: e.published_parsed, reverse=True)
        all_new_entries.extend(feed_entries[:3])

    # Sort all collected entries by date to ensure a chronological podcast
    all_new_entries.sort(key=lambda e: e.published_parsed)
    return all_new_entries, latest_times

Step 2: Intelligent Summarization with Gemini

With the articles collected, the next step is to create the podcast script. For each article, we call the Gemini 2.5 Pro model. The key to getting a high-quality, listenable summary is the prompt. We instruct the model to act as a podcast host, which guides it to produce text that is conversational, engaging, and free of any formatting artifacts that would sound robotic when converted to speech.

def summarize_entries(entries, api_key):
    """Summarizes each RSS entry individually in English."""
    client = genai.Client(api_key=api_key)
    individual_summaries = []
    
    for entry in entries:
        content = entry.get('content', [{}])[0].get('value', entry.get('summary', ''))
        prompt = f"""
        Your role is a professional podcast host writing a script for an English-language audio briefing on Artificial Intelligence.
        Your task is to summarize the following article.
        
        Guidelines for the summary:
        - Write in a natural, conversational, and engaging podcast style.
        - The output must be a clean paragraph of plain text.
        - It must be suitable for direct text-to-speech conversion.

        **CRITICAL INSTRUCTIONS:**
        - **DO NOT** use any Markdown formatting.
        - **DO NOT** begin with conversational filler like "Of course, here is a summary...".
        - **DO NOT** announce what you are doing. Just provide the summary directly.
        
        Article to summarize:
        Title: {entry.title}
        Content: {content}
        """
        response = client.models.generate_content(
            model='gemini-2.5-pro',
            contents=prompt
        )
        individual_summaries.append({'title': entry.title, 'summary': response.text.strip()})
            
    return individual_summaries

Step 3: Generating and Stitching the Audio

The final step is to turn the script into a polished audio file. The application iterates through the generated summaries, calling Google's Text-to-Speech API to create an audio segment for each one. It then uses the pydub library to stitch these segments together with a pre-recorded intro and outro, creating a single, high-quality MP3 file that is uploaded to Cloud Storage.

def generate_and_upload_stitched_audio(summaries, bucket_name):
    """
    Generates audio for each summary, stitches them together, and uploads to GCS.
    """
    try:
        tts_client = texttospeech.TextToSpeechClient()
        voice = texttospeech.VoiceSelectionParams(language_code="en-US", name="en-US-Studio-O")
        audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)
        
        all_audio_segments = []
        
        # Generate and add the intro
        intro_text = "Good morning, and welcome to your AI briefing. Here is the latest news."
        intro_segment = text_to_audio_segment(intro_text, tts_client, ...)
        if intro_segment:
            all_audio_segments.append(intro_segment)

        # Loop through summaries, creating audio for the title and content
        for summary in summaries:
            title_text = f"The next story is titled: {summary['title']}."
            title_segment = text_to_audio_segment(title_text, tts_client, ...)
            if title_segment:
                all_audio_segments.append(title_segment)
            
            summary_segment = text_to_audio_segment(summary['summary'], tts_client, ...)
            if summary_segment:
                all_audio_segments.append(summary_segment)
        # Generate and add the outro
        outro_text = "And that's all for your briefing today. Thanks for listening."
        outro_segment = text_to_audio_segment(outro_text, tts_client, ...)
        if outro_segment:
            all_audio_segments.append(outro_segment)
        # Stitch everything together using pydub and upload
        final_audio = sum(all_audio_segments)
        
        # ... export and upload to GCS ...
        
        return "gs://<your-bucket-name>/summary-YYYY-MM-DD.mp3"
        
    except Exception as e:
        # ... error handling ...
        return None
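The `text_to_audio_segment` helper called above isn't shown in the snippet; here is a plausible sketch, assuming the job wraps the MP3 bytes returned by the Text-to-Speech API in a pydub AudioSegment. The signature and the soft-failure behavior are my guesses, inferred from how the caller checks each segment for None:

```python
def text_to_audio_segment(text, tts_client, voice, audio_config):
    """Synthesize `text` and return it as a pydub AudioSegment,
    or None on any failure so the caller can skip that segment."""
    try:
        # Imports kept inside the try so a missing optional dependency
        # degrades to a skipped segment instead of crashing the job
        import io
        from google.cloud import texttospeech
        from pydub import AudioSegment

        response = tts_client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=voice,
            audio_config=audio_config,
        )
        # The API returns encoded MP3 bytes; wrap them so pydub can decode
        return AudioSegment.from_file(
            io.BytesIO(response.audio_content), format="mp3"
        )
    except Exception:
        return None
```

Returning None instead of raising keeps one failed synthesis call from sinking the whole episode, which matches the `if segment:` guards in the stitching code above.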

The Frontend: Simple Audio Delivery

The frontend is a Flask application that serves the generated podcast episodes through a simple… ehm, minimalist interface.

AI News web app UI

The web application handles:

  • Episode Listing: Display all available daily briefings
  • Audio Player: HTML5 audio player for seamless listening
  • Clean UI: Minimal, podcast-focused interface
  • Mobile-Friendly: Responsive design for listening on any device

Secure Audio Delivery with Signed URLs

The application implements secure audio delivery using Google Cloud Storage signed URLs, ensuring that audio files remain private while providing time-limited access to authenticated users.

def generate_signed_url(bucket_name, object_name, expiration_minutes=1440):
    """Generate a signed URL using impersonated credentials."""
    try:
        # Get default credentials from Cloud Run metadata server
        source_credentials, project = default()
        
        # Create impersonated credentials for the target service account
        target_credentials = impersonated_credentials.Credentials(
            source_credentials=source_credentials,
            target_principal=SERVICE_ACCOUNT_EMAIL,
            target_scopes=['https://www.googleapis.com/auth/cloud-platform'],
        )
        
        # Create storage client with impersonated credentials
        storage_client = storage.Client(credentials=target_credentials)
        bucket = storage_client.bucket(bucket_name)
        blob = bucket.blob(object_name)
        
        # Generate signed URL (24 hours expiration)
        signed_url = blob.generate_signed_url(
            version="v4",
            expiration=timedelta(minutes=expiration_minutes),
            method="GET"
        )
        
        return signed_url
        
    except Exception as e:
        # ... error handling ...
        return None

Episode Discovery and Rendering

The Flask application automatically discovers available episodes by scanning the Cloud Storage bucket and presents them in a chronological interface:

@app.route('/')
def index():
    """Main page showing available audio episodes."""
    try:
        storage_client = storage.Client()
        bucket = storage_client.bucket(GCS_BUCKET_NAME)
        
        # List all MP3 files in the bucket
        # (no delimiter: that parameter emulates folders, it doesn't filter by extension)
        blobs = bucket.list_blobs(prefix="summary-")
        episodes = []
        
        for blob in blobs:
            if blob.name.endswith('.mp3'):
                # Extract date from filename (summary-YYYY-MM-DD.mp3)
                date_str = blob.name.replace('summary-', '').replace('.mp3', '')
                
                # Generate secure signed URL for audio access
                signed_url = generate_signed_url(GCS_BUCKET_NAME, blob.name)
                
                if signed_url:
                    episodes.append({
                        'filename': blob.name,
                        'date': date_str,
                        'signed_url': signed_url,
                        'created': blob.time_created
                    })
        
        # Sort episodes by date (newest first)
        episodes.sort(key=lambda x: x['created'], reverse=True)
        
        return render_template('index.html', episodes=episodes)
        
    except Exception as e:
        # ... error handling ...
        return render_template('error.html')

Security Features:

The application includes these built-in security capabilities:

  • Private Storage: Audio files stored securely with signed URL access
  • IAP Integration: Automatic Identity-Aware Proxy configuration for controlled access
  • Configurable Access: Email-based user authentication through Google accounts
  • Zero-Config Security: IAP automatically enabled when user emails are specified in Terraform
  • Time-Limited URLs: Signed URLs expire after 24 hours for additional security

Conclusion

This project demonstrates how AI models can be combined with a serverless cloud architecture. By leveraging Gemini's natural language understanding and Google Cloud's Text-to-Speech capabilities, we've created a system that automatically produces high-quality podcast content.

Check out the GitHub repo: https://github.com/ggalloro/ai-news to deploy it in your own Google Cloud environment, change the RSS feeds to summarize content based on your interests, or start from the existing code to build something new and more useful for you!