Automate Everything with Text Prompts




Description

In this video, Josh demonstrates how to create fully automated video performances directly from text using tools like Otter AI, ElevenLabs, and HeyGen. Viewers will learn how to generate high-quality voice clones, prototype video scripts, and produce professional-looking content with minimal effort using AI voice and video generation. The workflow lets content creators turn written or spoken text into polished video presentations quickly. By following Josh's method, users can generate multiple video iterations, edit audio precisely, and create digital avatars that closely replicate their voice and performance.


Outcomes

After watching this demo, you will be able to:

  1. Generate video scripts from transcribed audio using AI tools

  2. Create high-quality voice clones with consistent audio recordings

  3. Prototype video content using free and paid AI platforms

  4. Optimize voice training for digital avatars

  5. Manage content production across multiple AI environments

  6. Edit audio tracks with minimal credit consumption

  7. Develop a systematic workflow for automated video creation

  8. Replicate personal performance using digital voice technology

  9. Transform text-based content into professional video presentations

  10. Implement cost-effective strategies for video and audio generation


 

Summary

  • Creating a Fully Automated Performance from Text 0:08

    • Josh Lomelino explains the process of creating a fully automated performance directly from text, including generating audio prompts using Otter AI.

    • He describes how he brainstorms ideas aloud while walking, exports the transcript as a subtitle (SRT) file, and processes it with AI tools like Claude or ChatGPT.

    • Josh mentions breaking up long scripts into manageable blocks of 1,800 characters and generating a year's worth of content for various platforms.

    • He emphasizes the use of text, whether written manually or spoken and transcribed, to craft a video script using two primary methods.
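
As a rough illustration of the transcript step described above, the sketch below strips cue numbers and timestamps from SRT text and packs whole sentences into blocks of at most 1,800 characters. The function names `srt_to_text` and `split_into_blocks` are hypothetical, and splitting on sentence boundaries is an assumption; the video only states the block size.

```python
import re


def srt_to_text(srt: str) -> str:
    """Drop cue numbers and timestamp lines from SRT content, keeping spoken text.

    Note: a spoken line made up only of digits would also be dropped.
    """
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)


def split_into_blocks(text: str, limit: int = 1800) -> list[str]:
    """Greedily pack whole sentences into blocks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    blocks, current = [], ""
    for sentence in sentences:
        # Fallback: a single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                blocks.append(current)
                current = ""
            blocks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            blocks.append(current)
            current = sentence
    if current:
        blocks.append(current)
    return blocks
```

Each resulting block can then be pasted into a prompt or a voice tool on its own, which keeps any single request under the character limit.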

  • Generating High-Quality Voice Clones 1:51

    • Josh discusses creating a high-quality voice clone using ElevenLabs. He initially found the results artificial but later dialed in the settings.

    • He highlights the importance of using a consistent audio clip for training the digital voice double, ideally around three hours of spoken audio.

    • Josh explains the challenges of recording consistently for three hours and how he stitches together previous demo recordings to create a large audio clip.

    • He stresses the need for meticulous tracking of audio settings to ensure uniformity and avoid sudden changes in volume or tonal quality.
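
One way to check stitched recordings for sudden volume changes is to compare the RMS level of each clip before training. The sketch below is an illustration rather than Josh's method: it assumes 16-bit PCM WAV files, and the 3 dB tolerance and all function names are hypothetical choices.

```python
import math
import statistics
import struct
import wave


def rms_dbfs(samples: list[int], full_scale: int = 32768) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / full_scale) if rms > 0 else float("-inf")


def read_pcm16(path: str) -> list[int]:
    """Read all samples from a 16-bit PCM WAV file (channels stay interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    return list(struct.unpack(f"<{len(frames) // 2}h", frames))


def flag_outliers(levels: dict[str, float], tolerance_db: float = 3.0) -> list[str]:
    """Names of clips whose level differs from the median by more than tolerance_db."""
    median = statistics.median(levels.values())
    return sorted(
        name for name, level in levels.items() if abs(level - median) > tolerance_db
    )
```

A typical use would be to build `levels = {path: rms_dbfs(read_pcm16(path)) for path in clip_paths}` and re-level or drop whatever `flag_outliers(levels)` returns before stitching. RMS level only captures volume; differences in tonal quality would need a separate check.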

  • Optimizing Audio Recording for Consistency 3:36

    • Josh shares his experience of recording multiple live sessions with an audience, which infused the audio with personality and energy.

    • He explains the importance of having consistently dialed-in audio for generating a high-quality performance, as the AI listens to everything in the audio track.

    • Josh mentions the time and cost involved in using ElevenLabs, which can take six to eight hours to analyze a voice and build a model.

    • He advises against using cheaper models, such as Multilingual v1 or Turbo v2.5, and recommends upgrading to the Multilingual v2 model for better results.
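
For readers who script this step instead of using the web interface, the model choice is typically passed as a parameter of the text-to-speech request. The sketch below only assembles such a request and sends nothing. The endpoint path, the `xi-api-key` header, and the model ID `eleven_multilingual_v2` are assumptions based on the public ElevenLabs REST API and should be checked against the current documentation; `build_tts_request` is a hypothetical helper.

```python
import json

# Assumed base URL of the public ElevenLabs REST API.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",  # assumed ID for Multilingual v2
) -> dict:
    """Assemble (but do not send) a text-to-speech request for one script block."""
    return {
        "url": f"{API_BASE}/text-to-speech/{voice_id}",
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "body": json.dumps({"text": text, "model_id": model_id}),
    }
```

Keeping the model ID as a default argument makes it explicit in one place, so a cheaper model is never used for a final render by accident.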

  • Using HeyGen for Cost-Effective Prototyping 5:35

    • Josh introduces HeyGen as an alternative for creating generative content when ElevenLabs burns through credits too quickly.

    • He explains how he trains HeyGen on his voice by uploading a 10- to 15-minute audio clip and then generates unlimited videos at no extra cost, depending on the subscription plan.

    • Josh describes the process of creating prototypes, making real-time adjustments to the script, and rendering multiple takes.

    • He mentions using his phone in split-screen mode while walking to make adjustments on the fly and then copying and pasting the revised script into HeyGen.

  • Switching Between HeyGen and ElevenLabs 7:44

    • Josh explains how he can switch the voice in HeyGen to the high-quality production voice from ElevenLabs with a single click.

    • He highlights the downside of doing this inside HeyGen: if the audio track in the final video has issues, all the credits spent on it are lost.

    • Josh prefers using the Studio tool in ElevenLabs for targeted editing, which allows regenerating just portions of the audio without redoing the entire clip.

    • He mentions the benefit of being able to download both the WAV and MP3 files from the Studio tool in ElevenLabs as a fail-safe.
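
The idea behind targeted editing, regenerating only what changed, can be sketched outside any particular tool. The example below compares two versions of a script that has already been split into blocks and reports which blocks would need new audio. It is an illustration of the principle, not a description of how the Studio tool works, and the function names are hypothetical.

```python
import hashlib


def block_digest(block: str) -> str:
    """Stable fingerprint of a block, ignoring differences in whitespace."""
    normalized = " ".join(block.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def blocks_to_regenerate(previous: list[str], revised: list[str]) -> list[int]:
    """Indexes in `revised` whose text has no already-rendered match in `previous`."""
    rendered = {block_digest(block) for block in previous}
    return [
        index
        for index, block in enumerate(revised)
        if block_digest(block) not in rendered
    ]
```

For a three-block script where only the middle block was reworded, `blocks_to_regenerate` returns `[1]`, so only that block would consume credits on the next render.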

  • Organizing Video Production Phases 9:21

    • Josh describes his workflow of treating production as two phases: a cheap prototype phase that uses the free voice, and a final phase.

    • He explains the process of pasting the text directly into the HeyGen editor, listening to the prototype, and resolving issues before creating a new file in HeyGen.

    • Josh organizes his videos into two folders, a prototype folder and a final folder, to keep the two phases separate.

    • He mentions using the Multilingual v2 model for cost-effective throwaway tests and training his voice with HeyGen for free prototyping.
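
The same two-folder split can be mirrored for the script files themselves. The sketch below assumes scripts are kept as local text files under `prototype/` and `final/` directories; this layout and the `promote` helper are hypothetical and are not part of HeyGen.

```python
from pathlib import Path
import shutil


def promote(script_name: str, root: Path) -> Path:
    """Copy a reviewed script from prototype/ into final/ and return the new path."""
    source = root / "prototype" / script_name
    if not source.is_file():
        raise FileNotFoundError(f"no prototype named {script_name!r}")
    final_dir = root / "final"
    final_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the prototype in place, so earlier takes remain available.
    return Path(shutil.copy2(source, final_dir / script_name))
```

Copying rather than moving is a design choice here: the prototype stays available for further cheap iterations while the final folder holds only scripts that passed review.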

  • Leveraging Digital Doubles for High-Quality Videos 10:34

    • Josh shares how he uses his digital doubles to replicate a performance of his voice and generate a corresponding video composite.

    • He explains how he creates a script using Otter AI during a walk, copies and pastes it into his automated workflow, and produces a high-end video with minimal effort.

    • Josh highlights the benefits of this workflow, which allows him to deliver excellence without skipping a beat, even when small inconsistencies would have derailed the process before.

    • He concludes by previewing the following videos, which cover adding automated visual elements on screen behind the virtual avatar.

 

 










Keywords

Automated, performance, text, video, OtterAI, voice, clone, Eleven Labs, HeyGen, audio, multilingual