Whisper is built on top of PyTorch, which usually works better with NVIDIA cards, so for now the application requires a decent NVIDIA card to run.

About the application:

You can select multiple audio/video files from your computer and generate subtitles for them. It accepts multiple languages as input, and you can select the language in the GUI.

There is also the option to translate the subtitles to English if you like.

This is still a work in progress; more options will become available as future updates are completed.


  • After extracting the .zip, open "Whisper GUI.exe" inside the folder.
  • There are multiple models to choose from; the app will download any you don't already have.
  • If your card has more than 10 GB of VRAM, you will always want to use Large-V2.
  • If not, use the largest model your card can handle. If your input is in English, use the ".en" version.
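The rule of thumb in the bullets above can be sketched as a small helper. This is a hypothetical function, not part of Whisper GUI. It assumes the approximate VRAM figures listed in the openai-whisper README (real usage varies by card and driver) and that English-only ".en" variants exist for tiny through medium but not for large.

```python
# Hypothetical helper, not part of Whisper GUI: encodes the model-choice
# rule of thumb from the bullets above.

# Approximate VRAM needed per model in GB, largest first. Rough figures
# taken from the openai-whisper README; actual usage varies.
MODEL_VRAM_GB = [
    ("large-v2", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]

# English-only ".en" variants exist for these sizes, but not for large.
EN_VARIANTS = {"medium", "small", "base", "tiny"}


def pick_model(vram_gb: float, english_input: bool) -> str:
    """Return the largest model whose approximate VRAM need fits in vram_gb."""
    for name, need_gb in MODEL_VRAM_GB:
        if vram_gb >= need_gb:
            # Prefer the English-only variant when the audio is in English.
            if english_input and name in EN_VARIANTS:
                return f"{name}.en"
            return name
    raise ValueError(f"{vram_gb} GB of VRAM is below every model's estimate")


print(pick_model(12, english_input=False))  # large-v2
print(pick_model(6, english_input=True))    # medium.en
```

The threshold uses "at least ~10 GB" for Large-V2 rather than strictly more than 10, since that is roughly what the model needs; treat the numbers as a starting point and step down a size if you run out of memory.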


English transcription:

Japanese -> English translation

The medium and large-v2 models are downloaded but never loaded: the download reaches 100% and then starts over, in an endless loop of downloading from the Internet. The .pt files are saved to the cache folder, then nothing happens, and they are overwritten again.

The models should be downloaded into a (sub)folder next to the exe, not left on the system in some user/cache folder...

Thanks for making this Whisper GUI. I appreciate that it can be used offline. I hope you will keep improving it; not that it doesn't work, but perhaps you could add more bells and whistles. My laptop (NVIDIA GeForce RTX 3060) can only handle the small model. Can you recommend a more powerful laptop that would let me use the larger models? Is there a way to compensate you for your incredible work?

Unfortunately, this program is useless to me, because I want a Hungarian language course!

So I still have to use YouTube's service, which is unreliable and gives dubious results.


I really want something better!

Does anyone know if this GUI is usable without a GPU? I know the model can run without a GPU, but I don't know whether the GUI settings allow it. I'm a Mac user, but I'm trying to find an easy way for a non-techy friend who uses Windows to use Whisper. I know it's going to be slow, but can it work?

Could we have CPU support?

I would like to add the following feature requests:

  1. Light mode
  2. Change "Input Video(s)" to "Input Media", since it covers both audio and video. This kind of boring rename is the sort of thing companies are good at: Quantity (N/A), Quality, Presentation. Once a program does what it is supposed to do, it can focus on presentation.

It does nothing.

It will be so cool when models like Whisper also attach metadata to each word, such as tone, pitch, and start and end times, and recognize different voices. Then we could feed the output into a simple text-to-speech generator and produce new audio to dub videos. There are so many anime, Korean fantasy, and sci-fi dramas that I would love to listen to instead of reading subtitles. It would also help with creating a Star Trek-like communicator that lets anyone talk to others in the same tone they intended.

Love this! Thank you for making an easy-to-use, functional GUI so it's easy to try out!
I might have just missed it, but is there a buy/donate button so I can toss a couple of bucks your way in appreciation?

This might be beyond the scope of a GUI, but is there any chance it could do live subtitles?

Also, I noticed that it seems to pause when I click off the window. I assume this is intended?


Awesome GUI! I tried a few others and had issues, but this one worked.
A progress bar/time remaining would definitely be helpful.
Also, since I already had Whisper installed in Python, maybe you could make a version that checks the Python cache folder for models instead of downloading all of them again?
Anyway, thank you so much for making this!
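The cache check suggested above could look something like the sketch below. It assumes the GUI uses the openai-whisper package, whose default download folder is `$XDG_CACHE_HOME/whisper` (falling back to `~/.cache/whisper`), and that checkpoints are stored as `<model name>.pt`. The helper names `default_whisper_cache` and `find_cached_model` are hypothetical, not part of Whisper GUI or Whisper.

```python
import os
from pathlib import Path
from typing import Optional


def default_whisper_cache() -> Path:
    """Default model folder of the openai-whisper package (assumption).

    Uses $XDG_CACHE_HOME/whisper, or ~/.cache/whisper when unset.
    """
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return Path(base) / "whisper"


def find_cached_model(name: str, cache_dir: Optional[Path] = None) -> Optional[Path]:
    """Return the path to an existing <name>.pt checkpoint, or None if absent."""
    folder = cache_dir if cache_dir is not None else default_whisper_cache()
    candidate = folder / f"{name}.pt"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Report which models are already present in the default cache folder.
    for model in ("tiny", "small", "medium", "large-v2"):
        path = find_cached_model(model)
        print(model, "->", path if path else "not cached, would download")
```

openai-whisper's `load_model` also accepts a `download_root` argument, so an app built on it could point that at the cache folder found here (or at a folder next to the exe) instead of fetching the file again; Whisper itself skips the download when the file exists and its checksum matches.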


Would it be possible to add a progress bar for processing?


Yeah, once I've worked on the code a little more.

Awesome. Thanks for the great software.

If possible, another feature request: allow custom models (e.g. some of the custom-trained ones on Hugging Face).

Does it work with a live feed?

Not right now, perhaps in a future update.

Thanks for the reply.
It is a really nice little app.

As of now, you can detect or choose the language, and translation always targets English. What about adding a target language for translation (not just English)?

Other target languages are not supported by Whisper, but I plan to add another model for that.


Amazing!!!!! Can't wait for it. I can be one of your beta testers if you like ;-)

Wonderful tool! Works very well with Japanese!