Transcribe or translate audio and YouTube videos to text
Generate spoken audio from text using Edge TTS
Generate a depth map from an image