June 12, 2014 - rob

youtube video download script

 

 

#!/usr/bin/perl -T

use strict;
use warnings;

#
##  Calomel.org  ,:,  Download Youtube videos and music using wget
##    Script Name : youtube_wget_video.pl
##    Version     : 0.42
##    Valid from  : March 2014
##    URL Page    : https://calomel.org/youtube_wget.html
##    OS Support  : Linux, MacOSX, OpenBSD, FreeBSD or any system with perl
#                `:`
## Two arguments
##    $1 Youtube URL from the browser
##    $2 prefix to the file name of the video (optional)
#

############  options  ##########################################

# Option: what file type do you want to download? The string is used to search
# in the youtube URL so you can choose mp4, webm, avi or flv. mp4 seems to
# work on the most players like android, ipod, ipad, iphones, vlc and mplayer.
my $fileType = "mp4";

# Option: what visual resolution or quality do you want to download? List
# multiple values just in case the highest quality video is not available, the
# script will look for the next resolution. You can choose "highres" for 4k,
# "hd1080" for 1080p, "hd720" for 720p, "itag=18" which means standard
# definition 640x380 and "itag=17" which is mobile resolution 144p (176x144).
# The script will always prefer to download the highest resolution video format
# from the list if available.
my $resolution = "hd720,itag=18";

# Option: How many times should the script retry the download if wget fails for
# any reason? Do not make this too high as a reoccurring error will just hit
# youtube over and over again.
my $retryTimes = 20;

# Option: do you want the resolution of the video in the file name? zero(0) is
# no and one(1) is yes. This option simply puts "_hd1080.mp4" or similar at the
# end of the file name.
my $resolutionFilename = 0;

# Option: Force all communication with YouTube to use SSL (https) links. The
# script will simply convert all URLs you pass to the script to use https
# instead of http. Encryption better protects your privacy and may help avoid
# ISP rate limiting. 
my $forceSSL = 1;

# Option: turn on DEBUG mode. Use this to reverse engineer this code if you are
# making changes or you are building your own youtube download script.
my $DEBUG=0;

#################################################################

# initialize retry loop and resolution variables
$ENV{PATH} = "/bin:/usr/bin:/usr/local/bin";
my $prefix = "";
my $retry = 1;
my $retryCounter = 0;
my $resFile = "unknown";
my $user_url = "";
my $user_prefix = "";

# collect the URL from the command line argument
chomp($user_url = $ARGV[0]);
my $url = "$1" if ($user_url =~ m/^([a-zA-Z0-9\_\-\&\?\=\:\.\/]+)$/ or die "\nError: Illegal characters in YouTube URL\n\n" );

# declare the user defined file name prefix if specified
if (defined($ARGV[1])) {
   chomp($user_prefix = $ARGV[1]);
   $prefix = "$1" if ($user_prefix =~ m/^([a-zA-Z0-9\_\-\.\ ]+)$/ or die "\nError: Illegal characters in filename prefix\n\n" );
}

# while loop to retry downloading the video if the script fails for any reason
while ( $retry != 0 && $retryCounter < $retryTimes ) {

# Force SSL (https) download of the html page
$url =~ s/http:\/\//https:\/\//gi if ($forceSSL == 1);

# download the html from the youtube page containing the page title and video
# url. The page title will be used for the local video file name and the url
# will be sanitized and passed to wget for the download.
my $html = `wget -4Ncq --convert-links=off --no-cookies --timeout=20 --user-agent='' --no-check-certificate "$url" -O-`  or die  "\nThere was a problem downloading the HTML page.\n\n";

# format the title of the page to use as the file name
my ($title) = $html =~ m/<title>(.+)<\/title>/si;
$title =~ s/[^\w\d]+/_/g or die "\nError: we could not find the title of the HTML page. Check the URL.\n\n";
$title =~ s/_youtube//ig;
$title =~ s/^_//ig;
$title = lc ($title);
$title =~ s/_amp//ig;

# filter the URL of the video from the HTML page
my ($download) = $html =~ /"url_encoded_fmt_stream_map"(.*)/ig;

# Print all of the separated strings in the HTML page
#print "\n$download\n\n" if ($DEBUG == 1);

# This is where we look through the HTML code and select the file type and
# video quality.
my @urls = split(',', $download);
OUTERLOOP:
foreach my $val (@urls) {
#   print "\n$val\n\n";

    if ( $val =~ /$fileType/ ) {
       my @res = split(',', $resolution);
       foreach my $ress (@res) {
         if ( $val =~ /$ress/ ) {
           print "\n  html to url separation complete.\n\n" if ($DEBUG == 1);
           print "$val\n" if ($DEBUG == 1);
           $resFile = $ress;
           $resFile = "sd640" if ( $ress =~ /itag=18/ );
           $resFile = "mobil176" if ( $ress =~ /itag=17/ );
           $download = $val;
           last OUTERLOOP;
         }
       }
    }
}

# clean up the url by translating unicode and removing unwanted strings
print "\n  Re-formatting url for wget...\n\n" if ($DEBUG == 1);
$download =~ s/\:\ \"//;
$download =~ s/%3A/:/g;
$download =~ s/%2F/\//g;
$download =~ s/%3F/\?/g;
$download =~ s/%3D/\=/g;
$download =~ s/%252C/%2C/g;
$download =~ s/%26/\&/g;
$download =~ s/sig=/signature=/g;
$download =~ s/\\u0026/\&/g;
$download =~ s/(type=[^&]+)//g;
$download =~ s/(fallback_host=[^&]+)//g;
$download =~ s/(quality=[^&]+)//g;

# clean up the url
my ($youtubeurl) = $download =~ /(https?:.+)/;

# url title addition
my ($titleurl) = $html =~ m/<title>(.+)<\/title>/si;
$titleurl =~ s/ - YouTube//ig;
$titleurl =~ s/ /%20/ig;

# combine the youtube url and title string
$download = "$youtubeurl\&title=$titleurl";

# a bit more cleanup as youtube
#$download =~ s/&+/&/g;
#$download =~ s/&itag=\d+&signature=/&signature=/g;

# combine file variables into the full file name
my $filename = "unknown";
if ( $resolutionFilename == 1 ) {
   $filename = "$prefix$title\_$resFile.$fileType";
} else {
   $filename = "$prefix$title.$fileType";
}

# Process check: Are we currently downloading this exact same video? Two of the
# same wget processes will overwrite themselves and corrupt the video.
my $running = `ps auwww | grep [w]get | grep -c "$filename"`;
print "\n  Is the same file already being downloaded? $running\n" if ($DEBUG == 1);
if ($running >= 1)
  {
   print "\n  Already $running process, exiting." if ($DEBUG == 1);
   exit 0;
  };

# Force SSL (https) download of the video file.
$download =~ s/http:\/\//https:\/\//g if ($forceSSL == 1);

# Print the long, sanitized youtube url for testing and debugging
print "\n  The following url will be passed to wget:\n\n" if ($DEBUG == 1);
print "\n$download\n" if ($DEBUG == 1);

# print the file name of the video being downloaded for the user
print "\n Download: $filename\n\n" if ($retryCounter < 1);

# Background the script before wget starts downloading. Use "ps" if you need to
# look for the process running or use "ls -al" to look at the file size and
# date.
fork and exit;

# Download the video
system("wget", "-4Ncq", "--convert-links=off", "--no-cookies", "--timeout=20", "--no-check-certificate", "--user-agent=", "$download", "-O", "$filename");

# Print the error code of wget
print "\n  wget error code: $?\n" if ($DEBUG == 1);

# Exit Status: Check if the file exists and we received the correct error code
# from wget system call. If the download experienced any problems the script
# will run again and try to continue the download until the retryTimes count
# limit is reached.

if( $? == 0 && -e "$filename" && ! -z "$filename" )
  {
   print "\n  Finished: $filename\n\n" if ($DEBUG == 1);
#  print "\n  Success: $filename\n\n";
   $retry = 0;
  }
else
  {
   print STDERR "\n  FAILED: $filename\n\n" if ($DEBUG == 1);
#  print "\n  FAILED: $filename\n\n";
   $retry = 1;
   $retryCounter++;
#  sleep $retryCounter;
   sleep 1;
  }
}

#### EOF #####

How do I use the script ?

Once you have the script set up, you just need to find a Youtube video. We chose a video from Tobygames as he meets his first Giant Radscorpion in Fallout New Vegas. Execute the script with the Youtube URL copied and pasted from Firefox’s URL bar. Note that you can add one more argument to the end of the command line to add a prefix to the file name. Here is an example of both options; notice the change in file names, as the second example has “toby_” as the file name prefix. Also note that some Youtube URLs have ampersands “&” in them. For these types of URLs, just use double quotes around the URL so your shell passes the full string into the script.

## Example 1: Here we just pass the youtube URL
#
user@machine$ ./youtube_wget.pl "https://www.youtube.com/watch?v=ejkm5uGoxs4"

   Download: radscorpion.mp4


## Example 2: Here we pass the Youtube URL and the file name prefix "toby_"
#
user@machine$ ./youtube_wget.pl "https://www.youtube.com/watch?v=ejkm5uGoxs4" toby_

   Download: toby_radscorpion.mp4

The video will download in the background and save to your current directory. You can play it with your favorite video player; we prefer VLC.
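The quoting advice above is easy to check for yourself. The following is a minimal sketch, not part of the download script: count_args is a hypothetical helper, and the "&feature=share" suffix is an assumed example of an extra URL parameter.

```shell
# count_args: print how many arguments the shell actually passed in.
count_args() {
    echo "$#"
}

# Quoted: the whole URL, ampersand included, arrives as a single argument.
count_args "https://www.youtube.com/watch?v=ejkm5uGoxs4&feature=share"

# Unquoted, the shell would cut the command at "&", run the first part in the
# background and treat "feature=share" as a separate statement, so the script
# would only ever see the part of the URL before the ampersand.
```

If the quoted call prints 1, the download script will receive the complete URL as its first argument.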

Things to keep in mind…

File name is the same as the name of the web page: Notice the file name is the same as the title of the Youtube web page. We have also scrubbed the title to take out all special characters and reduce all letters to lower case. This makes it easier to read and to run on the command line.
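To preview roughly what file name a given page title will turn into, the scrubbing can be approximated in the shell. This is only a sketch: sanitize_title is a hypothetical helper that mimics the title regular expressions in the Perl script for plain ASCII titles, and the sample title is made up.

```shell
# sanitize_title: approximate the script's title scrubbing for ASCII titles.
#   1. collapse every run of non-alphanumeric characters into one underscore
#   2. lower-case the result
#   3. drop "_youtube", a leading underscore and any "_amp" left by "&amp;"
sanitize_title() {
    printf '%s\n' "$1" \
        | LC_ALL=C sed -e 's/[^A-Za-z0-9_]\{1,\}/_/g' \
        | LC_ALL=C tr 'A-Z' 'a-z' \
        | sed -e 's/_youtube//g' -e 's/^_//' -e 's/_amp//g'
}

# Example: a made-up page title as it might appear between the <title> tags.
sanitize_title "Giant Radscorpion &amp; Friends - YouTube"
```

The script then adds the optional prefix in front and the file type (for example ".mp4") at the end.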

Save Location: The video will be saved in your current directory.

Script methodology: The wget line will run in the background. You can start as many of these downloads as you want. We have started as many as a dozen simultaneous downloads without issue. The script will finish silently, meaning that when the download is finished you will not get any notification.
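Because each run puts itself in the background, a simple loop is enough to queue up several videos. The sketch below assumes a plain text file with one URL per line; download_list and the file name urls.txt are hypothetical, and the script path ./youtube_wget.pl is taken from the examples above.

```shell
# download_list FILE [COMMAND]
# Run COMMAND (default: ./youtube_wget.pl) once for every URL listed in FILE.
# Blank lines and lines starting with "#" are skipped.
download_list() {
    list_file=$1
    downloader=${2:-./youtube_wget.pl}
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in
            ''|'#'*) continue ;;
        esac
        # Quote the URL so ampersands survive; detach stdin from the list file.
        "$downloader" "$url" < /dev/null
    done < "$list_file"
}

# Usage (assumes urls.txt exists in the current directory):
#   download_list urls.txt
```

Keep the list short. Every entry starts its own background wget, so a long file means that many simultaneous downloads.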

Wget process state: You can check if the download is running by looking at the process list (ps) and grepping for wget. Something like ps -aux | grep wget will work. At this point there is no direct way to tell how fast the download is going. What you can do is watch the file size change using ls -la and estimate from there. You can also start watching the video file right away. The file is downloaded serially, so as soon as the download starts you should be able to open the file in VLC.
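The "watch the file size and estimate" approach can be automated with a few lines of shell. This is a rough sketch under the assumption that the file only grows between the two samples; file_size, rate_kib and watch_growth are hypothetical helper names.

```shell
# file_size FILE: print the size of FILE in bytes.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# rate_kib OLD_BYTES NEW_BYTES SECONDS: print the average growth in KiB/s
# (integer arithmetic, so the result is rounded down).
rate_kib() {
    echo $(( ($2 - $1) / $3 / 1024 ))
}

# watch_growth FILE [SECONDS]: sample the size twice and print the rate.
watch_growth() {
    file=$1
    interval=${2:-5}
    before=$(file_size "$file")
    sleep "$interval"
    after=$(file_size "$file")
    echo "$(rate_kib "$before" "$after" "$interval") KiB/s"
}

# Usage while a download is running (file name from the examples above):
#   watch_growth radscorpion.mp4 10
```

A result of 0 KiB/s over a longer interval usually means the download has finished or stalled; check the process list to tell which.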

Video file type: The video will download in the file type you specify. mp4 seems to be the most compatible type, but WebM and AVI are also available. WebM is a digital multimedia container file format promoted by the open-source WebM Project headed by Google; it is a subset of the Matroska multimedia container format and typically carries VP8 video. If your current media player does not support WebM then you need a codec for your OS. Just search on Google for “webm codec” and you should get pointed in the right direction. Note that you can play this format with the VLC media player, which is available on all OSs. VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files as well as DVD, Audio CD, VCD, and various streaming protocols. VLC can also play videos at greater than 1x speed by hitting the plus “+” key on the keypad. When playing videos at anything faster than 1x the sound will be automatically pitch corrected. For example, we like to watch Quill18, Day9 and Sacriel42 videos at 2x.

Always use the latest script version: Youtube changes the format of their HTML pages every once in a while, which consequently breaks download scripts like this one. The average amount of time between HTML format changes is three (3) months. If you find this script no longer works, make sure to check back on this page for any updates. We will do our best to keep this command line option working since we use this script at least once a day. Note that at the top of the script we have the version number and the date the script is good from. We will also post on the RSS feed (link at the top of the page) when a new version is available.

How do I make an audio mp3 or ogg from a youtube video ?

At some point you will want to save off the sound from a video. A good case is downloading an instructional video and listening to it on your music player like a Sansa Clip, iPad, iPhone or iPod. We like to download class videos from the Massachusetts Institute of Technology (MIT) and listen to them in the car.

You can use the above download script to get a video and convert the video’s soundtrack to MP3 (or OGG or any other) format using avconv. You will need to install avconv in order to extract the audio. On Ubuntu use “apt-get install libav-tools”. If avconv is not available as a package, install ffmpeg instead: on OpenBSD use “pkg_add -i ffmpeg” and on FreeBSD use “pkg_add -r ffmpeg”. For Mac OSX you may want to look at ffmpegX.

## Convert the audio from a Youtube video to mp3 or ogg, audio only.

## download the video. (Same link to Tobygames as above)
user@machine$ ./youtube_wget.pl https://www.youtube.com/watch?v=ejkm5uGoxs4
   Download: radscorpion.flv

## Convert video to mp3, audio only (128 kbit/s)
user@machine$ avconv -i radscorpion.flv -vn -ab 128k toby_audio.mp3

## Convert video to ogg, audio only (128 kbit/s)
user@machine$ avconv -i radscorpion.flv -vn -ab 128k toby_audio.ogg
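Since some systems ship avconv and others only ffmpeg, a small wrapper can pick whichever is installed. This is a sketch under assumptions: first_available and extract_audio are hypothetical helpers, the options mirror the commands above, and "128k" is assumed to mean 128 kbit/s for both tools.

```shell
# first_available NAME...: print the first NAME found in PATH.
# Returns non-zero if none of the names is installed.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# extract_audio INPUT OUTPUT: strip the video stream (-vn) and write the audio
# to OUTPUT; the output format follows the file extension (.mp3, .ogg, ...).
extract_audio() {
    converter=$(first_available avconv ffmpeg) || {
        echo "neither avconv nor ffmpeg was found in PATH" >&2
        return 1
    }
    "$converter" -i "$1" -vn -ab 128k "$2"
}

# Usage (file names from the example above):
#   extract_audio radscorpion.flv toby_audio.mp3
```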

 

 

https://calomel.org/youtube_wget.html
