ATVFiles Movie/TV Categorization Script
From Bubba.org
Note: This project has moved
Please note that this project is now hosted at http://code.google.com/p/atv2xml/. The information below will not be updated for new releases, but remains for those who fail to see this blurb.
Introduction
This script takes a video file of a movie or TV show, attempts to download information about it (from IMDB for movies, from TheTVDB for TV shows), and writes that information to an ATVFiles XML metadata file for use on the AppleTV.
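As a rough illustration of where the output ends up: judging from the script further down, the metadata and the cover image are written next to the video file, with the extension swapped for .xml and .jpg. The helper name sidecar_paths and the sample path are made up for this sketch.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: given a video path, return the .xml and .jpg paths
# that sit beside it (same directory, same base name).
sub sidecar_paths {
    my ($video) = @_;
    # Same suffix pattern the script uses: strip the last ".ext".
    my ($name, $path, $suffix) = fileparse($video, qr/\.[^.]*/);
    return ("$path$name.xml", "$path$name.jpg");
}

my ($xml, $jpg) = sidecar_paths("/media/Movies/Heat.avi");
print "$xml\n$jpg\n";
# prints:
#   /media/Movies/Heat.xml
#   /media/Movies/Heat.jpg
```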
Usage
I call the script from a sabnzbd user script so that it is run once a file has been downloaded.
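A minimal sketch of such a user script, assuming (as SABnzbd does for post-processing scripts) that the finished job's directory arrives as the first argument. The install path in $ATV2XML and the choice of flags are assumptions for illustration, not the author's actual setup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed install location of atv2xml.pl; adjust for your system.
my $ATV2XML = "/usr/local/bin/atv2xml.pl";

# Build the command for an unattended movie lookup on one job directory.
# -usefirst avoids the interactive menu; -usedir classifies by directory name.
sub build_command {
    my ($job_dir) = @_;
    return ($ATV2XML, "-movie", "-usefirst", "-usedir", $job_dir);
}

# Only run when a job directory was actually passed in.
if (@ARGV) {
    my @cmd = build_command($ARGV[0]);
    # List form of system() avoids shell quoting problems with spaces.
    system(@cmd) == 0 or die "atv2xml.pl failed: $?\n";
}
```

Because nobody is present to answer the menu when this runs, -usefirst is effectively required here, and the results are only as good as the file or directory names.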
usage: atv2xml.pl [-movie|-tvshow] [-altimg] [-duration] [-overwrite] [-usedir] [-usefirst] path-to-search-for-files

  -altimg          try to get a larger image via movieposterdb images

  -duration        normally leaves the duration string as 0 so that ATVFiles
                   will compute this itself. using this will grab duration
                   from IMDB (only applies to movies)

  -overwrite       will overwrite existing .xml files (you will have the
                   option to skip files you don't want to overwrite).

  -usedir          will use the parent directory name instead of the filename
                   to classify the file (only applies to movies).

  -usefirst        will use the first match (#0) returned from the search
                   and assume that we have a successful match. Useful
                   for unattended running. Your names must be accurate
                   for this option to be useful.

  -tvshow|-movie   lookup either via TheTVDB or IMDB (-movie is default if
                   nothing is specified).
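To show why accurate names matter for -tvshow, here is a small standalone sketch that mirrors the regular expression the script uses to split a TV filename into show, season and episode. The function name parse_tv_filename and the sample filenames are made up for this illustration; the real logic lives in guessTitleFromFilename in the script.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the script's TV filename guess.
# Returns (show, season, episode), or an empty list if nothing matches.
sub parse_tv_filename {
    my ($name) = @_;
    $name =~ s/\(.*\)//g;    # drop anything in parentheses
    return unless $name =~ /(.*?)[\.\s\-]+[Ss]?(\d{1,2})[Ee]?(\d{1,3})/;
    my ($show, $season, $episode) = ($1, $2, $3);

    # Numbering like "101" is first read as season 10, episode 1;
    # re-split it as one season digit plus the episode digits.
    if (length($season . $episode) < 4) {
        ($season, $episode) = ("$season$episode" =~ /(\d)(\d{1,3})/);
    }

    $show =~ s/\./ /g;                 # dots used instead of spaces
    s/^0// for $season, $episode;      # strip one leading zero
    return ($show, $season, $episode);
}

for my $file ("The.Office.S03E07.avi", "Lost - 101.avi") {
    my ($show, $season, $episode) = parse_tv_filename($file);
    print "$show: season $season, episode $episode\n";
}
# prints:
#   The Office: season 3, episode 7
#   Lost: season 1, episode 1
```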
Script
#!/usr/bin/perl -w
#
# 2009/10/03 - bubbaATbubba.org
#  - Another fix from Tony for the movieposterdb image quality
#
# 2009/04/17 - bubbaATbubba.org
#  - added a fixup routine to auto-rename shows based on the
#    filename to a more appropriate name to get a better match,
#    tweaked guess regex for episode names, and if episodes are
#    named different
#
# 2008/12/23 - bubbaATbubba.org
#  - changed TV show parsing to account for titles with .'s instead
#    of spaces
#
# 2008/10/27 - bubbaATbubba.org
#  - moved from Date::Parse to Date::Manip
#  - fixed various issues with TV gathering where not all episodes
#    had all information (missing description, missing air date)
#
# 2008/09/14 - bubbaATbubba.org
#  - moved everything to TheTVDB API (thanks again Tony!)
#  - tweaked tv name guessing section a bit
#  - added episode # before the title for TV shows
#  - other minor fixes
#
# 2008/09/12 - bubbaATbubba.org
#  - fixed another issue with movieposterdb. Hopefully this
#    fix will keep things working a bit longer than before.
#    (Thanks Tony!)
#
# 2008/09/07 - bubbaATbubba.org
#  - fixed an issue with movieposterdb
#
# 2008/07/19 - bubbaATbubba.org
#  - fixed issue with TheTVDB poster gathering.
#
# 2008/05/20 - bubbaATbubba.org
#  - not using tv.com perl module any longer, instead wrote
#    functions to query TheTVDB (they rock). Issues with
#    computing episode numbers are gone. You may also
#    use the show selector if more than one result is
#    returned, unlike before.
#  - added ability to attempt to retrieve alternate
#    movie images from movieposterdb.
#  - fixed a few regex's here and there.
#
# 2008/05/18 - bubbaATbubba.org
#  - added ability to get TV show info and images (-tvshow)
#    At this time this depends on the name being correct and
#    the file name being: SHOW - SXXEXX.avi
#    Also uses some rudimentary computation of the complete episode
#    number (see comments). This should really use thetvdb but I'm too
#    lazy to write a perl module for them.
#
# 2008/05/14 - bubbaATbubba.org
#  - fixed ratings (G/PG)
#  - Added Star Ratings though ATVfiles doesn't seem to care :(
#    Anyone know how to change what ATVfiles displays?
#  - added the -usefirst option for unattended selection of the first
#    hit in a search (#0). You need to make sure your file/dir names
#    are accurate for this to be useful.
#
# 2008/05/12 - bubbaATbubba.org
#  - Argument can now be a directory tree with files to classify. You must
#    configure $extensions to specify which files to look for
#  - Added ability to use the parent directory name instead of filename to
#    classify the movie to lookup (-usedir option)
#  - Added the ability to overwrite and/or skip overwriting during
#    classification (-overwrite option). There is now an option to "S" skip
#    in the menu should you not wish to overwrite the current xml file.
#  - Fixed -duration option (wasn't working for times like "US: 120 MIN")
#
# Thanks to nrh on AwkwardTV forums for starting this... :)

use strict;
use IMDB::Film;
use HTML::Template;
use Date::Manip;
use File::Basename;
use File::stat;
use Fcntl;
use utf8;
use Getopt::Long;
use LWP::Simple;
use XML::TreePP;
use File::Find;
use Data::Dumper;

# put file extensions in here you want to look for.
# make sure to escape your "." and put a | in between
# your extensions
my $extensions = qw'\.avi|\.wmv|\.mp4|\.mkv';

# if image size is less than this many bytes,
# try getting a large image from movieposterdb
# (requires -altimg option to even try this)
my $filesize = "30000";

# now required for TVDB calls... :(
my $tvdb_key = "";

my ($template_ref, $dir, $searchTerm, $imdb, $overwrite, $usedir, $xml,
    $cover, $content, $usefirst, $tvshow, $movie, $season, $episode,
    $show, $show_name, $episode_num, %season_hash, $altimg);
my $use_duration = 0;

&usage unless GetOptions("duration"  => \$use_duration,
                         "overwrite" => \$overwrite,
                         "usedir"    => \$usedir,
                         "usefirst"  => \$usefirst,
                         "movie"     => \$movie,
                         "tvshow"    => \$tvshow,
                         "altimg"    => \$altimg);

# search terms matching a key are replaced by the value before lookup
my %fixup = ("The Office" => "The Office (US)");

# do movie searches by default
if (!$tvshow) {
    $movie = 1;
}

# autoflush;
$|=1;

# unicode support
binmode(STDOUT, ":utf8");

&usage unless $ARGV[0];

# define the output template here
my $movie_template =<<MXML;
<media type="Movie">
  <title><TMPL_VAR NAME=TITLE></title>
  <summary><TMPL_VAR NAME=SUMMARY></summary>
  <rating><TMPL_VAR NAME=CERTIFICATION></rating>
  <starRating><TMPL_VAR NAME=RATING></starRating>
  <published><TMPL_VAR NAME=DATE></published>
  <duration><TMPL_VAR NAME=DURATION></duration>
  <genres><TMPL_LOOP NAME=GENRE>
    <TMPL_VAR NAME=NAME></TMPL_LOOP>
  </genres>
  <cast><TMPL_LOOP NAME=CAST>
    <name><TMPL_VAR NAME=NAME></name></TMPL_LOOP>
  </cast>
  <directors><TMPL_LOOP NAME=DIRECTORS>
    <name><TMPL_VAR NAME=NAME></name></TMPL_LOOP>
  </directors>
</media>
MXML

my $tv_template =<<TXML;
<media type="TV Show">
  <title><TMPL_VAR NAME=TITLE></title>
  <artist><TMPL_VAR NAME=NAME></artist>
  <summary><TMPL_VAR NAME=SUMMARY></summary>
  <description><TMPL_VAR NAME=DESCRIPTION></description>
  <published><TMPL_VAR NAME=DATE></published>
  <seriesName><TMPL_VAR NAME=NAME></seriesName>
  <episode><TMPL_VAR NAME=EPISODE></episode>
  <episodeNumber><TMPL_VAR NAME=EPISODENUM></episodeNumber>
  <season><TMPL_VAR NAME=SEASON></season>
</media>
TXML

find(\&findfiles,$ARGV[0]);

sub findfiles {
    my $file = $File::Find::name;
    undef $template_ref;

    if ($movie) {
        $template_ref = \$movie_template;
    } elsif ($tvshow) {
        if (!$tvdb_key) {
            print "You need an API key from http://www.thetvdb.com. Please".
                  " create an account and request a key.\n";
            exit 1;
        }
        $template_ref = \$tv_template;
    }

    return unless -f $file;
    return unless $_ =~ m/$extensions/io;
    return if $_ =~ /sample/i;
    print "FILE: $file\n";

    $dir = dirname($file);
    $dir =~ s/.*\///;

    my ($xmlfile,$xmlpath,$xmlfilesuffix) = fileparse($file,qr/\.[^.]*/);
    $xmlfile .= ".xml" ;

    if (-f "$xmlpath/$xmlfile" && (!$overwrite)) {
        return;
    } elsif (-f "$xmlpath/$xmlfile") {
        print "\n\nFound $xmlfile.. Will overwrite.\n";
    }

    # now we start.
    # derive search term from filename
    $searchTerm = guessTitleFromFilename($file);
    foreach my $fix (keys %fixup) {
        if ($searchTerm =~ /$fix/) {
            $searchTerm=$fixup{$fix};
        }
    }
    $show_name = $searchTerm;

    undef $imdb;
    undef $show;

    # main loop
    if ($movie) {
        while ($imdb = IMDB::Film->new(crit => "$searchTerm")) {
            my @results = @{ $imdb->matched };
            if (!@results && $imdb) {
                #print Dumper $imdb;
                $searchTerm = $imdb->id;
            }
            last unless @results > 0;

            # we'll assume the 1st hit is what we want...
            if ($usefirst) {
                $searchTerm = $results[0]->{id};
                $show_name = $results[0]->{title};
                print "Using first search result: $show_name\n";
            }

            if (@results > 0 && !$usefirst) {
                my $choice = &displayMenu(@results);
                # undef and replace $imdb object
                if ($choice =~ /^[Nn]$/) {
                    $searchTerm = &getSearchTerm;
                    undef $imdb;
                } elsif ($choice =~ /^[Ss]$/) {
                    return;
                } else {
                    $searchTerm = $results[$choice]->{id};
                    $show_name = $results[$choice]->{title};
                }
            }
        }
    } elsif ($tvshow) {
        my @results = doSeriesSearch($searchTerm);
        # skip this file if nothing matched (was 'last', which is not
        # valid here because we are not inside a loop)
        return unless @results > 0;

        # we'll assume the 1st hit is what we want...
        if ($usefirst) {
            $searchTerm = $results[0]->{id};
            $show_name = $results[0]->{title};
            print "Using first search result: $show_name\n";
        }

        if (@results > 0 && !$usefirst) {
            my $choice = &displayMenu(@results);
            if ($choice =~ /^[Nn]$/) {
                $searchTerm = &getSearchTerm;
            } elsif ($choice =~ /^[Ss]$/) {
                return;
            } else {
                $searchTerm = $results[$choice]->{id};
                $show_name = $results[$choice]->{title};
            }
        }
    }

    # we got a single result here (possibly by searching on id)
    undef $xml;
    if ($movie) {
        $xml = imdbToTmpl($imdb);
        $cover = $imdb->cover();
    } elsif ($tvshow) {
        $show = getEpiInfo($searchTerm,$episode,$season);
        $cover = getBanner($searchTerm,$season);
        $xml = showToTmpl($show);
    }

    my ($outfile,$path,$suffix) = fileparse($file,qr/\.[^.]*/);
    $outfile .= ".xml";

    # write output file
    &writeXMLFile($path.$outfile, $xml) || exit 1;

    # fetch the cover image and save it next to the video as .jpg
    $content = get($cover);
    my ($coverfile,$coverfilepath,$coverfilesuffix) = fileparse($file,qr/\.[^.]*/);
    $coverfile .= ".jpg" ;

    if (open(OUT, ">$coverfilepath/$coverfile")) {
        print OUT $content ;
        close(OUT) ;
    }

    my $fsize=stat("$coverfilepath/$coverfile")->size;
    if (($fsize < $filesize) && ($altimg) && ($movie)) {
        # use movieposterdb for movies since IMDB is so low res
        my $search = "http://www.movieposterdb.com/browse/search?type=movies&query=$searchTerm";
        my $content = get($search);
        my $id = $searchTerm;
        $id =~ s/^0//g;
        if ($content =~ m{img\ssrc="(http://www.movieposterdb.com/posters/[\/\S+\_]+/0?$id/\S+$id\_\S+.jpg)" }) {
            my $match = $1;
            # swap the small/medium thumbnail for the large image
            $match =~ s/$id\/s_/$id\/l_/g;
            $match =~ s/$id\/m_/$id\/l_/g;
            if ($content = get($match)) {
                print "Got new image via MoviePosterDB: $match\n";
                if (open(OUT, ">$coverfilepath/$coverfile")) {
                    print OUT $content ;
                    close(OUT) ;
                }
            }
        }
    }
}

exit 0;

sub writeXMLFile {
    my $outfile = shift;
    my $xml = shift;
    if (open(OUT, ">$outfile")) {
        binmode(OUT, ":utf8");
        print "Creating $outfile\n";
        print OUT $xml;
        close(OUT);
    } else {
        print "Failed to create $outfile\n";
        return undef;
    }
    return 1;
}

sub getSearchTerm {
    print "Enter new search term: ";
    my $term = <STDIN>;
    chomp($term);
    return $term;
}

sub displayMenu {
    my @results = @_;

    # present options
    my $i = $#results;
    my $maxpad = length($i);
    my $pad;

    if ($movie) {
        foreach my $result (reverse @{ $imdb->matched }) {
            my $length = length("$i");
            $pad = $length >= $maxpad ? 0 : $maxpad - $length;
            print ' ' x $pad;
            print "$i. ".$result->{title}."\n";
            $i--;
        }
    } elsif ($tvshow) {
        foreach my $result (reverse @results) {
            my $length = length("$i");
            $pad = $length >= $maxpad ? 0 : $maxpad - $length;
            print ' ' x $pad;
            print "$i. ".$result->{title}."\n";
            $i--;
        }
    }

    $pad = $maxpad - 1;
    print ' ' x $pad;
    print "N. enter a new search term\n";
    print ' ' x $pad;
    print "S. Skip this title\n";
    # report the number of results (was $#results, the last index)
    print "Got " . scalar(@results) . " results; use? [0]: ";
    my $choice = <STDIN>;
    chomp($choice);
    $choice = 0 if ($choice =~ m/^\s*$/);
    return $choice;
}

sub showToTmpl {
    my $show = shift;
    my $t;
    my $tmpl = HTML::Template->new(
        scalarref => $template_ref,
        die_on_bad_params => 0,
    );

    $t = $episode . ". " . $show->{'Name'};
    $tmpl->param(TITLE => $t);
    $tmpl->param(SUMMARY => $show->{'Name'});
    $tmpl->param(DESCRIPTION=> $show->{'Overview'});
    my $date = ParseDate($show->{'FirstAired'});
    $date = UnixDate("$date","%d %B %Y");
    $tmpl->param(DATE => "$date");
    $tmpl->param(EPISODE => $episode);
    $tmpl->param(SEASON => $season);
    $tmpl->param(NAME => $show_name);
    #$tmpl->param(ARTIST=> $show_name);
    $tmpl->param(EPISODENUM=> $episode);
    return $tmpl->output;
}

sub imdbToTmpl {
    my $film = shift;
    my $tmpl = HTML::Template->new(
        scalarref => $template_ref,
        die_on_bad_params => 0,
    );

    $tmpl->param(TITLE => $film->title);
    $tmpl->param(SUMMARY => $film->plot);

    if ($use_duration) {
        my $duration = $film->duration;
        if ($duration =~ /(\d+)/) {
            $duration = $1;
        } else {
            $duration = 0;
        }
        # IMDB gives minutes; the template wants seconds
        $tmpl->param(DURATION => $duration * 60);
    } else {
        $tmpl->param(DURATION => 0);
    }

    my $cert = $film->certifications;
    for my $country (keys %$cert) {
        if ($country =~ /US/) {
            $tmpl->param(CERTIFICATION => $cert->{$country});
        }
    }

    my $rating = $film->rating;
    $tmpl->param(RATING => $rating);

    # find earliest release date
    my $dates;
    if (defined($film->release_dates)) {
        foreach my $day (@{ $film->release_dates}) {
            if(my $date = ParseDate($day->{date})) {
                #$date = UnixDate("$date","%s");
                $dates->{$date} = $day->{country};
            }
        }
    }

    foreach my $utc (sort keys %$dates) {
        my $date = ParseDate($utc);
        $date = UnixDate("$date","%d %B %Y");
        $tmpl->param(DATE => "$date ($dates->{$utc})");
        last;
    }

    # genres
    my @genres = ();
    my $first = 1;
    foreach my $genre (@{ $film->genres }) {
        my %genre_row;
        my $open = $first ? '<genre primary="true">' : '<genre>';
        $first = 0;
        $genre_row{NAME} = "$open$genre</genre>";
        push(@genres, \%genre_row);
    }
    $tmpl->param(GENRE => \@genres);

    # cast
    my @cast = ();
    foreach my $castmember (@{ $film->cast }) {
        my %cast_row;
        $cast_row{NAME} = $castmember->{name};
        push(@cast, \%cast_row);
    }
    @cast = @cast[0..4];
    $tmpl->param(CAST => \@cast);

    # directors
    my @directors = ();
    foreach my $director (@{ $film->directors }) {
        my %director_row;
        $director_row{NAME} = $director->{name};
        push(@directors, \%director_row);
    }
    @directors = @directors[0..1] if $#directors > 0;
    $tmpl->param(DIRECTORS => \@directors);

    return $tmpl->output;
}

sub guessTitleFromFilename {
    my $file = shift;
    $season = "";
    $episode = "";
    my $guess = fileparse($file);

    if ($usedir && $movie) {
        $guess = $dir;
        print "Using Dir $dir\n";
    }

    if ($tvshow) {
        $guess =~ s/\(.*\)//g;     # remove anything in ()s
        $guess =~ /(.*?)[\.\s\-]+[Ss]?(\d{1,2})[Ee]?(\d{1,3})/;
        $guess = $1;
        $season = $2;
        $episode = $3;
        my $season_tmp = $season . $episode;
        if (length($season_tmp) < 4) {
            # when episodes are #'erd like 101
            $season_tmp =~ /(\d)(\d{1,3})/;
            $season = $1;
            $episode = $2;
        }
        $guess =~ s/\./ /g;        # some shows have .'s instead of spaces
        $episode =~ s/^0//;
        $season =~ s/^0//;
        print "Searching TheTVDB for: $guess (Season $season, Episode $episode)\n";
    } elsif ($movie) {
        $guess =~ s/\..{1,3}\.?.{0,3}$//; # strip off extension or sabnzbd
                                          # duplicate file/dir extension (.#)
        $guess =~ s/\(.*\)//g;            # remove anything in ()s
        $guess =~ s/\[.*\]//g;            # remove anything in []s
        $guess =~ s/[\.|\'|\"|\,]//g;     # remove .,"'
        $guess =~ tr/A-Z/a-z/;            # eh
        $guess =~ s/_/ /g;
        $guess =~ s/-\d+$//;
        print "Searching IMDB for: $guess\n";
    }
    return $guess;
}

sub getEpiID {
    # returns episodeID for a given series, season, episode.
    my ($s,$e,$se) = @_;
    my $episode_url = "http://www.thetvdb.com/interfaces/GetEpisodes.php?seriesid=$s&episode=$e&season=$se";
    my $content = get ($episode_url);
    my $xs = XML::TreePP->new();
    my $ref = $xs->parse($content);
    foreach my $key (@{$ref->{Items}->{Item}}) {
        if ($key->{'id'}) {
            return $key->{'id'};
        }
    }
}

sub doSeriesSearch {
    # returns an array of hashes with search results (name & ID)
    my $term = shift;
    my $series_url = "http://www.thetvdb.com/api/GetSeries.php?seriesname=$term";
    my $content = get ($series_url);
    my $xs = XML::TreePP->new();
    my $ref = $xs->parse($content);
    my @array;
    my $count = 0;

    # more than 1 result, we get an array, otherwise, we get a hash
    if (ref($ref->{Data}->{Series}) eq 'ARRAY') {
        foreach my $key (@{$ref->{Data}->{Series}}) {
            $array[$count]->{id}=$key->{'seriesid'};
            $array[$count]->{title}=$key->{'SeriesName'};
            $count++;
        }
    } else {
        $array[$count]->{id}=$ref->{Data}->{Series}->{'seriesid'};
        $array[$count]->{title}=$ref->{Data}->{Series}->{'SeriesName'};
    }
    return @array;
}

sub getEpiInfo {
    # returns hash with episode info.
    my ($s,$e,$se) = @_;
    my $episode_url = "http://www.thetvdb.com/api/$tvdb_key/series/$s/default/$se/$e/en.xml";
    #my $episode_url = "http://www.thetvdb.com/interfaces/GetEpisodes.php?seriesid=$s&episode=$e&season=$se";
    print "TheTVDB: $episode_url\n";
    my $content = get ($episode_url);
    my $xs = XML::TreePP->new();
    my $ref = $xs->parse($content);
    my %info;

    # going to assume we always get 1 result back, otherwise we kill a kitten
    #print Dumper $ref;
    if ($ref->{Data}->{Episode}->{'id'}) {
        # an empty XML element parses to a hash ref, hence the /^HASH/ checks
        if ($ref->{Data}->{Episode}->{'FirstAired'} !~ /^HASH/) {
            $info{'FirstAired'} = $ref->{Data}->{Episode}->{'FirstAired'};
        } else {
            $info{'FirstAired'} = "now";
        }
        if ($ref->{Data}->{Episode}->{'Overview'} !~ /^HASH/) {
            $info{'Overview'} = $ref->{Data}->{Episode}->{'Overview'};
        } else {
            $info{'Overview'} = "Unknown";
        }
        if ($ref->{Data}->{Episode}->{'EpisodeName'} !~ /^HASH/) {
            $info{'Name'} = $ref->{Data}->{Episode}->{'EpisodeName'};
        } else {
            $info{'Name'} = "Unknown";
        }
    }
    return \%info;
}

sub getBanner {
    # returns the url of a season-specific series image (if possible), otherwise, the most
    # recent series image.
    my ($seriesid,$season) = @_;
    my $banner_url = "http://www.thetvdb.com/api/$tvdb_key/series/$seriesid/banners.xml";
    my $banner_loc = "http://www.thetvdb.com/banners";
    my $content = get ($banner_url);
    my $xs = XML::TreePP->new();
    my $ref = $xs->parse($content);
    my $season_tmp;
    my %valid_banners = ();

    foreach my $key (@{$ref->{Banners}->{Banner}}) {
        if ($key->{'BannerType'} eq "season" && $key->{'BannerType2'} ne "seasonwide") {
            $season_tmp = $key->{'Season'};
            if ($season_tmp =~ /\d+/) {
                if ($season eq $season_tmp) {
                    # we found a season-specific image
                    return "$banner_loc/$key->{'BannerPath'}";
                } else {
                    # print "Type: " . $key->{'BannerType'} . " Season: " . $key->{'Season'} . " URL: $banner_loc/" . $key->{'BannerPath'} . "\n";
                    $valid_banners{$season_tmp}="$banner_loc/$key->{'BannerPath'}"
                }
            }
        }
    }

    foreach my $b (reverse sort keys %valid_banners) {
        # otherwise, we return the most recent season-specific image
        #print " B: $b : selecting $valid_banners{$b}\n";
        return "$valid_banners{$b}";
    }
}

sub usage {
    my $name = fileparse($0);
    print "usage: $name [-movie|-tvshow] [-altimg] [-duration] [-overwrite] [-usedir] [-usefirst] path-to-search-for-files\n";
    print "  -altimg          try to get a larger image via movieposterdb images\n\n";
    print "  -duration        normally leaves the duration string as 0 so that ATVFiles\n";
    print "                   will compute this itself. using this will grab duration\n";
    print "                   from IMDB (only applies to movies)\n\n";
    print "  -overwrite       will overwrite existing .xml files (you will have the\n";
    print "                   option to skip files you don't want to overwrite).\n\n";
    print "  -usedir          will use the parent directory name instead of the filename\n";
    print "                   to classify the file (only applies to movies).\n\n";
    print "  -usefirst        will use the first match (#0) returned from the search\n";
    print "                   and assume that we have a successful match. Useful\n";
    print "                   for unattended running. Your names must be accurate\n";
    print "                   for this option to be useful.\n\n";
    print "  -tvshow|-movie   lookup either via TheTVDB or IMDB (-movie is default if\n";
    print "                   nothing is specified).";
    print "\n";
    print "\n";
    exit 1;
}

1;