ATVFiles Movie/TV Categorization Script

From Bubba.org

Note: This project has moved

Please note that this project is now hosted at http://code.google.com/p/atv2xml/. The information below will not be updated for new releases, but remains for those who fail to see this blurb.

Introduction

This script takes a video file of a movie or TV show, attempts to download information about it, and writes that information as ATVFiles XML metadata for use on the AppleTV.
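
As a rough illustration of where the output ends up: the script writes the metadata (.xml) and the cover image (.jpg) next to the video, reusing the video's base name via fileparse. The sketch below only computes those two paths; sidecar_paths and the sample path are made-up names for illustration and are not part of the script.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Return the metadata and cover-art paths that sit next to a video file.
# The suffix pattern is the same one the script passes to fileparse().
sub sidecar_paths {
    my ($video) = @_;
    my ($name, $path, $suffix) = fileparse($video, qr/\.[^.]*/);
    return ($path . $name . '.xml', $path . $name . '.jpg');
}

# Hypothetical sample path, used only to show the naming.
my ($xml, $jpg) = sidecar_paths('/media/Movies/Example Movie (2008).avi');
print "$xml\n$jpg\n";
```

Only the final extension is swapped, so a name containing other dots or parentheses keeps them in the .xml and .jpg names.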

Usage

I call the script from a sabnzbd user script so that it is run once a file has been downloaded.
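
For anyone wanting to do the same, here is a minimal sketch of such a user script. It assumes SABnzbd passes the finished job's directory as the first argument; the install path and the atv2xml_command helper are hypothetical, and the chosen flags are just one reasonable unattended setup, not necessarily the one I use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical install location; adjust to wherever atv2xml.pl lives.
my $ATV2XML = '/usr/local/bin/atv2xml.pl';

# Build the argument list for an unattended movie lookup on one directory.
sub atv2xml_command {
    my ($job_dir) = @_;
    return ('perl', $ATV2XML, '-movie', '-usefirst', '-altimg', $job_dir);
}

# Assumption: SABnzbd hands the finished job's directory to the user
# script as its first argument.
if (@ARGV) {
    my $job_dir = $ARGV[0];
    if (-d $job_dir) {
        my @cmd = atv2xml_command($job_dir);
        print "Running: @cmd\n";
        system(@cmd) == 0 or warn "atv2xml.pl exited with status $?\n";
    } else {
        warn "Not a directory: $job_dir\n";
    }
}
```

The sketch passes -usefirst because, without it, the script stops at its selection menu and waits for input on STDIN, which nobody is there to answer in a post-processing run.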

 usage: atv2xml.pl [-movie|-tvshow] [-altimg] [-duration] [-overwrite] [-usedir] [-usefirst] path-to-search-for-files

    -altimg       try to get a larger image via movieposterdb images

    -duration     normally leaves the duration string as 0 so that ATVFiles
                  will compute this itself.  using this will grab duration 
                  from IMDB (only applies to movies)

    -overwrite    will overwrite existing .xml files (you will have the
                  option to skip files you don't want to overwrite).

    -usedir       will use the parent directory name instead of the filename
                  to classify the file (only applies to movies).

    -usefirst     will use the first match (#0) returned from the search
                  and assume that we have a successful match.  Useful
                  for unattended running.  Your names must be accurate
                  for this option to be useful.

  -tvshow|-movie  lookup either via TheTVDB or IMDB (-movie is default if
                  nothing is specified).
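
The option handling can be tried in isolation. The sketch below uses the same option names with Getopt::Long and mirrors the script's rule that movie lookups are the default (and win if both mode flags are given); parse_args and the sample paths are made up for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse an argument list with the script's option names and return the
# lookup mode, the remaining path argument, and the raw option hash.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt,
        'movie', 'tvshow', 'altimg', 'duration',
        'overwrite', 'usedir', 'usefirst')
        or return;
    # -movie is the default, and it also wins when both flags are given
    my $mode = ($opt{movie} || !$opt{tvshow}) ? 'movie' : 'tvshow';
    return ($mode, $args[0], \%opt);
}

# Hypothetical argument lists, as they would arrive in @ARGV.
for my $argv (['-tvshow', '-usefirst', '/media/TV'], ['/media/Movies']) {
    my ($mode, $path) = parse_args(@$argv);
    print "$mode lookup under $path\n";
}
```

Getopt::Long accepts long option names behind a single dash in its default configuration, which is why -tvshow works without a double dash.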

Script

#!/usr/bin/perl -w
# 
# 2009/10/03 - bubbaATbubba.org
# - Another fix from Tony for the movieposterdb image quality
#
# 2009/04/17 - bubbaATbubba.org
# - added a fixup routine to auto-rename shows based on the
#   filename to a more appropriate name to get a better match,
#   tweaked guess regex for episode names, and if episodes are
#   named different
# 
# 2008/12/23 - bubbaATbubba.org
# - changed TV show parsing to account for titles with .'s instead
#   of spaces
#
# 2008/10/27 - bubbaATbubba.org
# - moved from Date::Parse to Date::Manip
# - fixed various issues with TV gathering where not all episodes
#   had all information (missing description, missing air date)
#
# 2008/09/14 - bubbaATbubba.org
# - moved everything to TheTVDB API (thanks again Tony!)
# - tweaked tv name guessing section a bit
# - added episode # before the title for TV shows 
# - other minor fixes
#
# 2008/09/12 - bubbaATbubba.org
# - fixed another issue with movieposterdb. Hopefully this
#   fix will keep things working a bit longer than before. 
#   (Thanks Tony!) 
# 
# 2008/09/07 - bubbaATbubba.org
# - fixed an issue with movieposterdb
#
# 2008/07/19 - bubbaATbubba.org
# - fixed issue with TheTVDB poster gathering.
#
# 2008/05/20 - bubbaATbubba.org
# - not using tv.com perl module any longer, instead wrote
#   functions to query TheTVDB (they rock).  Issues with
#   computing episode numbers are gone.  You may also
#   use the show selector if more than one result is
#   returned, unlike before.
# - added ability to attempt to retrieve an alternate
#   movie image from movieposterdb. 
# - fixed a few regex's here and there.
#
# 2008/05/18 - bubbaATbubba.org
# - added ability to get TV show info and images (-tvshow)
#   At this time this depends on the name being correct and
#   the file name being:  SHOW - SXXEXX.avi 
#   Also uses some rudimentary computation of complete episode #
#   (see comments). This should really use thetvdb but I'm too 
#   lazy to write a perl module for them.
#
# 2008/05/14 - bubbaATbubba.org 
# - fixed ratings (G/PG)
# - Added Star Ratings though ATVfiles doesn't seem to  care :(
#   Anyone know how to change what ATVfiles displays?
# - added the -usefirst option for unattended selection of the first
#   hit in a search (#0).  You need to make sure your file/dir names
#   are accurate for this to be useful.
# 
# 2008/05/12 - bubbaATbubba.org 
# - Argument can now be a directory tree with files to classify.  You must
#   configure $extensions to specify which files to look for
# - Added ability to use the parent directory name instead of filename to 
#   classify the movie to lookup (-usedir option) 
# - Added the ability to overwrite and/or skip overwriting during 
#   classification (-overwrite option).   There is now an option to "S" skip
#   in the menu should you not wish to overwrite the current xml file.
# - Fixed -duration option (wasn't working for times like "US: 120 MIN")
#
# Thanks to nrh on AwkwardTV forums for starting this... :)
 
use strict;
use IMDB::Film;
use HTML::Template;
use Date::Manip;
use File::Basename;
use File::stat;
use Fcntl;
use utf8;
use Getopt::Long;
use LWP::Simple;
use XML::TreePP;
use File::Find;
use Data::Dumper;
 
# put file extensions in here you want to look for.  
# make sure to escape your "." and put a | in between
# your extensions
my $extensions = '\.avi|\.wmv|\.mp4|\.mkv';
 
# if image size is less than this many bytes,
# try getting a larger image from movieposterdb
# (requires -altimg option to even try this)
my $filesize = "30000";
 
# now required for TVDB calls... :(
my $tvdb_key = "";
 
my ($template_ref, $dir, $searchTerm, $imdb, $overwrite, $usedir, $xml, $cover,
$content, $usefirst, $tvshow, $movie, $season, $episode, $show, $show_name,
$episode_num, %season_hash, $altimg);
 
my $use_duration = 0;
&usage unless GetOptions("duration" => \$use_duration, "overwrite" =>
\$overwrite, "usedir" => \$usedir, "usefirst" => \$usefirst,
"movie" => \$movie, "tvshow" => \$tvshow, "altimg"=> \$altimg);
 
my %fixup = ("The Office" => "The Office (US)");
 
# do movie searches by default
if (!$tvshow)  {
	$movie = 1;
}
 
 
# autoflush;
$|=1;
# unicode support
binmode(STDOUT, ":utf8");
 
&usage unless $ARGV[0];
 
# define the output template here
my $movie_template =<<MXML;
<media type="Movie">
   <title><TMPL_VAR NAME=TITLE></title>
   <summary><TMPL_VAR NAME=SUMMARY></summary>
   <rating><TMPL_VAR NAME=CERTIFICATION></rating>
   <starRating><TMPL_VAR NAME=RATING></starRating>
   <published><TMPL_VAR NAME=DATE></published>
   <duration><TMPL_VAR NAME=DURATION></duration>
 
   <genres><TMPL_LOOP NAME=GENRE>
     <TMPL_VAR NAME=NAME></TMPL_LOOP>
   </genres>
 
   <cast><TMPL_LOOP NAME=CAST>
     <name><TMPL_VAR NAME=NAME></name></TMPL_LOOP>
   </cast>
 
   <directors><TMPL_LOOP NAME=DIRECTORS>
     <name><TMPL_VAR NAME=NAME></name></TMPL_LOOP>
   </directors>
</media>
MXML
 
my $tv_template =<<TXML;
<media type="TV Show">
   <title><TMPL_VAR NAME=TITLE></title>
   <artist><TMPL_VAR NAME=NAME></artist>
   <summary><TMPL_VAR NAME=SUMMARY></summary>
   <description><TMPL_VAR NAME=DESCRIPTION></description>
   <published><TMPL_VAR NAME=DATE></published>
   <seriesName><TMPL_VAR NAME=NAME></seriesName>
   <episode><TMPL_VAR NAME=EPISODE></episode>
   <episodeNumber><TMPL_VAR NAME=EPISODENUM></episodeNumber>
   <season><TMPL_VAR NAME=SEASON></season>
</media>
TXML
 
find(\&findfiles,$ARGV[0]);
 
sub findfiles {                       
  my $file = $File::Find::name;      
 
  undef $template_ref;
  if ($movie) {
  	$template_ref = \$movie_template;
  } elsif ($tvshow) {
	if (!$tvdb_key) {
           print "You need an API key from http://www.thetvdb.com. Please".
                 " create an account and request a key. \n";
           exit 1;
        }
  	$template_ref = \$tv_template;
  }
  return unless -f $file;            
  return unless $_ =~ m/$extensions/io;  
  return if $_ =~ /sample/i;
  print "FILE: $file\n";
 
   $dir = dirname($file);
   $dir =~ s/.*\///;
 
   my ($xmlfile,$xmlpath,$xmlfilesuffix) = fileparse($file,qr/\.[^.]*/);
 
   $xmlfile .= ".xml" ;
   if (-f "$xmlpath/$xmlfile" && (!$overwrite)) { 
	return;
   } elsif (-f "$xmlpath/$xmlfile") {
	print "\n\nFound $xmlfile.. Will overwrite.\n";
   }
 
 
# now we start.
# derive search term from filename
$searchTerm = guessTitleFromFilename($file);
 
foreach my $fix (keys %fixup) {
	if ($searchTerm =~ /$fix/) {
		$searchTerm=$fixup{$fix};
	}
}
$show_name = $searchTerm;
 
undef $imdb;
undef $show;
 
# main loop
if ($movie) {
 while ($imdb = IMDB::Film->new(crit => "$searchTerm")) {
  my @results = @{ $imdb->matched };
  if (!@results && $imdb) {
	#print Dumper $imdb;
	$searchTerm = $imdb->id;
  }
  last unless @results > 0;
 
  # we'll assume the 1st hit is what we want...
  if ($usefirst) {
	$searchTerm = $results[0]->{id};
	$show_name = $results[0]->{title};
	print "Using first search result: $show_name\n";
  }
 
  if (@results > 0 && !$usefirst) {
    my $choice = &displayMenu(@results);
    # undef and replace $imdb object
    if ($choice =~ /^[Nn]$/) {
      $searchTerm = &getSearchTerm;
      undef $imdb;
    }  elsif ($choice =~ /^[Ss]$/) {
       return;
    } else {
      $searchTerm = $results[$choice]->{id};
      $show_name = $results[$choice]->{title};
    }
  }
 }
} elsif ($tvshow)  {
  my @results = doSeriesSearch($searchTerm);
  return unless @results > 0;
 
  # we'll assume the 1st hit is what we want...
  if ($usefirst) {
        $searchTerm = $results[0]->{id};
        $show_name = $results[0]->{title};
        print "Using first search result: $show_name\n";
  }
  if (@results > 0 && !$usefirst) {
    my $choice = &displayMenu(@results);
    if ($choice =~ /^[Nn]$/) {
      $searchTerm = &getSearchTerm;
    }  elsif ($choice =~ /^[Ss]$/) {
       return;
    } else {
      $searchTerm = $results[$choice]->{id};
      $show_name = $results[$choice]->{title};
    }
  }
}
 
# we got a single result here (possibly by searching on id)
 
undef $xml;
 
if ($movie) { 
	$xml = imdbToTmpl($imdb);
	$cover = $imdb->cover();
} elsif ($tvshow) {
        $show = getEpiInfo($searchTerm,$episode,$season);
        $cover = getBanner($searchTerm,$season);
        $xml = showToTmpl($show);
}
 
my ($outfile,$path,$suffix) = fileparse($file,qr/\.[^.]*/);
$outfile .= ".xml";
 
# write output file
&writeXMLFile($path.$outfile, $xml)
    || exit 1;
 
 
$content = get($cover);
my ($coverfile,$coverfilepath,$coverfilesuffix) = fileparse($file,qr/\.[^.]*/);
$coverfile .= ".jpg" ;
 
if (open(OUT, ">$coverfilepath/$coverfile")) {
        print OUT $content ;
        close(OUT) ;
}
 
 my $fsize=stat("$coverfilepath/$coverfile")->size;
 if (($fsize < $filesize) && ($altimg) && ($movie)) {
	# use movieposterdb for movies since IMDB is so low res
        my $search = "http://www.movieposterdb.com/browse/search?type=movies&query=$searchTerm";
	my $content = get($search);
	my $id = $searchTerm;
	$id =~ s/^0//g;
	if ($content =~ m{img\ssrc="(http://www.movieposterdb.com/posters/[\/\S+\_]+/0?$id/\S+$id\_\S+.jpg)" }) {
		my $match = $1;
                $match =~ s/$id\/s_/$id\/l_/g;  
                $match =~ s/$id\/m_/$id\/l_/g;  
		if ($content = get($match)) {
                	print "Got new image via MoviePosterDB: $match\n";
			if (open(OUT, ">$coverfilepath/$coverfile")) {
        			print OUT $content ;
        			close(OUT) ;
			}
		}
	}
 }
}
exit 0;
 
sub writeXMLFile {
  my $outfile = shift;
  my $xml = shift;
 
  if (open(OUT, ">$outfile")) {
    binmode(OUT, ":utf8");
    print "Creating $outfile\n";
    print OUT $xml;
    close(OUT);
  } else {
    print "Failed to create $outfile\n";
    return undef;
  }
  return 1;
}
 
sub getSearchTerm {
  print "Enter new search term: ";
 
  my $term = <STDIN>;
  chomp($term);
  return $term;
}
 
sub displayMenu {
  my @results = @_;
  # present options
  my $i = $#results;
  my $maxpad = length($i);
  my $pad;
 
 if ($movie) {
  foreach my $result (reverse @{ $imdb->matched }) {
    my $length = length("$i");
    $pad = $length >= $maxpad ? 0 : $maxpad - $length;
    print ' ' x $pad;
    print "$i. ".$result->{title}."\n";
    $i--;
  }
 } elsif ($tvshow) {
  foreach my $result (reverse @results) {
    my $length = length("$i");
    $pad = $length >= $maxpad ? 0 : $maxpad - $length;
    print ' ' x $pad;
    print "$i. ".$result->{title}."\n";
    $i--;
  }
 }
 
  $pad = $maxpad - 1;
  print ' ' x $pad;
  print "N. enter a new search term\n";
  print ' ' x $pad;
  print "S. Skip this title\n";
 
  print "Got " . scalar(@results) . " results; use? [0]: ";
  my $choice = <STDIN>;
  chomp($choice);
 
  $choice = 0 if ($choice =~ m/^\s*$/);
 
  return $choice;
}
 
sub showToTmpl {
  my $show = shift;
  my $t;
  my $tmpl = HTML::Template->new(
	scalarref => $template_ref,
	die_on_bad_params => 0,
  );
  $t = $episode . ". " . $show->{'Name'};
  $tmpl->param(TITLE => $t);
  $tmpl->param(SUMMARY => $show->{'Name'});
  $tmpl->param(DESCRIPTION=> $show->{'Overview'});
  my $date = ParseDate($show->{'FirstAired'});
  $date = UnixDate("$date","%d %B %Y");
  $tmpl->param(DATE => "$date");
  $tmpl->param(EPISODE => $episode);
  $tmpl->param(SEASON => $season);
  $tmpl->param(NAME => $show_name);
  #$tmpl->param(ARTIST=> $show_name);
  $tmpl->param(EPISODENUM=> $episode);
  return $tmpl->output;
}
 
sub imdbToTmpl {
  my $film = shift;
  my $tmpl = HTML::Template->new(
        scalarref => $template_ref,
        die_on_bad_params => 0,
    );
 
  $tmpl->param(TITLE => $film->title);
  $tmpl->param(SUMMARY => $film->plot);
 
  if ($use_duration) {
    my $duration = $film->duration;
    if ($duration =~ /(\d+)/) {
		$duration = $1;
    } else {
		$duration = 0;
    }
    $tmpl->param(DURATION => $duration * 60);
  } else {
    $tmpl->param(DURATION => 0);
  }
 
  my $cert = $film->certifications;
  for my $country (keys %$cert) {
	if ($country =~ /US/) {
		$tmpl->param(CERTIFICATION => $cert->{$country});
	}
  }
 
  my $rating = $film->rating;
  $tmpl->param(RATING => $rating);
 
  # find earliest release date
  my $dates;
  if (defined($film->release_dates)) {
    foreach my $day (@{ $film->release_dates}) {
      if(my $date = ParseDate($day->{date})) {
        #$date = UnixDate("$date","%s");
        $dates->{$date} = $day->{country};
      }
    }
  }
  foreach my $utc (sort keys %$dates) {
    my $date = ParseDate($utc);
    $date = UnixDate("$date","%d %B %Y");
    $tmpl->param(DATE => "$date ($dates->{$utc})");
    last;
  }
 
  # genres
  my @genres = ();
  my $first = 1;
  foreach my $genre (@{ $film->genres }) {
    my %genre_row;
    my $open = $first ? '<genre primary="true">' : '<genre>';
    $first = 0;
    $genre_row{NAME} = "$open$genre</genre>";
    push(@genres, \%genre_row);
  }
  $tmpl->param(GENRE => \@genres);
 
  # cast
  my @cast = ();
  foreach my $castmember (@{ $film->cast }) {
    my %cast_row;
    $cast_row{NAME} = $castmember->{name};
    push(@cast, \%cast_row);
  }
  @cast = @cast[0..4] if @cast > 5;  # keep at most five, without padding short lists with undef
  $tmpl->param(CAST => \@cast);
 
  # directors
  my @directors = ();
  foreach my $director (@{ $film->directors }) {
    my %director_row;
    $director_row{NAME} = $director->{name};
    push(@directors, \%director_row);
  }
  @directors = @directors[0..1] if $#directors > 0;
  $tmpl->param(DIRECTORS => \@directors);
 
  return $tmpl->output;
}
 
sub guessTitleFromFilename {
  my $file = shift;
  $season = "";
  $episode = "";
  my $guess = fileparse($file);
  if ($usedir && $movie) {
	$guess = $dir;
	print "Using Dir $dir\n";
  }
  if ($tvshow) {
  	$guess =~ s/\(.*\)//g; # remove anything in ()s
	# only trust the captures when the pattern actually matched
	if ($guess =~ /(.*?)[\.\s\-]+[Ss]?(\d{1,2})[Ee]?(\d{1,3})/) {
		$guess = $1; 
		$season = $2;
		$episode = $3;
	}
	my $season_tmp = $season . $episode;
	if (length($season_tmp) < 4) { # when episodes are #'erd like 101
		$season_tmp =~ /(\d)(\d{1,3})/;
		$season = $1;
		$episode = $2;
	}
	$guess =~ s/\./ /g;  # some shows have .'s instead of spaces
	$episode =~ s/^0//;
	$season =~ s/^0//;
  	print "Searching TheTVDB for: $guess (Season $season, Episode $episode)\n";
  } elsif ($movie) {
  	$guess =~ s/\..{1,3}\.?.{0,3}$//;   # strip off extension or sabnzbd 
		             		    # duplicate file/dir extension (.#)
  	$guess =~ s/\(.*\)//g; # remove anything in ()s
  	$guess =~ s/\[.*\]//g; # remove anything in []s
  	$guess =~ s/[\.|\'|\"|\,]//g;  # remove .,"'
  	$guess =~ tr/A-Z/a-z/; # eh
  	$guess =~ s/_/ /g;
  	$guess =~ s/-\d+$//;
  	print "Searching IMDB for: $guess\n";
 }
 
  return $guess;
}
 
sub getEpiID {
        # returns episodeID for a given series, season, episode.  
        my ($s,$e,$se) = @_;
        my $episode_url = "http://www.thetvdb.com/interfaces/GetEpisodes.php?seriesid=$s&episode=$e&season=$se";
        my $content = get ($episode_url);
        my $xs = XML::TreePP->new();
        my $ref = $xs->parse($content);
 
        foreach my $key (@{$ref->{Items}->{Item}}) {
                if ($key->{'id'}) {
                        return $key->{'id'};
                }
        }
}
 
sub doSeriesSearch {
        # returns an array of hashes with search results (name & ID)
        my $term = shift;
        my $series_url = "http://www.thetvdb.com/api/GetSeries.php?seriesname=$term";
        my $content = get ($series_url);
        my $xs = XML::TreePP->new();
        my $ref = $xs->parse($content);
        my @array;
        my $count = 0;
 
        # more than 1 result, we get an array, otherwise, we get a hash
        if (ref($ref->{Data}->{Series}) eq 'ARRAY') {
                foreach my $key (@{$ref->{Data}->{Series}}) {
                        $array[$count]->{id}=$key->{'seriesid'};
                        $array[$count]->{title}=$key->{'SeriesName'};
                        $count++;
                }
        } elsif (ref($ref->{Data}->{Series}) eq 'HASH') {
                # exactly one result; an empty search adds nothing
                $array[$count]->{id}=$ref->{Data}->{Series}->{'seriesid'};
                $array[$count]->{title}=$ref->{Data}->{Series}->{'SeriesName'};
        }
        return @array;
}
 
sub getEpiInfo {
        # returns hash with episode info.
        my ($s,$e,$se) = @_;
        my $episode_url = "http://www.thetvdb.com/api/$tvdb_key/series/$s/default/$se/$e/en.xml";
        #my $episode_url = "http://www.thetvdb.com/interfaces/GetEpisodes.php?seriesid=$s&episode=$e&season=$se";       
        print "TheTVDB: $episode_url\n";
        my $content = get ($episode_url);
        my $xs = XML::TreePP->new();
        my $ref = $xs->parse($content);
        my %info;
 
        # going to assume we always get 1 result back, otherwise we kill a kitten
	#print Dumper $ref;
        if ($ref->{Data}->{Episode}->{'id'}) {
		if ($ref->{Data}->{Episode}->{'FirstAired'} !~ /^HASH/) {
                	$info{'FirstAired'} = $ref->{Data}->{Episode}->{'FirstAired'};
		} else {
                	$info{'FirstAired'} = "now";
		}
		if ($ref->{Data}->{Episode}->{'Overview'} !~ /^HASH/) {
			$info{'Overview'} = $ref->{Data}->{Episode}->{'Overview'};
		} else {
			$info{'Overview'} = "Unknown";
		}
		if ($ref->{Data}->{Episode}->{'EpisodeName'} !~ /^HASH/) {
			$info{'Name'} = $ref->{Data}->{Episode}->{'EpisodeName'};
		} else {
			$info{'Name'} = "Unknown";
		}
        }
        return \%info;
}
 
sub getBanner {
        # returns the url of a season-specific series image (if possible), otherwise, the most 
        # recent series image.  
        my ($seriesid,$season) = @_;
        my $banner_url = "http://www.thetvdb.com/api/$tvdb_key/series/$seriesid/banners.xml";
        my $banner_loc = "http://www.thetvdb.com/banners";
        my $content = get ($banner_url);
        my $xs = XML::TreePP->new();
        my $ref = $xs->parse($content);
        my $season_tmp;
        my %valid_banners = ();
 
        foreach my $key (@{$ref->{Banners}->{Banner}}) {
                if ($key->{'BannerType'} eq "season" && 
			 $key->{'BannerType2'} ne "seasonwide") {
                        $season_tmp = $key->{'Season'};
                        if ($season_tmp =~ /\d+/) {
                                if ($season eq $season_tmp) {
                                        # we found a season-specific image
                                        return "$banner_loc/$key->{'BannerPath'}";
                                } else {
				#	print "Type: " . $key->{'BannerType'} .  " Season: " . $key->{'Season'} .  " URL: $banner_loc/" . $key->{'BannerPath'} . "\n";
                                        $valid_banners{$season_tmp}="$banner_loc/$key->{'BannerPath'}"
                                }
                        }
                }
        }
 
        # otherwise, we return the most recent season-specific image;
        # seasons are compared numerically so that 10 sorts after 9
        foreach my $latest (sort { $b <=> $a } keys %valid_banners) {
		#print " B: $latest : selecting $valid_banners{$latest}\n";
                return "$valid_banners{$latest}";
        }
}
 
sub usage {
  my $name = fileparse($0);
  print "usage: $name [-movie|-tvshow] [-altimg] [-duration] [-overwrite] [-usedir] [-usefirst] path-to-search-for-files\n";
  print "    -altimg       try to get a larger image via movieposterdb images\n\n"; 
  print "    -duration     normally leaves the duration string as 0 so that ATVFiles\n";
  print "                  will compute this itself.  using this will grab duration \n";
  print "                  from IMDB (only applies to movies)\n\n";
  print "    -overwrite    will overwrite existing .xml files (you will have the\n"; 
  print "                  option to skip files you don't want to overwrite).\n\n";
  print "    -usedir       will use the parent directory name instead of the filename\n";
  print "                  to classify the file (only applies to movies).\n\n";
  print "    -usefirst     will use the first match (#0) returned from the search\n";
  print "                  and assume that we have a successful match.  Useful\n";
  print "                  for unattended running.  Your names must be accurate\n";
  print "                  for this option to be useful.\n\n";
  print "  -tvshow|-movie  lookup either via TheTVDB or IMDB (-movie is default if\n";
  print "                  nothing is specified).";
  print "\n";
  print "\n";
  exit 1;
}
1;
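
The TV file name parsing in guessTitleFromFilename can be tried on its own. The sketch below reuses the same regex and clean-up steps in a standalone function; parse_tv_name and the sample file names are made up for illustration, and unlike the script it returns nothing when the name does not match.

```perl
use strict;
use warnings;

# Split a TV file name into (show, season, episode) with the same regex
# and clean-up steps as guessTitleFromFilename; returns an empty list when
# the name does not match.
sub parse_tv_name {
    my ($name) = @_;
    $name =~ s/\(.*\)//g;    # drop anything in parentheses
    my ($show, $season, $episode) =
        $name =~ /(.*?)[\.\s\-]+[Ss]?(\d{1,2})[Ee]?(\d{1,3})/
        or return;
    # names numbered like "101" arrive as season "10", episode "1"; re-split
    my $digits = $season . $episode;
    if (length($digits) < 4) {
        ($season, $episode) = $digits =~ /(\d)(\d{1,3})/;
    }
    $show =~ s/\./ /g;       # dots often stand in for spaces
    s/^0// for $season, $episode;
    return ($show, $season, $episode);
}

# Hypothetical file names covering the SxxExx and three-digit styles.
for my $file ('Some.Show.S02E05.avi', 'Some Show - 101.avi') {
    my ($show, $season, $episode) = parse_tv_name($file);
    print "$file => $show, season $season, episode $episode\n";
}
```

Both sample names resolve to the show "Some Show": the first as season 2, episode 5, and the second as season 1, episode 1.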