pvptools/GetMovieIMDBIDs

Get IMDB IDs for your movies

The script get_movie_ids.php finds the IMDB IDs for those of your movies that do not have one associated yet, and writes them to your database. Having these IDs pays off in several ways, for example:

  • when viewing an entry, a click on its title takes you directly to the corresponding IMDB page (instead of issuing a search to find it)
  • it lets you do the same for the crew members (as described on the page GetNameIMDBIDs)

How it works

The script searches the configured IMDB site for the movie's title. It then evaluates the result set against several criteria: Did the title match 100%? Or did one of the movie's AKAs match? Is it the same year? The same director? Depending on your settings, the movie's record is then updated when certain criteria match:

Setting       | Description                                                         | Safe?                                          | Value in the shipped version
$write_yt     | 100% match on title and year                                        | Yes, it is!                                    | TRUE
$write_aka_yt | 100% match on title and year for AKA                                | Yes, it is!                                    | TRUE
$write_yf     | 100% match on year, take 1st entry, i.e. the first AKA listed       | Not really - causes many false positives       | FALSE
$write_yd     | 100% match on year and director                                     | To some degree - causes some false positives   | FALSE
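
The table can be read as a small decision rule. The following is a minimal sketch of that reading, not the script's actual code: the function should_write() and the keys of $hit are hypothetical names made up for illustration, and the order in which the modes are checked is an assumption. Only the four $write_* settings and their shipped values are taken from the table.

```php
<?php
// Shipped defaults, as listed in the table above.
$write_yt     = TRUE;   // 100% match on title and year
$write_aka_yt = TRUE;   // 100% match on AKA title and year
$write_yf     = FALSE;  // match on year only, first entry taken
$write_yd     = FALSE;  // match on year and director

/**
 * Hypothetical helper: decide whether a search hit may be written.
 * $hit   holds booleans saying which criteria matched (assumed shape).
 * $modes maps each setting name to its TRUE/FALSE value.
 */
function should_write(array $hit, array $modes): bool
{
    if ($modes['write_yt'] && $hit['title'] && $hit['year']) {
        return true;
    }
    if ($modes['write_aka_yt'] && $hit['aka'] && $hit['year']) {
        return true;
    }
    if ($modes['write_yd'] && $hit['year'] && $hit['director']) {
        return true;
    }
    if ($modes['write_yf'] && $hit['year'] && $hit['first']) {
        return true;
    }
    return false;
}

$modes = compact('write_yt', 'write_aka_yt', 'write_yf', 'write_yd');

// An AKA matched together with the year: written under the shipped defaults.
$aka_hit = ['title' => false, 'aka' => true, 'year' => true,
            'director' => false, 'first' => false];
var_dump(should_write($aka_hit, $modes)); // bool(true)

// Only year and director matched: not written unless $write_yd is enabled.
$yd_hit = ['title' => false, 'aka' => false, 'year' => true,
           'director' => true, 'first' => false];
var_dump(should_write($yd_hit, $modes)); // bool(false)
```

With the shipped values only the two "safe" modes can trigger a write, which is why a first run with the defaults is unlikely to produce false positives.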

On a run against a database with 500 entries, the script automatically identified 270 movies with only 3 false positives, which is not that bad. All three were caused by the $write_yd mode; $write_yf was not used at all, since it causes too many false positives.

How to use the script

To put the hard part first: it is not as easy as adjusting the values and running the script once. You will certainly need multiple runs with changed values. It is never a bad idea to keep track of what is done, so better proceed in small steps. This can be done with the $skip_to and $stop_at settings, which pick a subset of all your entries. You can also exclude entries using the $skip_id array. The values assigned to these variables refer to positions in the set of movies selected, so the same number may refer to a different movie on a subsequent run if some movies have been updated in the meantime.
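
To illustrate why the numbers shift between runs, here is a minimal sketch of how such a "frame" could be picked. It is not taken from the script: select_frame() is a hypothetical helper, and it assumes that positions are 1-based, that $stop_at is inclusive, and that $skip_id holds positions within the current selection.

```php
<?php
/**
 * Hypothetical helper: pick the frame of entries to process.
 * Positions count within the current selection (movies still
 * lacking an IMDB ID); they are not database IDs.
 */
function select_frame(array $movies, int $skip_to, int $stop_at, array $skip_id): array
{
    $frame = [];
    foreach (array_values($movies) as $i => $movie) {
        $pos = $i + 1; // 1-based position in the selection (assumption)
        if ($pos < $skip_to || $pos > $stop_at) {
            continue;  // outside the frame
        }
        if (in_array($pos, $skip_id, true)) {
            continue;  // explicitly excluded
        }
        $frame[$pos] = $movie;
    }
    return $frame;
}

// First run: five movies without an ID, frame 2..4, position 3 excluded.
$pending = ['Alien', 'Brazil', 'Casablanca', 'Dune', 'Eraserhead'];
print_r(select_frame($pending, 2, 4, [3]));
// positions 2 and 4: Brazil, Dune

// 'Brazil' got its ID and drops out of the selection, so the same
// settings now address different movies.
$pending = ['Alien', 'Casablanca', 'Dune', 'Eraserhead'];
print_r(select_frame($pending, 2, 4, [3]));
// positions 2 and 4: Casablanca, Eraserhead
```

The second call shows the pitfall: position 3 excluded 'Casablanca' on the first run but excludes 'Dune' on the second, so the $skip_id entries should be re-checked after every run that wrote updates.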

A good process may look like this:

  1. Check the settings in the configuration file and make sure that the directory and the ID of the admin user are correct (you find the latter on the User Management page). You will probably want the next two settings (update the IMDBID and the rating) set to TRUE.
  2. Set all "writes" to FALSE and take the first group of movies, e.g. by setting $skip_to=1 and $stop_at=100. Run the script from the command line, (optionally) spooling its output to a file, e.g. ./get_movie_ids.php demo | tee update.log - this way you can verify the output even after it has scrolled off your terminal
  3. Watch the output for the word "MATCH" (yes, uppercase) - that's where the script found something. If there are no false positives, you can set the corresponding "writes" (see the table above) to TRUE and run the script again. Otherwise, you may restrict your "frame" by playing with the $skip_to and $stop_at parameters, or simply exclude the number in the square brackets by adding it to the $skip_id array (comma separated values between the parentheses)
  4. Move on to the next "frame" by increasing the $skip_to and $stop_at values
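
Scanning a long log for "MATCH" by eye gets tedious, so step 3 can be supported by a small helper. This is only a sketch under assumptions: match_numbers() is a hypothetical function, and the sample lines are made up, since the exact layout of the script's output is not described here. All it relies on is that a hit contains the uppercase word "MATCH" and a number in square brackets on the same line.

```php
<?php
/**
 * Hypothetical helper: collect the bracketed numbers from all log
 * lines that contain the word "MATCH". The line layout is assumed;
 * compare it with your own update.log before relying on the result.
 */
function match_numbers(array $lines): array
{
    $numbers = [];
    foreach ($lines as $line) {
        // Case-sensitive on purpose: hits are reported in uppercase.
        if (strpos($line, 'MATCH') === false) {
            continue;
        }
        // Take the first number found in square brackets.
        if (preg_match('/\[(\d+)\]/', $line, $m)) {
            $numbers[] = (int) $m[1];
        }
    }
    return $numbers;
}

// Made-up sample lines; the real output will look different.
$sample = [
    '[12] Some Title (1999): MATCH on title and year',
    '[13] Another Title (2001): no match',
    '[14] Third Title (1985): MATCH on year and director',
];
print_r(match_numbers($sample)); // 12 and 14
```

For a real log you could pass in file('update.log', FILE_IGNORE_NEW_LINES). Any number that turns out to be a false positive can then be added to $skip_id before the "writes" are switched to TRUE.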

Conclusion

While this looks like a lot of work, it is still much faster than updating all database entries manually. And the lazy can just use the first two "write" variables, do a fast run, and have at least some of the movies updated without too much work ;)

Last modified by izzy, 02/16/09 12:42:22 (19 months ago)