Question

Goal
Build search for the top 1000 IMDB movies. Users should be able to search for different aspects of the movie (e.g. director name) and get back the set of movies related to it. For instance, in my implementation:

1. Searching for “spielberg” returned the following list: ["Schindler's List","Saving Private Ryan","Raiders of the Lost Ark","Indiana Jones and the Last Crusade","Jurassic Park","Catch Me If You Can","Jaws","E.T. the Extra-Terrestrial","Empire of the Sun","The Color Purple","Minority Report","Close Encounters of the Third Kind","Bridge of Spies","Indiana Jones and the Temple of Doom","Munich"]

2. Searching for "spielberg hanks" returns: ["Saving Private Ryan","Catch Me If You Can","Bridge of Spies"]. This is the list of all movies associated with both Spielberg and Hanks.

Components

1. Crawl: You will need to crawl the IMDB listing pages (there are multiple pages) and all the movies linked off of the listing pages.

2. Parse: You will need to parse the pages to extract the right information.

3. Search database: “Index” the information in some way. Given the scale of the problem is not large, you can use a simple in-memory data structure.

4. Expose a simple API that returns the movie names given a search term.
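One possible shape for the "search database" component is an inverted index: a dictionary that maps each lowercase token to the set of movie titles containing it. The sketch below is only an illustration, not the solution's actual code. The sample records, the field names (`title`, `director`, `cast`), and the helpers `tokenize`, `build_index`, and `search` are all assumptions made for this example. A multi-word query is treated as "all terms must match", which mirrors the "spielberg hanks" example in the Goal section.

```python
import re
from collections import defaultdict

# Hand-written sample records standing in for parsed movie pages.
# The field names here are assumptions, not taken from the solution.
MOVIES = [
    {"title": "Saving Private Ryan", "director": "Steven Spielberg",
     "cast": ["Tom Hanks", "Matt Damon"]},
    {"title": "Jaws", "director": "Steven Spielberg",
     "cast": ["Roy Scheider", "Richard Dreyfuss"]},
    {"title": "Forrest Gump", "director": "Robert Zemeckis",
     "cast": ["Tom Hanks", "Robin Wright"]},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(movies):
    """Map each token to the set of titles whose fields contain it."""
    index = defaultdict(set)
    for movie in movies:
        fields = [movie["title"], movie["director"], *movie["cast"]]
        for field in fields:
            for token in tokenize(field):
                index[token].add(movie["title"])
    return index


def search(index, query):
    """Return the sorted titles that match every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the first token's titles, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


INDEX = build_index(MOVIES)
print(search(INDEX, "spielberg"))        # ['Jaws', 'Saving Private Ryan']
print(search(INDEX, "spielberg hanks"))  # ['Saving Private Ryan']
```

Each lookup costs one dictionary access per query term plus a set intersection, which suits a read-heavy workload on roughly 1000 movies.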

Non Goals
This is a very open-ended problem. The goal is not to build a comprehensive solution. Feel free to make decisions on the scope of what you want to accomplish. You can spend as much or as little time on the exercise as you like, but I recommend spending 3-5 hours at most. Some decisions you can make on the scope:
• Relevance and ranking are not in scope.
• Your data structure should be optimized for lookup time (take into account the number of lookups). But you don't have to worry about memory scale at this point.
• There is no UX needed. A simple API would do.
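Since no UX is required, the API can be as small as a single HTTP endpoint that returns JSON. The following is a minimal sketch using only the Python standard library. The `/search?q=...` route, port 8000, the names `handle_path` and `run_server`, and the hard-coded `SAMPLE_INDEX` are all assumptions for illustration; a real solution would look up terms in the index built from the crawled pages.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Placeholder lookup table (an assumption for this sketch).
SAMPLE_INDEX = {
    "spielberg": {"Jaws", "Saving Private Ryan"},
    "hanks": {"Forrest Gump", "Saving Private Ryan"},
}


def search(query):
    """Return the sorted titles that match every whitespace-separated term."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = [SAMPLE_INDEX.get(term, set()) for term in terms]
    return sorted(set.intersection(*matches))


def handle_path(path):
    """Turn a request path such as /search?q=a+b into (status, JSON body)."""
    parsed = urlparse(path)
    if parsed.path != "/search":
        return 404, json.dumps({"error": "not found"})
    query = parse_qs(parsed.query).get("q", [""])[0]
    return 200, json.dumps({"query": query, "movies": search(query)})


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Delegate to the pure function so the logic is testable offline.
        status, body = handle_path(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run_server(port=8000):
    """Serve the search endpoint on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

Calling `run_server()` starts the server; a request to `http://127.0.0.1:8000/search?q=spielberg+hanks` would then return the matching titles as JSON. Keeping the request logic in `handle_path` lets it be exercised without opening a socket.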

Feel free to make your own decisions on the scope.

Helpful tips
• Most languages provide libraries for extracting information from web pages using CSS selectors (e.g. BeautifulSoup for Python).
• Chrome's developer tools let you look up the CSS selector for a given element on the page.

Deliverables
• Code for this exercise along with instructions on how to run it.
• Thoughts on some simplifying assumptions you made and why.
• Thoughts on how you would improve the solution, given more time, from the perspective of software architecture, scale, performance, or quality.

Solution Preview


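import requests
from bs4 import BeautifulSoup

# main_url is not shown in this preview; the code below uses it as a
# URL template that takes the listing page number via str.format().
movies_result = list()
movie_count = 0

# Walk the listing pages 1 through 20.
for i in range(1, 21):
    print("Processing page: {0} \n".format(i))
    url_get = requests.get(main_url.format(i))
    html_soup = BeautifulSoup(url_get.content, 'html5lib')

    # Each movie on a listing page is a '.lister-item' element.
    movie_list = html_soup.select('.lister-item')

    for movie in movie_list:
        movie_count += 1

        # The title cell holds the link to the movie's own page.
        div = movie.select('.col-title')[0]
        span = div.select('span')[0]
        a_tag = div.select('a')[0]

        link = a_tag.get('href')
        title = a_tag.text

        # Record the basic details; rank follows the listing order.
        movie_dict = dict()
        movie_dict["title"] = title
        movie_dict["url"] = "http://www.imdb.com" + link
        movie_dict["rank"] = movie_count

        print("")
        print(u"Processing {0}: {1}".format(movie_count, movie_dict["title"]))

        # Fetch the movie's own page (the preview is truncated here).
        url_get = requests.get(movie_dict["url"])...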
