Your task is to implement a simplified general purpose aggregator. An aggregator is a program that systematically compiles a specific type of information from multiple online sources.
The aggregator will take a filename and a topic as command-line arguments. The file will contain a possibly very long list of online sources (URLs). The topic will be a string such as flu or football.
The aggregator will issue an error message if the command line arguments provided are too few or too many.
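As a minimal sketch of that argument check (the helper name `parse_args` and the exact wording of the messages are assumptions, not part of the assignment), the count of command-line arguments can be validated before any file is opened:

```python
def parse_args(argv):
    """Return (filename, topic), or raise ValueError on a wrong argument count.

    argv is expected to look like sys.argv: the script name followed by
    the user-supplied arguments. parse_args is a hypothetical helper.
    """
    if len(argv) < 3:
        raise ValueError("Error: too few arguments. Usage: filename topic")
    if len(argv) > 3:
        raise ValueError("Error: too many arguments. Usage: filename topic")
    return argv[1], argv[2]
```

A caller would typically pass `sys.argv`, print the message from the `ValueError`, and exit with a non-zero status.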
Our aggregator will open and read the urls contained in the file, and it will report back on the subset of urls that contain a reference to the specified topic. The program will put the output in a text file. That output will contain both the urls and the text containing the reference.
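One way to open and read a single URL is sketched below. The helper name `fetch_page`, the `opener` parameter (added only so the function can be exercised without network access), the UTF-8 encoding, and the message wording are all assumptions. The sketch also reads the assignment's "DecodeError" as Python's built-in `UnicodeDecodeError`, which is an interpretation rather than something the assignment states.

```python
import urllib.error
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen):
    """Return the decoded page text, or None if the page cannot be used.

    fetch_page and its opener parameter are hypothetical; opener defaults
    to urllib.request.urlopen.
    """
    try:
        with opener(url) as response:
            # Assumption: pages are decoded as UTF-8
            return response.read().decode("utf-8")
    except urllib.error.URLError as err:
        print(f"Error opening {url}: {err.reason}")
    except UnicodeDecodeError as err:
        print(f"Error decoding {url}: {err.reason}")
    return None
```

Returning `None` on failure lets the main loop skip a bad URL and continue with the rest of the list.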
The output file will be created in the working directory. Its name will consist of the topic followed by summary.txt. For example, when the topic is flu, the output filename will be flusummary.txt; when the topic is football, the output filename will be footballsummary.txt.
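The naming rule can be sketched as follows. The function names `summary_filename` and `write_summary`, the `(url, text)` pair layout, and the one-item-per-line output format are assumptions made for illustration.

```python
def summary_filename(topic):
    """Build the output filename: the topic followed by 'summary.txt'."""
    return topic + "summary.txt"


def write_summary(topic, entries):
    """Write (url, text) pairs to the summary file in the working directory.

    entries is assumed to be a list of (url, text) tuples; the layout of
    the output lines is a guess, since the assignment does not fix it.
    """
    with open(summary_filename(topic), "w", encoding="utf-8") as out:
        for url, text in entries:
            out.write(url + "\n")
            out.write(text + "\n\n")
```

Because the filename is relative, the file lands in whatever directory the program is run from.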
Since our program is reading html documents and we are interested in actual text (not in matches found inside html tags), the text containing the reference will be delimited by the innermost angle brackets: >text containing the topic to capture<. The angle brackets should of course not be included in the captured text. Since our aggregator will be reading urls on the open web, it may encounter errors: it must handle both URLError and DecodeError and generate the appropriate message for each.
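The capture rule for text between the innermost angle brackets could be expressed with a regular expression along these lines. This is a sketch under assumptions: the function name `find_references` is hypothetical, and matching is case-sensitive because the assignment does not say otherwise.

```python
import re


def find_references(html, topic):
    """Return every run of text between '>' and '<' that contains topic.

    [^<>]* cannot cross an angle bracket, so a match is always bounded by
    the innermost pair, and a topic that appears inside a tag (for example
    in an attribute value) is not captured.
    """
    pattern = re.compile(r">([^<>]*" + re.escape(topic) + r"[^<>]*)<")
    return pattern.findall(html)
```

The capturing group excludes the delimiting `>` and `<`, so the brackets themselves never appear in the result.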
You must be able to invoke your aggregator from the terminal by typing: python some_filename some_topic

Solution Preview


"""
Implement a simple general purpose aggregator

Usage: filename topic
filename: input file that contains a list of the online sources (urls).
topic: topic to be researched and reported on
"""

import urllib.request
import urllib.error
import re
import sys

# Enter your function definitions here

def getURLs(filename):
    """
    Method to get URLs from filename

    filename (string) - name of file to read URLs from

    urls (list) - list of strings containing URLs in file
    """
    urls = []
    with open(filename) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and keep each remaining URL
            if line:
                urls.append(line)
    return urls

def stripHTML(text):
    """
    Method to strip HTML from text

    text (string) - text with HTML code

    result (string) - text with HTML tags removed
    """
    p = re.compile(r'<.*?>')
    result = p.sub('\n', text)
    result = result.strip()
    return result
...
