The aggregator will take a filename and a topic as command line arguments. The file will contain a possibly very long list of online sources (urls). The topic will be a string such as flu, football, etc.
The aggregator will issue an error message if the command line arguments provided are too few or too many.
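The argument check above can be sketched as follows. The function name `check_usage` is an assumption, not part of the assignment:

```python
import sys

def check_usage(argv):
    # Exactly two arguments (filename and topic) must follow the script name,
    # so argv should have length 3. Anything else is an error.
    if len(argv) != 3:
        print('Usage: aggregator.py filename topic')
        return False
    return True
```

A call such as `check_usage(sys.argv)` at the top of the script lets the program exit early with the usage message when the argument count is wrong.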
Our aggregator will open and read the urls contained in the file, and it will report back on the subset of urls that contain a reference to the specified topic. The program will put the output in a text file. That output will contain both the urls and the text containing the reference.
The output file will be created in the working directory. Its name will consist of the topic followed by summary.txt. So when the topic is flu, the output filename will be flusummary.txt, when the topic is football, the output filename will be footballsummary.txt.
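The naming rule above is a simple string concatenation. A minimal sketch (the helper name `output_filename` is an assumption):

```python
def output_filename(topic):
    # Output file goes in the working directory, named <topic>summary.txt
    return topic + 'summary.txt'
```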
Since our program is reading html documents and we are interested in actual text (not in matches found inside html tags), the text containing the reference will be delimited by the innermost angle brackets: >text containing the topic to capture<. The angle brackets should of course not be included in the captured text. Since our aggregator will be reading urls on the open web, it may encounter errors: it must handle both URLError and DecodeError and generate the appropriate message for each.
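The innermost-angle-bracket rule can be expressed as a regular expression: a match starts at `>`, ends at `<`, and contains no other angle brackets, which guarantees the captured span lies inside the innermost pair. A sketch, assuming a helper named `find_references`:

```python
import re

def find_references(html, topic):
    # [^<>]* forbids nested brackets, so only text between the innermost
    # > and < is captured; the brackets themselves are not included.
    pattern = re.compile(r'>([^<>]*{}[^<>]*)<'.format(re.escape(topic)))
    return pattern.findall(html)
```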
You must be able to invoke your aggregator from the terminal by typing: python aggregator.py some_filename some_topic
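The error handling mentioned above might look like the sketch below, assuming the spec's URLError refers to `urllib.error.URLError` and its DecodeError to `UnicodeDecodeError`; the function name `fetch_page` is also an assumption:

```python
import urllib.request
import urllib.error

def fetch_page(url):
    """Return the decoded text of url, or None if it could not be read."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode('utf-8')
    except urllib.error.URLError:
        print('Error: could not open', url)
    except UnicodeDecodeError:
        print('Error: could not decode', url)
    return None
```

Returning `None` on failure lets the main loop report the problem and move on to the next url instead of crashing.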
Implement a simple general purpose aggregator
Usage: aggregator.py filename topic
filename: input file that contains a list of the online sources (urls).
topic: topic to be researched and reported on
# Enter your function definitions here
def get_urls(filename):
    """Return the list of URLs read from filename, one URL per line.

    filename (string) - name of file to read URLs from
    urls (list) - list of strings containing URLs in file
    """
    urls = []
    f = open(filename)
    for line in f:
        line = line.strip()
        if line:  # skip blank lines
            urls.append(line)
    f.close()
    return urls
import re

def strip_html(text):
    """Return text with HTML tags removed.

    text (string) - text with HTML code
    result (string) - text with HTML tags removed
    """
    p = re.compile(r'<.*?>')
    result = p.sub('\n', text)
    result = result.strip()
    return result