The purpose of this assignment is to get you to use recursion, lists, and stacks in a useful way.
Background: HTML
HTML, the HyperText Markup Language, is used to describe the content of webpages. This is done by annotating the content with tags. HTML has two types of tags: element tags and void tags. HTML elements consist of a start tag, some content, and an end tag. For example,
<b> This is bold content </b>
illustrates a bold element, which is how content can be bolded in HTML. Observe that both the start and the end tag of an element have the same name, except that the end tag starts with a /.
Elements can be nested. An HTML document comprises many nested elements, such as the example below:
<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<hr>
<p>Example of bold is here: <b> This is bold content </b></p>
</body>
</html>
There are also void or empty tags, which are singleton tags that do not bracket content and hence do not need to be closed. For example, "hr" in Figure 1 is a void tag. Special void tags, whose names begin with a !, are used to include comments and metadata in the HTML. Tags beginning with ! can be ignored.
If you are not familiar with the structure of HTML, your first task is to go over a basic HTML tutorial, such as the one available here: default.asp
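The tag categories described above can be told apart from the text between the angle brackets alone. The following is a minimal sketch of that idea; the class name TagKinds, the method classify, and the set of void tag names are assumptions made for illustration (only "hr" is named in the text), not part of the assignment.

```java
import java.util.Set;

final class TagKinds {
    // Assumed set of void tag names; only "hr" appears in the assignment text.
    private static final Set<String> VOID_NAMES = Set.of("hr", "br", "img");

    // Classifies the text found between '<' and '>' (attributes are assumed absent).
    static String classify(String tag) {
        if (tag.startsWith("!")) return "ignored"; // comments and metadata
        if (tag.startsWith("/")) return "end";     // end tag of an element
        if (VOID_NAMES.contains(tag)) return "void";
        return "start";                            // start tag of an element
    }

    public static void main(String[] args) {
        for (String t : new String[] {"!DOCTYPE html", "b", "/b", "hr"}) {
            System.out.println(t + " -> " + classify(t));
        }
    }
}
```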
Problem: Analyzing the Structure of HTML
The nesting of tags creates a natural structure in an HTML document, which is useful to visualize. For example, the structure of the previous example can be visualized as an outline:
html 1
 body 4
  h1 0
  p 0
  p 1
   b 0
In this example, the "html" element contains one element: "body". The "body" element contains four elements: "h1", "p", "hr", and "p". All the elements but the last one contain no elements, and the last element, "p", contains a "b" element. Observe that the deeper an element is nested, the further it is indented. Furthermore, every element is followed by an integer, indicating how many elements and void tags it contains.
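A single outline line can be built from an element's name, its depth, and its number of children. The sketch below assumes one space of indentation per level of depth; the class name OutlineLine and the method format are hypothetical helpers, not part of the provided code.

```java
final class OutlineLine {
    // Indents by one space per level of depth, then appends the name,
    // a single space, and the number of children.
    static String format(String name, int depth, int children) {
        return " ".repeat(depth) + name + " " + children;
    }

    public static void main(String[] args) {
        System.out.println(format("html", 0, 1));
        System.out.println(format("body", 1, 4));
        System.out.println(format("b", 3, 0));
    }
}
```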
Your task is to create a program that generates an outline representation of an HTML document based on the requirements below.
Write a program called HTMLSummarizer that reads in an HTML file and outputs the corresponding outline representation. Your HTMLSummarizer class must implement the provided Tester interface and also contain the main() method where your program starts running. This is because your program will be tested via this interface. The interface contains a single method:
public ArrayList<String> compute( Scanner input );
This method must perform the required computation.
The method takes a Scanner object, which contains a valid HTML file. You may assume that every start tag is properly matched with an end tag, and that there are no errors in the input.
Hint: Use the provided HTMLScanner object to easily parse the input by using its two methods: hasNextTag() and nextTag().
The HTML file will contain 0 or more tags.
The depth of a void tag or an element is the number of elements that surround it. For example, the depth of element "b" in Figure 1 is 3.
An element or void tag is a child of another element if it is contained in that element and is of depth one greater than the element. For example, in Figure 1, "b" is a child of "p", but is not a child of "body", whereas "p", "hr", "p", and "h1" are all children of "body".
All tags whose names begin with a ! are to be ignored. For example, tags of the form <!DOCTYPE> and <!-- ...> do not appear in an outline representation of the HTML file.
In an outline representation, the elements or void tags are ordered according to their order in the HTML file, and the depth of the element or void tag is represented by being indented a corresponding number of spaces.
The method compute( Scanner input ) should return an ArrayList of Strings denoting the outline representation of the HTML input. Each element or void tag should be a separate String. The order of the elements and void tags should be the same as in the HTML input and should be indented the same number of spaces as their depth. Lastly, each element in the outline is followed by a single space and an integer, denoting the number of children the element has.
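The provided HTMLScanner is not reproduced here, so its exact behavior is unknown. As a rough illustration of how a tag scanner with these two method names might work, here is a hypothetical stand-in called SimpleTagScanner. It assumes that nextTag() returns the text between < and > as a String; the real class may differ.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the provided HTMLScanner.
final class SimpleTagScanner {
    private final Matcher matcher;
    private boolean found;

    SimpleTagScanner(Scanner input) {
        // With the delimiter \A, next() returns the entire remaining input.
        input.useDelimiter("\\A");
        String text = input.hasNext() ? input.next() : "";
        matcher = Pattern.compile("<([^>]+)>").matcher(text);
        found = matcher.find();
    }

    boolean hasNextTag() {
        return found;
    }

    // Returns the text between '<' and '>' of the next tag.
    String nextTag() {
        if (!found) throw new NoSuchElementException();
        String tag = matcher.group(1).trim();
        found = matcher.find(); // advance to the following tag, if any
        return tag;
    }

    public static void main(String[] args) {
        SimpleTagScanner s = new SimpleTagScanner(new Scanner("<p>Hi <b>x</b></p>"));
        while (s.hasNextTag()) {
            System.out.println(s.nextTag());
        }
    }
}
```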
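The definitions of depth and child map naturally onto a stack of open elements: the depth of a tag is the size of the stack when the tag is read, and its parent is the element on top. The sketch below assumes the tags arrive as a list of strings without angle brackets or attributes, and that "hr" is the only void tag; DepthTracker and describe are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class DepthTracker {
    // Reports the depth and parent of every element or void tag, in document order.
    static List<String> describe(List<String> tags) {
        Deque<String> open = new ArrayDeque<>(); // stack of currently open elements
        List<String> out = new ArrayList<>();
        for (String tag : tags) {
            if (tag.startsWith("!")) continue;                  // ignored tag
            if (tag.startsWith("/")) { open.pop(); continue; }  // end tag closes the top element
            String parent = open.isEmpty() ? "none" : open.peek();
            out.add(tag + " depth=" + open.size() + " parent=" + parent);
            if (!tag.equals("hr")) open.push(tag);              // void tags are never opened
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("html", "body", "hr", "p", "b", "/b", "/p", "/body", "/html");
        describe(tags).forEach(System.out::println);
    }
}
```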
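Since the assignment mentions recursion, one possible way to produce the outline is a recursive pass in which each call handles the children of one element and returns how many it saw. This is only a sketch under stated assumptions: tags are given as a list of strings without angle brackets or attributes, "hr" is the only void tag, and void tags are listed without a child count (the requirements only state that elements are followed by a count). RecursiveOutline and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class RecursiveOutline {
    static ArrayList<String> outline(List<String> tags) {
        ArrayList<String> out = new ArrayList<>();
        children(tags.iterator(), 0, out);
        return out;
    }

    // Consumes tags up to the end tag of the enclosing element (or the end of
    // the input) and returns the number of children found at this depth.
    private static int children(Iterator<String> it, int depth, ArrayList<String> out) {
        int count = 0;
        while (it.hasNext()) {
            String tag = it.next();
            if (tag.startsWith("!")) continue;     // ignored tag
            if (tag.startsWith("/")) return count; // enclosing element ends here
            count++;
            String line = " ".repeat(depth) + tag;
            if (tag.equals("hr")) {
                out.add(line);                     // assumption: void tags carry no count
            } else {
                int index = out.size();
                out.add(line);                     // placeholder until the count is known
                int n = children(it, depth + 1, out);
                out.set(index, line + " " + n);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("!DOCTYPE html", "html", "body", "h1", "/h1",
                "p", "/p", "hr", "p", "b", "/b", "/p", "/body", "/html");
        outline(tags).forEach(System.out::println);
    }
}
```

The placeholder is needed because an element's line must appear before its children in the list, while its child count is only known after they have been read.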

Solution Preview


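import java.util.*;
import java.util.regex.*;

public class HTMLSummarizer implements Tester {

// Method to compute the outline representation from HTML
public ArrayList<String> compute(Scanner input) {
    Pattern p = Pattern.compile("<(.+?)>");
    // Stack to store the names of the open elements
    Stack<String> tags = new Stack<String>();
    // Parallel stacks: result index and child count of each open element
    Stack<Integer> positions = new Stack<Integer>();
    Stack<Integer> counts = new Stack<Integer>();
    // List to store result
    ArrayList<String> result = new ArrayList<String>();
    while (input.hasNextLine()) {
      String line = input.nextLine();
      Matcher m = p.matcher(line);
      // Find tags on line
      while (m.find()) {
       String tag = m.group(1).trim();
       // Tags whose names begin with ! are ignored
       if (tag.startsWith("!")) continue;
       if (tag.charAt(0) != '/') {
          // Keep only the tag name, dropping any attributes
          tag = tag.split("[\\s/]")[0];
          String padding = "";
          for (int i = 0; i < tags.size(); i++) {
             padding += " ";
          }
          // This tag is a child of the innermost open element, if any
          if (!counts.isEmpty()) counts.push(counts.pop() + 1);
          if (tag.equals("br") || tag.equals("hr")) {
             // Assumption: void tags are listed without a child count
             result.add(padding + tag);
             continue;
          }
          tags.push(tag);
          positions.push(result.size());
          counts.push(0);
          result.add(padding + tag);
       } else {
          // End tag: close the innermost element and append its child count
          tags.pop();
          int pos = positions.pop();
          result.set(pos, result.get(pos) + " " + counts.pop());
       }
      }
    }
    return result;
}
}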