Question

Imagine that you have a table available to you as a Spark DataFrame in which each entry is a phone number. Your task is to return only those columns in which every value is correctly formatted. The correct formats are (xxx)-xxx-xxxx, xxx-xxx-xxxx, xxx-xxxxxxx and xxxxxxxxxx, and nothing else. You only need to validate the format, not the content: 000-000-0000 is a valid entry even though it is not a real phone number.
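The four accepted formats can be captured in a single regular expression. The following is a minimal sketch in plain Scala, assuming that each x stands for exactly one ASCII digit; the names PhoneFormat, Pattern and isValid are hypothetical and do not come from the original solution.

```scala
object PhoneFormat {
  // One alternative per accepted format, in the order listed in the question:
  // (xxx)-xxx-xxxx | xxx-xxx-xxxx | xxx-xxxxxxx | xxxxxxxxxx
  // Assumption: "x" means a single ASCII digit.
  val Pattern: String =
    """^(\(\d{3}\)-\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{3}-\d{7}|\d{10})$"""

  private val compiled = Pattern.r.pattern

  // Returns true only when the whole value matches one of the four formats.
  // A null value is treated as invalid (an assumption; the question does not say).
  def isValid(value: String): Boolean =
    value != null && compiled.matcher(value).matches()
}
```

Under these assumptions, PhoneFormat.isValid("000-000-0000") is true, while a value such as "(123) 456-7890" is rejected because a space after the closing parenthesis is not among the listed formats.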
The purpose of this challenge is to demonstrate your coding skills and understanding of Spark. Please focus on clean code and passing the tests. You only need to test with a small table; add rows and columns and name the columns however you like, but please don't spend too much time creating test data. A table with fewer than 10 columns and fewer than 10 rows should be more than sufficient.
Free hint: Efficiency is key. For example, calling a Spark Action on each column is not the most efficient solution!
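One way to follow the hint is to check every column inside a single aggregation, so that only one Spark action runs regardless of the number of columns. The sketch below is an illustration under stated assumptions, not the solution sold on this page: the object SinglePassColumnFilter and the method validColumns are hypothetical names, null values are treated as invalid, and column names are assumed not to contain backticks.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, min, when}

object SinglePassColumnFilter {
  // Same four formats as in the question; "x" is assumed to be one ASCII digit.
  val PhonePattern: String =
    """^(\(\d{3}\)-\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\d{3}-\d{7}|\d{10})$"""

  def validColumns(df: DataFrame): DataFrame =
    if (df.columns.isEmpty) df
    else {
      // Per column: 1 for a well-formed value, 0 otherwise. A null value makes
      // the condition null, which falls through to otherwise(0).
      // min(...) over the column is 1 only if every row scored 1.
      val checks = df.columns.map { name =>
        val value = col(s"`$name`").cast("string")
        min(when(value.rlike(PhonePattern), 1).otherwise(0)).as(name)
      }

      // The only action: one job computes the flags for all columns at once.
      val flags = df.agg(checks.head, checks.tail: _*).head()

      // For an empty table min(...) is null; this sketch then drops the column.
      val keep = df.columns.zipWithIndex.collect {
        case (name, i) if !flags.isNullAt(i) && flags.getInt(i) == 1 => name
      }

      // select is a transformation, so no further job is triggered here.
      df.select(keep.map(name => col(s"`$name`")): _*)
    }
}
```

The design choice is to trade a per-column `filter(...).count()` loop, which launches one job per column, for one `agg` call whose result is a single row of flags collected to the driver.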

Solution Preview


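package com.bosch.test

import scala.language.implicitConversions

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions._

object FilterCol {
  // Implicit conversion that makes filterColumns() callable on any DataFrame.
  implicit def filter(df: DataFrame): filterImplicit = new filterImplicit(df)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("ColumnFilter")
      .getOrCreate()
    // The first CSV row holds the column names; all values are read as strings.
    val df = spark.read.option("header", "true").csv("DataFrame.csv")
    // show() prints the table itself and returns Unit, so it is not wrapped in print().
    df.filterColumns().show()
  }
}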

class filterImplicit(df: DataFrame) extends Serializable {

def filterColumns(): DataFrame...
