CCA175 Practice Exam Questions and Answers

CCA Spark and Hadoop Developer Exam

Last Update 4 hours ago
Total Questions : 96

CCA Spark and Hadoop Developer Exam is stable now with all latest exam questions are added 4 hours ago. Incorporating CCA175 practice exam questions into your study plan is more than just a preparation strategy.

By familiarizing yourself with the CCA Spark and Hadoop Developer Exam exam format, identifying knowledge gaps, applying theoretical knowledge in Cloudera practical scenarios, you are setting yourself up for success. CCA175 exam dumps provide a realistic preview, helping you to adapt your preparation strategy accordingly.

CCA175 exam questions often include scenarios and problem-solving exercises that mirror real-world challenges. Working through CCA175 dumps allows you to practice pacing yourself, ensuring that you can complete all CCA Spark and Hadoop Developer Exam exam questions within the allotted time frame without sacrificing accuracy.

CCA175 PDF

$39.6
~~$99~~

Add to Cart

CCA175 Testing Engine

$51.6
~~$129~~

Add to Cart

CCA175 PDF + Testing Engine

$60
~~$149.99~~

Add to Cart

Question # 1

Problem Scenario 90 : You have been given below two files

course.txt

id,course

1,Hadoop

2,Spark

3,HBase

fee.txt

id,fee

2,3900

3,4200

4,2900

Accomplish the following activities.

1. Select all the courses and their fees , whether fee is listed or not.

2. Select all the available fees and respective course. If course does not exists still list the fee

3. Select all the courses and their fees , whether fee is listed or not. However, ignore records having fee as null.

Options:

Discussion 0

Question # 2

Problem Scenario 40 : You have been given sample data as below in a file called spark15/file1.txt

3070811,1963,1096,,"US","CA",,1,

3022811,1963,1096,,"US","CA",,1,56

3033811,1963,1096,,"US","CA",,1,23

Below is the code snippet to process this tile.

val field= sc.textFile("spark15/f ilel.txt")

val mapper = field.map(x=> A)

mapper.map(x => x.map(x=> {B})).collect

Please fill in A and B so it can generate below final output

Array(Array(3070811,1963,109G, 0, "US", "CA", 0,1, 0)

,Array(3022811,1963,1096, 0, "US", "CA", 0,1, 56)

,Array(3033811,1963,1096, 0, "US", "CA", 0,1, 23)

)

Options:

Discussion 0

Question # 3

Problem Scenario 31 : You have given following two files

1. Content.txt: Contain a huge text file containing space separated words.

2. Remove.txt: Ignore/filter all the words given in this file (Comma Separated).

Write a Spark program which reads the Content.txt file and load as an RDD, remove all the words from a broadcast variables (which is loaded as an RDD of words from Remove.txt). And count the occurrence of the each word and save it as a text file in HDFS.

Content.txt

Hello this is ABCTech.com

This is TechABY.com

Apache Spark Training

This is Spark Learning Session

Spark is faster than MapReduce

Remove.txt

Hello, is, this, the

Options:

Discussion 0

Question # 4

Problem Scenario 60 : You have been given below code snippet.

val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"}, 3}

val b = a.keyBy(_.length)

val c = sc.parallelize(List("dog","cat","gnu","salmon","rabbit","turkey","woif","bear","bee"), 3)

val d = c.keyBy(_.length)

operation1

Write a correct code snippet for operationl which will produce desired output, shown below.

Array[(lnt, (String, String))] = Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (6,(salmon,turkey)), (6,(salmon,salmon)), (6,(salmon,rabbit)),

(6,(salmon,turkey)), (3,(dog,dog)), (3,(dog,cat)), (3,(dog,gnu)), (3,(dog,bee)), (3,(rat,dog)), (3,(rat,cat)), (3,(rat,gnu)), (3,(rat,bee)))

Options:

Discussion 0

Question # 5

Problem Scenario 96 : Your spark application required extra Java options as below. -XX:+PrintGCDetails-XX:+PrintGCTimeStamps

Please replace the XXX values correctly

./bin/spark-submit --name "My app" --master local[4] --conf spark.eventLog.enabled=talse --conf XXX hadoopexam.jar

Options:

Discussion 0

Question # 6

Problem Scenario 46 : You have been given belwo list in scala (name,sex,cost) for each work done.

List( ("Deeapak" , "male", 4000), ("Deepak" , "male", 2000), ("Deepika" , "female", 2000),("Deepak" , "female", 2000), ("Deepak" , "male", 1000) , ("Neeta" , "female", 2000))

Now write a Spark program to load this list as an RDD and do the sum of cost for combination of name and sex (as key)

Options:

Discussion 0

Question # 7

Problem Scenario 80 : You have been given MySQL DB with following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.products

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish following activities.

1. Copy "retaildb.products" table to hdfs in a directory p93_products

2. Now sort the products data sorted by product price per category, use productcategoryid colunm to group by category

Options:

Discussion 0

Answer:

See the explanation for Step by Step Solution and configuration.

Explanation:

Solution :

Step 1 : Import Single table .

sqoop import --connect jdbc:mysql://quickstart:3306/retail_db -username=retail_dba -password=cloudera -table=products --target-dir=p93

Note : Please check you dont have space between before or after '=' sign. Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs

Step 2 : Step 2 : Read the data from one of the partition, created using above command, hadoop fs -cat p93_products/part-m-00000

Step 3 : Load this directory as RDD using Spark and Python (Open pyspark terminal and do following}. productsRDD = sc.textFile(Mp93_products")

Step 4 : Filter empty prices, if exists

#filter out empty prices lines

Nonempty_lines = productsRD

filter(lambda x: len(x.split(",")[4]) > 0)

Step 5 : Create data set like (categroyld, (id,name,price)

mappedRDD = nonempty_lines.map(lambda line: (line.split(",")[1], (line.split(",")[0], line.split(",")[2], float(line.split(",")[4]))))

tor line in mappedRD

collect(): print(line)

Step 6 : Now groupBy the all records based on categoryld, which a key on mappedRDD it will produce output like (categoryld, iterable of all lines for a key/categoryld)

groupByCategroyld = mappedRD

groupByKey() for line in groupByCategroyld.collect(): print(line)

step 7 : Now sort the data in each category based on price in ascending order.

# sorted is a function to sort an iterable, we can also specify, what would be the Key on which we want to sort in this case we have price on which it needs to be sorted.

groupByCategroyld.map(lambda tuple: sorted(tuple[1], key=lambda tupleValue: tupleValue[2])).take(5)

Step 8 : Now sort the data in each category based on price in descending order.

# sorted is a function to sort an iterable, we can also specify, what would be the Key on which we want to sort in this case we have price which it needs to be sorted.

on groupByCategroyld.map(lambda tuple: sorted(tuple[1], key=lambda tupleValue: tupleValue[2] , reverse=True)).take(5)

Question # 8

Problem Scenario 37 : ABCTECH.com has done survey on their Exam Products feedback using a web based form. With the following free text field as input in web ui.

Name: String

Subscription Date: String

Rating : String

And servey data has been saved in a file called spark9/feedback.txt

Christopher|Jan 11, 2015|5

Kapil|11 Jan, 2015|5

Thomas|6/17/2014|5

John|22-08-2013|5

Mithun|2013|5

Jitendra||5

Write a spark program using regular expression which will filter all the valid dates and save in two separate file (good record and bad record)

Options:

Discussion 0

Question # 9

Problem Scenario 89 : You have been given below patient data in csv format,

patientID,name,dateOfBirth,lastVisitDate

1001,Ah Teck,1991-12-31,2012-01-20

1002,Kumar,2011-10-29,2012-09-20

1003,Ali,2011-01-30,2012-10-21

Accomplish following activities.

1. Find all the patients whose lastVisitDate between current time and '2012-09-15'

2. Find all the patients who born in 2011

3. Find all the patients age

4. List patients whose last visited more than 60 days ago

5. Select patients 18 years old or younger

Options:

Discussion 0

Answer:

See the explanation for Step by Step Solution and configuration.

Explanation:

Solution :

Step 1:

hdfs dfs -mkdir sparksql3

hdfs dfs -put patients.csv sparksql3/

Step 2 : Now in spark shell

// SQLContext entry point for working with structured data

val sqlContext = neworg.apache.spark.sql.SQLContext(sc)

// this is used to implicitly convert an RDD to a DataFrame.

import sqlContext.impIicits._

// Import Spark SQL data types and Row.

import org.apache.spark.sql._

// load the data into a new RDD

val patients = sc.textFilef'sparksqIS/patients.csv")

// Return the first element in this RDD

patients.first()

//define the schema using a case class

case class Patient(patientid: Integer, name: String, dateOfBirth:String , lastVisitDate: String)

// create an RDD of Product objects

val patRDD = patients.map(_.split(M,M)).map(p => Patient(p(0).tolnt,p(1),p(2),p(3)))

patRD

first()

patRD

count(}

// change RDD of Product objects to a DataFrame val patDF = patRD

toDF()

// register the DataFrame as a temp table patD

registerTempTable("patients"}

// Select data from table

val results = sqlContext.sql(......SELECT* FROM patients '.....)

// display dataframe in a tabular format

results.show()

//Find all the patients whose lastVisitDate between current time and '2012-09-15'

val results = sqlContext.sql(......SELECT * FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(lastVisitDate, 'yyyy-MM-dd') AS TIMESTAMP)) BETWEEN '2012-09-15' AND current_timestamp() ORDER BY lastVisitDate......)

results.showQ

/.Find all the patients who born in 2011

val results = sqlContext.sql(......SELECT * FROM patients WHERE YEAR(TO_DATE(CAST(UNIXJTlMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP))) = 2011 ......)

results. show()

//Find all the patients age

val results = sqlContext.sql(......SELECT name, dateOfBirth, datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TlMESTAMP}}}/365 AS age

FROM patients

Mini >

results.show()

//List patients whose last visited more than 60 days ago

-- List patients whose last visited more than 60 days ago

val results = sqlContext.sql(......SELECT name, lastVisitDate FROM patients WHERE datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP[lastVisitDate, 'yyyy-MM-dd') AS T1MESTAMP))) > 60......);

results. showQ;

-- Select patients 18 years old or younger

SELECT' FROM patients WHERE TO_DATE(CAST(UNIXJTlMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP}) > DATE_SUB(current_date(),INTERVAL 18 YEAR);

val results = sqlContext.sql(......SELECT' FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM--dd') AS TIMESTAMP)) > DATE_SUB(current_date(), T8*365)......);

results. showQ;

val results = sqlContext.sql(......SELECT DATE_SUB(current_date(), 18*365) FROM patients......);

results.show();

Question # 10

Problem Scenario 76 : You have been given MySQL DB with following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.orders

table=retail_db.order_items

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Columns of order table : (orderid , order_date , ordercustomerid, order_status}

.....

Please accomplish following activities.

1. Copy "retail_db.orders" table to hdfs in a directory p91_orders.

2. Once data is copied to hdfs, using pyspark calculate the number of order for each status.

3. Use all the following methods to calculate the number of order for each status. (You need to know all these functions and its behavior for real exam)

- countByKey()

-groupByKey()

- reduceByKey()

-aggregateByKey()

- combineByKey()

Options:

Discussion 0

Answer:

See the explanation for Step by Step Solution and configuration.

Explanation:

Solution :

Step 1 : Import Single table

sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail dba -password=cloudera -table=orders --target-dir=p91_orders

Note : Please check you dont have space between before or after '=' sign. Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs

Step 2 : Read the data from one of the partition, created using above command, hadoop fs -cat p91_orders/part-m-00000

Step 3: countByKey #Number of orders by status allOrders = sc.textFile("p91_orders")

#Generate key and value pairs (key is order status and vale as an empty string keyValue = aIIOrders.map(lambda line: (line.split(",")[3], ""))

#Using countByKey, aggregate data based on status as a key output=keyValue.countByKey()Jtems()

for line in output: print(line)

Step 4 : groupByKey

#Generate key and value pairs (key is order status and vale as an one

keyValue = allOrders.map(lambda line: (line.split)",")[3], 1))

#Using countByKey, aggregate data based on status as a key output= keyValue.groupByKey().map(lambda kv: (kv[0], sum(kv[1]}}}

tor line in output.collect(): print(line}

Step 5 : reduceByKey

#Generate key and value pairs (key is order status and vale as an one

keyValue = allOrders.map(lambda line: (line.split(","}[3], 1))

#Using countByKey, aggregate data based on status as a key output= keyValue.reduceByKey(lambda a, b: a + b)

tor line in output.collect(): print(line}

Step 6: aggregateByKey

#Generate key and value pairs (key is order status and vale as an one keyValue = allOrders.map(lambda line: (line.split(",")[3], line}}

output=keyValue.aggregateByKey(0, lambda a, b: a+1, lambda a, b: a+b}

for line in output.collect(): print(line}

Step 7 : combineByKey

#Generate key and value pairs (key is order status and vale as an one

keyValue = allOrders.map(lambda line: (line.split(",")[3], line))

output=keyValue.combineByKey(lambda value: 1, lambda ace, value: acc+1, lambda ace, value: acc+value)

tor line in output.collect(): print(line)

#Watch Spark Professional Training provided by www.ABCTECH.com to understand more on each above functions. (These are very important functions for real exam)

Question # 11

Problem Scenario 86 : In Continuation of previous question, please accomplish following activities.

1. Select Maximum, minimum, average , Standard Deviation, and total quantity.

2. Select minimum and maximum price for each product code.

3. Select Maximum, minimum, average , Standard Deviation, and total quantity for each product code, hwoever make sure Average and Standard deviation will have maximum two decimal values.

4. Select all the product code and average price only where product count is more than or equal to 3.

5. Select maximum, minimum , average and total of all the products for each code. Also produce the same across all the products.

Options:

Discussion 0

Question # 12

Problem Scenario 73 : You have been given data in json format as below.

{"first_name":"Ankit", "last_name":"Jain"}

{"first_name":"Amir", "last_name":"Khan"}

{"first_name":"Rajesh", "last_name":"Khanna"}

{"first_name":"Priynka", "last_name":"Chopra"}

{"first_name":"Kareena", "last_name":"Kapoor"}

{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activity

1. create employee.json file locally.

2. Load this file on hdfs

3. Register this data as a temp table in Spark using Python.

4. Write select query and print this data.

5. Now save back this selected data in json format.

Options:

Discussion 0

Question # 13

Problem Scenario 38 : You have been given an RDD as below,

val rdd: RDD[Array[Byte]]

Now you have to save this RDD as a SequenceFile. And below is the code snippet.

import org.apache.hadoop.io.compress.GzipCodec

rdd.map(bytesArray => (

get(), new B(bytesArray))).saveAsSequenceFile('7output/path",classOt[GzipCodec])

What would be the correct replacement for A and B in above snippet.

Options:

Discussion 0

Question # 14

Problem Scenario 62 : You have been given below code snippet.

val a = sc.parallelize(List("dogM, "tiger", "lion", "cat", "panther", "eagle"), 2)

val b = a.map(x => (x.length, x))

operation1

Write a correct code snippet for operationl which will produce desired output, shown below. Array[(lnt, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))

Options:

Discussion 0

Get CCA175 dumps and pass your exam in 24 hours!

Labour Day Limited Time 60% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 2493360325

Good News !!! CCA175 CCA Spark and Hadoop Developer Exam is now Stable and With Pass Result

CCA175 Practice Exam Questions and Answers

CCA175 PDF

CCA175 Testing Engine

CCA175 PDF + Testing Engine

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Free Exams Sample Questions

We Accept

Secure Site

Customer Review

Money Back Guarantee