
Good news! The Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Databricks Certified Associate Developer for Apache Spark 3.0 Exam question pool is now stable, with a strong pass record.

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Practice Exam Questions and Answers

Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Last updated: 2 days ago
Total questions: 180

The Databricks Certified Associate Developer for Apache Spark 3.0 Exam question pool is now stable, with the latest exam questions added 2 days ago. Incorporating Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 practice exam questions into your study plan is more than just a preparation strategy.

By familiarizing yourself with the Databricks Certified Associate Developer for Apache Spark 3.0 Exam format, identifying knowledge gaps, and applying theoretical knowledge in practical Databricks scenarios, you are setting yourself up for success. Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam dumps provide a realistic preview, helping you to adapt your preparation strategy accordingly.

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 exam questions often include scenarios and problem-solving exercises that mirror real-world challenges. Working through Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 dumps also lets you practice pacing yourself, ensuring that you can complete all Databricks Certified Associate Developer for Apache Spark 3.0 Exam questions within the allotted time frame without sacrificing accuracy.

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 PDF

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 PDF (Printable)
$48 (regular price $119.99)

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Testing Engine

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Testing Engine
$56 (regular price $139.99)

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 PDF + Testing Engine

Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 PDF + Testing Engine
$70.80 (regular price $176.99)
Question # 1

Which of the following code blocks returns about 150 randomly selected rows from the 1000-row DataFrame transactionsDf, assuming that any row can appear more than once in the returned

DataFrame?

Options:

A.  

transactionsDf.resample(0.15, False, 3142)

B.  

transactionsDf.sample(0.15, False, 3142)

C.  

transactionsDf.sample(0.15)

D.  

transactionsDf.sample(0.85, 8429)

E.  

transactionsDf.sample(True, 0.15, 8261)
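For reference, a minimal PySpark sketch of the sample() API this question exercises, assuming the 1000-row transactionsDf from the question (the seed below is illustrative, and the returned row count is approximate rather than exactly 150):

# sample(withReplacement, fraction, seed): with replacement, rows may repeat
sampled = transactionsDf.sample(withReplacement=True, fraction=0.15, seed=42)
sampled.count()  # roughly 150 rows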

Question # 2

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before

2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.

Schema:

1.root

2. |-- itemId: integer (nullable = true)

3. |-- attributes: array (nullable = true)

4. | |-- element: string (containsNull = true)

5. |-- supplier: string (nullable = true)

Code block:

1.schema = StructType([

2. StructType("itemId", IntegerType(), True),

3. StructType("attributes", ArrayType(StringType(), True), True),

4. StructType("supplier", StringType(), True)

5.])

6.

7.spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

Options:

A.  

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

B.  

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

C.  

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

D.  

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

E.  

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
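For comparison, a hedged sketch of how a schema of this shape is typically declared with StructField objects and passed to the parquet reader; the modifiedBefore option is documented for file-based data sources in recent Spark releases, and spark and filePath are assumed from the question context:

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

schema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("attributes", ArrayType(StringType(), True), True),
    StructField("supplier", StringType(), True),
])

df = (spark.read
      .option("modifiedBefore", "2029-03-20T05:44:46")
      .schema(schema)
      .parquet(filePath))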

Question # 3

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:

A.  

spark.read.json(filePath)

B.  

spark.read.path(filePath, source="json")

C.  

spark.read().path(filePath)

D.  

spark.read().json(filePath)

E.  

spark.read.path(filePath)

Question # 4

Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?

Options:

A.  

transactionsDf.drop(["predError", "value"])

B.  

transactionsDf.drop("predError", "value")

C.  

transactionsDf.drop(col("predError"), col("value"))

D.  

transactionsDf.drop(predError, value)

E.  

transactionsDf.drop("predError & value")

Question # 5

Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?

Options:

A.  

transactionsDf.sort("storeId", asc("productId"))

B.  

transactionsDf.sort(col(storeId)).desc(col(productId))

C.  

transactionsDf.order_by(col(storeId), desc(col(productId)))

D.  

transactionsDf.sort("storeId", desc("productId"))

E.  

transactionsDf.sort("storeId").sort(desc("productId"))

Question # 6

The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in

the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__(__4__)

Options:

A.  

1. filter

2. "storeId"==25

3. collect

4. 5

B.  

1. filter

2. col("storeId")==25

3. toLocalIterator

4. 5

C.  

1. select

2. storeId==25

3. head

4. 5

D.  

1. filter

2. col("storeId")==25

3. take

4. 5

E.  

1. filter

2. col("storeId")==25

3. collect

4. 5
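For reference, a hedged sketch of the filter-then-fetch pattern this question exercises: filter() keeps matching rows, and take(n) returns at most n of them to the driver as a Python list of Row objects, whereas collect() would return every matching row:

from pyspark.sql.functions import col

# at most 5 rows with storeId == 25, returned as a Python list
rows = transactionsDf.filter(col("storeId") == 25).take(5)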

Question # 7

The code block displayed below contains an error. The code block should read the csv file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as column header

and casting the columns to the most appropriate types. Find the error.

First 3 rows of transactions.csv:

1.transactionId;storeId;productId;name

2.1;23;12;green grass

3.2;35;31;yellow sun

4.3;23;12;green grass

Code block:

transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)

Options:

A.  

The DataFrameReader is not accessed correctly.

B.  

The transaction is evaluated lazily, so no file will be read.

C.  

Spark is unable to understand the file type.

D.  

The code block is unable to capture all columns.

E.  

The resulting DataFrame will not have the appropriate schema.
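For reference, a hedged sketch of a CSV read that also casts column types: without schema inference (or an explicit schema), CSV columns are read as strings, so inferSchema=True is added here, while sep, format, and header mirror the code block in the question:

# read the semicolon-separated CSV, use the first row as header, infer column types
transactionsDf = spark.read.load(
    "data/transactions.csv",
    format="csv",
    sep=";",
    header=True,
    inferSchema=True,
)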

Question # 8

The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of at most 4 strings. The arrays should be composed of

the values in column itemName, which are separated at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Sample of DataFrame itemsDf:

1.+------+----------------------------------+-------------------+

2.|itemId|itemName |supplier |

3.+------+----------------------------------+-------------------+

4.|1 |Thick Coat for Walking in the Snow|Sports Company Inc.|

5.|2 |Elegant Outdoors Summer Dress |YetiX |

6.|3 |Outdoors Backpack |Sports Company Inc.|

7.+------+----------------------------------+-------------------+

Code block:

itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))

Options:

A.  

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

(Correct)

B.  

1. withColumnRenamed

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 4

C.  

1. withColumnRenamed

2. "itemName"

3. split

4. "itemNameBetweenSeparators"

5. 4

D.  

1. withColumn

2. "itemNameBetweenSeparators"

3. split

4. "itemName"

5. 5

E.  

1. withColumn

2. itemNameBetweenSeparators

3. str_split

4. "itemName"

5. 5
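For reference, a hedged sketch of the withColumn/split pattern this question targets: split(column, pattern, limit) caps each resulting array at limit elements, so a limit of 4 yields arrays of at most 4 strings split on "-" or whitespace:

from pyspark.sql.functions import split

# add the new array column; the regex matches whitespace or "-"
itemsDf = itemsDf.withColumn(
    "itemNameBetweenSeparators",
    split("itemName", r"[\s\-]", 4),
)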

Question # 9

Which of the following describes Spark's way of managing memory?

Options:

A.  

Spark uses a subset of the reserved system memory.

B.  

Storage memory is used for caching partitions derived from DataFrames.

C.  

As a general rule for garbage collection, Spark performs better on many small objects than few big objects.

D.  

Disabling serialization potentially greatly reduces the memory footprint of a Spark application.

E.  

Spark's memory usage can be divided into three categories: Execution, transaction, and storage.

Question # 10

Which of the following statements about Spark's configuration properties is incorrect?

Options:

A.  

The maximum number of tasks that an executor can process at the same time is controlled by the spark.task.cpus property.

B.  

The maximum number of tasks that an executor can process at the same time is controlled by the spark.executor.cores property.

C.  

The default value for spark.sql.autoBroadcastJoinThreshold is 10MB.

D.  

The default number of partitions to use when shuffling data for joins or aggregations is 300.

E.  

The default number of partitions returned from certain transformations can be controlled by the spark.default.parallelism property.

Question # 11

In which order should the code blocks shown below be run in order to assign articlesDf a DataFrame that lists all items in column attributes ordered by the number of times these items occur, from

most to least often?

Sample of DataFrame articlesDf:

1.+------+-----------------------------+-------------------+

2.|itemId|attributes |supplier |

3.+------+-----------------------------+-------------------+

4.|1 |[blue, winter, cozy] |Sports Company Inc.|

5.|2 |[red, summer, fresh, cooling]|YetiX |

6.|3 |[green, summer, travel] |Sports Company Inc.|

7.+------+-----------------------------+-------------------+

Options:

A.  

1. articlesDf = articlesDf.groupby("col")

2. articlesDf = articlesDf.select(explode(col("attributes")))

3. articlesDf = articlesDf.orderBy("count").select("col")

4. articlesDf = articlesDf.sort("count",ascending=False).select("col")

5. articlesDf = articlesDf.groupby("col").count()

B.  

4, 5

C.  

2, 5, 3

D.  

5, 2

E.  

2, 3, 4

F.  

2, 5, 4
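For reference, a hedged sketch of the explode/count/sort pattern assembled from the numbered blocks above: explode() emits one row per array element (in a column named col), groupby().count() tallies occurrences, and sort(..., ascending=False) orders them from most to least frequent:

from pyspark.sql.functions import explode, col

articlesDf = (articlesDf
              .select(explode(col("attributes")))   # one row per attribute value
              .groupby("col").count()               # occurrences per value
              .sort("count", ascending=False)       # most frequent first
              .select("col"))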

Question # 12

Which of the following is the deepest level in Spark's execution hierarchy?

Options:

A.  

Job

B.  

Task

C.  

Executor

D.  

Slot

E.  

Stage

Question # 13

Which of the following code blocks creates a new DataFrame with 3 columns, productId, highest, and lowest, that shows the biggest and smallest values of column value per value in column

productId from DataFrame transactionsDf?

Sample of DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId| f|

3.+-------------+---------+-----+-------+---------+----+

4.| 1| 3| 4| 25| 1|null|

5.| 2| 6| 7| 2| 2|null|

6.| 3| 3| null| 25| 3|null|

7.| 4| null| null| 3| 2|null|

8.| 5| null| null| null| 2|null|

9.| 6| 3| 2| 25| 2|null|

10.+-------------+---------+-----+-------+---------+----+

Options:

A.  

transactionsDf.max('value').min('value')

B.  

transactionsDf.agg(max('value').alias('highest'), min('value').alias('lowest'))

C.  

transactionsDf.groupby(col(productId)).agg(max(col(value)).alias("highest"), min(col(value)).alias("lowest"))

D.  

transactionsDf.groupby('productId').agg(max('value').alias('highest'), min('value').alias('lowest'))

E.  

transactionsDf.groupby("productId").agg({"highest": max("value"), "lowest": min("value")})

Question # 14

The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf that have a matching

value in column itemId with a value in column transactionId of DataFrame transactionsDf. Find the error.

Code block:

itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)

Options:

A.  

The join statement is incomplete.

B.  

The union method should be used instead of join.

C.  

The join method is inappropriate.

D.  

The merge method should be used instead of join.

E.  

The join expression is malformed.

Question # 15

Which of the following code blocks returns a copy of DataFrame transactionsDf that only includes columns transactionId, storeId, productId and f?

Sample of DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId| f|

3.+-------------+---------+-----+-------+---------+----+

4.| 1| 3| 4| 25| 1|null|

5.| 2| 6| 7| 2| 2|null|

6.| 3| 3| null| 25| 3|null|

7.+-------------+---------+-----+-------+---------+----+

Options:

A.  

transactionsDf.drop(col("value"), col("predError"))

B.  

transactionsDf.drop("predError", "value")

C.  

transactionsDf.drop(value, predError)

D.  

transactionsDf.drop(["predError", "value"])

E.  

transactionsDf.drop([col("predError"), col("value")])

Question # 16

The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which

dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.

Sample of DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+----------------+

2.|transactionId|predError|value|storeId|productId| f| transactionDate|

3.+-------------+---------+-----+-------+---------+----+----------------+

4.| 1| 3| 4| 25| 1|null|2020-04-26 15:35|

5.| 2| 6| 7| 2| 2|null|2020-04-13 22:01|

6.| 3| 3| null| 25| 3|null|2020-04-02 10:53|

7.+-------------+---------+-----+-------+---------+----+----------------+

Code block:

1.transactionsDf = transactionsDf.drop("transactionDate")

2.transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")

Options:

A.  

Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used

instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().

B.  

Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column

transactionDate should be wrapped in a col() operator.

C.  

Column transactionDate should be wrapped in a col() operator.

D.  

The string indicating the date format should be adjusted. The withColumnReplaced operator should be used instead of the drop and assign pattern in the code block to replace column

transactionDate with the new column transactionTimestamp.

E.  

Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used

instead of the existing column assignment.
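For reference, a hedged sketch of the convert-then-drop pattern the question describes: withColumn() adds the converted column (DataFrames do not support item assignment), the format string matches timestamps like 2020-04-26 15:35, and transactionDate is dropped only after it has been read:

from pyspark.sql.functions import unix_timestamp

transactionsDf = (transactionsDf
                  .withColumn("transactionTimestamp",
                              unix_timestamp("transactionDate", "yyyy-MM-dd HH:mm"))
                  .drop("transactionDate"))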

Question # 17

Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?

Schema of first partition:

1.root

2. |-- transactionId: integer (nullable = true)

3. |-- predError: integer (nullable = true)

4. |-- value: integer (nullable = true)

5. |-- storeId: integer (nullable = true)

6. |-- productId: integer (nullable = true)

7. |-- f: integer (nullable = true)

Schema of second partition:

1.root

2. |-- transactionId: integer (nullable = true)

3. |-- predError: integer (nullable = true)

4. |-- value: integer (nullable = true)

5. |-- storeId: integer (nullable = true)

6. |-- rollId: integer (nullable = true)

7. |-- f: integer (nullable = true)

8. |-- tax_id: integer (nullable = false)

Options:

A.  

spark.read.parquet(filePath, mergeSchema='y')

B.  

spark.read.option("mergeSchema", "true").parquet(filePath)

C.  

spark.read.parquet(filePath)

D.  

1.nx = 0

2.for file in dbutils.fs.ls(filePath):

3. if not file.name.endswith(".parquet"):

4. continue

5. df_temp = spark.read.parquet(file.path)

6. if nx == 0:

7. df = df_temp

8. else:

9. df = df.union(df_temp)

10. nx = nx+1

11.df

E.  

1.nx = 0

2.for file in dbutils.fs.ls(filePath):

3. if not file.name.endswith(".parquet"):

4. continue

5. df_temp = spark.read.parquet(file.path)

6. if nx == 0:

7. df = df_temp

8. else:

9. df = df.join(df_temp, how="outer")

10. nx = nx+1

11.df
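For reference, a hedged sketch of schema merging when reading parquet: the mergeSchema option unions the column sets of partitions with differing schemas so that each column appears exactly once (spark and filePath assumed from the question):

# merge the per-partition schemas into one combined schema
df = spark.read.option("mergeSchema", "true").parquet(filePath)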

Question # 18

The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.

Find the error.

Code block:

transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

Options:

A.  

The "outer" argument should be eliminated, since "outer" is the default join type.

B.  

The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.

C.  

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.

D.  

The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").

E.  

The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Question # 19

Which of the following code blocks returns an exact copy of DataFrame transactionsDf that does not include rows in which values in column storeId have the value 25?

Options:

A.  

transactionsDf.remove(transactionsDf.storeId==25)

B.  

transactionsDf.where(transactionsDf.storeId!=25)

C.  

transactionsDf.filter(transactionsDf.storeId==25)

D.  

transactionsDf.drop(transactionsDf.storeId==25)

E.  

transactionsDf.select(transactionsDf.storeId!=25)

Question # 20

The code block shown below should add column transactionDateForm to DataFrame transactionsDf. The column should express the unix-format timestamps in column transactionDate as string

type like Apr 26 (Sunday). Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, from_unixtime(__3__, __4__))

Options:

A.  

1. withColumn

2. "transactionDateForm"

3. "MMM d (EEEE)"

4. "transactionDate"

B.  

1. select

2. "transactionDate"

3. "transactionDateForm"

4. "MMM d (EEEE)"

C.  

1. withColumn

2. "transactionDateForm"

3. "transactionDate"

4. "MMM d (EEEE)"

D.  

1. withColumn

2. "transactionDateForm"

3. "transactionDate"

4. "MM d (EEE)"

E.  

1. withColumnRenamed

2. "transactionDate"

3. "transactionDateForm"

4. "MM d (EEE)"

Question # 21

The code block shown below should return a single-column DataFrame with a column named consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame

itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

DataFrame itemsDf:

1.+------+----------------------------------+-----------------------------+-------------------+

2.|itemId|itemName |attributes |supplier |

3.+------+----------------------------------+-----------------------------+-------------------+

4.|1 |Thick Coat for Walking in the Snow|[blue, winter, cozy] |Sports Company Inc.|

5.|2 |Elegant Outdoors Summer Dress |[red, summer, fresh, cooling]|YetiX |

6.|3 |Outdoors Backpack |[green, summer, travel] |Sports Company Inc.|

7.+------+----------------------------------+-----------------------------+-------------------+

Code block:

itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))

Options:

A.  

1. length

2. regexp_extract

3. upper

4. col("itemName")

5. as

B.  

1. size

2. regexp_replace

3. lower

4. "itemName"

5. alias

C.  

1. lower

2. regexp_replace

3. length

4. "itemName"

5. alias

D.  

1. length

2. regexp_replace

3. lower

4. col("itemName")

5. alias

E.  

1. size

2. regexp_extract

3. lower

4. col("itemName")

5. alias
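For reference, a hedged sketch of the lower/replace/length pattern this question builds: lower-case the name, strip vowels and whitespace with regexp_replace(), then count what remains with length():

from pyspark.sql.functions import length, regexp_replace, lower, col

consonants = itemsDf.select(
    length(regexp_replace(lower(col("itemName")), r"a|e|i|o|u|\s", ""))
    .alias("consonant_ct")
)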

Question # 22

In which order should the code blocks shown below be run in order to create a DataFrame that shows the mean of column predError of DataFrame transactionsDf per column storeId and productId,

where productId should be either 2 or 3 and the returned DataFrame should be sorted in ascending order by column storeId, leaving out any nulls in that column?

DataFrame transactionsDf:

1.+-------------+---------+-----+-------+---------+----+

2.|transactionId|predError|value|storeId|productId| f|

3.+-------------+---------+-----+-------+---------+----+

4.| 1| 3| 4| 25| 1|null|

5.| 2| 6| 7| 2| 2|null|

6.| 3| 3| null| 25| 3|null|

7.| 4| null| null| 3| 2|null|

8.| 5| null| null| null| 2|null|

9.| 6| 3| 2| 25| 2|null|

10.+-------------+---------+-----+-------+---------+----+

1. .mean("predError")

2. .groupBy("storeId")

3. .orderBy("storeId")

4. transactionsDf.filter(transactionsDf.storeId.isNotNull())

5. .pivot("productId", [2, 3])

Options:

A.  

4, 5, 2, 3, 1

B.  

4, 2, 1

C.  

4, 1, 5, 2, 3

D.  

4, 2, 5, 1, 3

E.  

4, 3, 2, 5, 1
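For reference, a hedged sketch of one chain assembled from the numbered blocks above, restricted to productId values 2 and 3 and leaving out null storeId values:

result = (transactionsDf
          .filter(transactionsDf.storeId.isNotNull())  # drop null storeId rows
          .groupBy("storeId")
          .pivot("productId", [2, 3])                  # one column per productId
          .mean("predError")
          .orderBy("storeId"))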

Question # 23

Which of the following code blocks reorders the values inside the arrays in column attributes of DataFrame itemsDf in reverse alphabetical order (from last to first in the alphabet)?

1.+------+-----------------------------+-------------------+

2.|itemId|attributes |supplier |

3.+------+-----------------------------+-------------------+

4.|1 |[blue, winter, cozy] |Sports Company Inc.|

5.|2 |[red, summer, fresh, cooling]|YetiX |

6.|3 |[green, summer, travel] |Sports Company Inc.|

7.+------+-----------------------------+-------------------+

Options:

A.  

itemsDf.withColumn('attributes', sort_array(col('attributes').desc()))

B.  

itemsDf.withColumn('attributes', sort_array(desc('attributes')))

C.  

itemsDf.withColumn('attributes', sort(col('attributes'), asc=False))

D.  

itemsDf.withColumn("attributes", sort_array("attributes", asc=False))

E.  

itemsDf.select(sort_array("attributes"))
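For reference, a hedged sketch of sorting array columns in place: sort_array(column, asc=False) sorts each row's array in reverse alphabetical order, while withColumn() keeps the other columns intact:

from pyspark.sql.functions import sort_array

itemsDf = itemsDf.withColumn("attributes", sort_array("attributes", asc=False))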

Question # 24

Which of the following statements about broadcast variables is correct?

Options:

A.  

Broadcast variables are serialized with every single task.

B.  

Broadcast variables are commonly used for tables that do not fit into memory.

C.  

Broadcast variables are immutable.

D.  

Broadcast variables are occasionally dynamically updated on a per-task basis.

E.  

Broadcast variables are local to the worker node and not shared across the cluster.

Question # 25

Which of the following options describes the responsibility of the executors in Spark?

Options:

A.  

The executors accept jobs from the driver, analyze those jobs, and return results to the driver.

B.  

The executors accept tasks from the driver, execute those tasks, and return results to the cluster manager.

C.  

The executors accept tasks from the driver, execute those tasks, and return results to the driver.

D.  

The executors accept tasks from the cluster manager, execute those tasks, and return results to the driver.

E.  

The executors accept jobs from the driver, plan those jobs, and return results to the cluster manager.

Question # 26

Which of the following code blocks displays various aggregated statistics of all columns in DataFrame transactionsDf, including the standard deviation and minimum of values in each column?

Options:

A.  

transactionsDf.summary()

B.  

transactionsDf.agg("count", "mean", "stddev", "25%", "50%", "75%", "min")

C.  

transactionsDf.summary("count", "mean", "stddev", "25%", "50%", "75%", "max").show()

D.  

transactionsDf.agg("count", "mean", "stddev", "25%", "50%", "75%", "min").show()

E.  

transactionsDf.summary().show()
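For reference, summary() with no arguments computes count, mean, stddev, min, the 25%/50%/75% percentiles, and max for every column, and show() renders the result; a minimal sketch:

# aggregated statistics for all columns of transactionsDf
transactionsDf.summary().show()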

Question # 27

In which order should the code blocks shown below be run in order to return the number of records that are not empty in column value in the DataFrame resulting from an inner join of DataFrame

transactionsDf and itemsDf on columns productId and itemId, respectively?

1. .filter(~isnull(col('value')))

2. .count()

3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))

4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')

5. .filter(col('value').isnotnull())

6. .sum(col('value'))

Options:

A.  

4, 1, 2

B.  

3, 1, 6

C.  

3, 1, 2

D.  

3, 5, 2

E.  

4, 6
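For reference, a hedged sketch of a join/filter/count chain of the kind the numbered blocks describe: an inner join on productId and itemId, dropping rows whose value is null, then counting what remains:

from pyspark.sql.functions import isnull, col

n = (transactionsDf
     .join(itemsDf, transactionsDf.productId == itemsDf.itemId, how="inner")
     .filter(~isnull(col("value")))   # keep only rows with a non-null value
     .count())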

