Latest Databricks Certification Associate-Developer-Apache-Spark-3.5 free sample questions:
1. An engineer has two DataFrames: df1 (small) and df2 (large). A broadcast join is used:
from pyspark.sql.functions import broadcast
result = df2.join(broadcast(df1), on='id', how='inner')
What is the purpose of using broadcast() in this scenario?
Options:
A) It increases the partition size for df1 and df2.
B) It filters the id values before performing the join.
C) It ensures that the join happens only when the id values are identical.
D) It reduces the number of shuffle operations by replicating the smaller DataFrame to all nodes.
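A minimal sketch, assuming a local SparkSession and two tiny hypothetical DataFrames, showing how to verify the effect of broadcast(): the physical plan reports a BroadcastHashJoin, meaning the small side is replicated to every executor and the large side is never shuffled.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.master("local[*]").getOrCreate()

df1 = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "v1"])         # small side
df2 = spark.createDataFrame([(1, 10), (2, 20), (3, 30)], ["id", "v2"])  # stands in for the large side

result = df2.join(broadcast(df1), on="id", how="inner")
result.explain()  # physical plan shows BroadcastHashJoin rather than a shuffle-based SortMergeJoin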
2. A data engineer is reviewing a Spark application that applies several transformations to a DataFrame but notices that the job does not start executing immediately.
Which two characteristics of Apache Spark's execution model explain this behavior?
Choose 2 answers:
A) The Spark engine requires manual intervention to start executing transformations.
B) Transformations are evaluated lazily.
C) The Spark engine optimizes the execution plan during the transformations, causing delays.
D) Transformations are executed immediately to build the lineage graph.
E) Only actions trigger the execution of the transformation pipeline.
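A minimal sketch, assuming a local SparkSession, illustrating the behavior the question describes: filter() and select() are lazy transformations that only extend the logical plan, and execution starts only when an action such as count() runs.

from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.range(1_000_000)
transformed = df.filter(f.col("id") % 2 == 0).select((f.col("id") * 2).alias("doubled"))
# No Spark job has started yet: only a lineage/logical plan exists so far.

print(transformed.count())  # the action triggers execution of the whole pipeline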
3. A data analyst wants to add a column named date, derived from an existing timestamp column named timestamp. Which option creates it correctly?
Options:
A) dates_df.withColumn("date", f.unix_timestamp("timestamp")).show()
B) dates_df.withColumn("date", f.date_format("timestamp", "yyyy-MM-dd")).show()
C) dates_df.withColumn("date", f.to_date("timestamp")).show()
D) dates_df.withColumn("date", f.from_unixtime("timestamp")).show()
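A minimal sketch, assuming a local SparkSession and a one-row hypothetical dates_df: to_date() truncates a timestamp into a true DateType column, whereas date_format() would only produce a formatted string.

from pyspark.sql import SparkSession
from pyspark.sql import functions as f

spark = SparkSession.builder.master("local[*]").getOrCreate()

dates_df = (spark.createDataFrame([("2024-05-01 13:45:00",)], ["timestamp"])
                 .withColumn("timestamp", f.to_timestamp("timestamp")))

result = dates_df.withColumn("date", f.to_date("timestamp"))
result.show()         # the date column holds 2024-05-01
result.printSchema()  # date: date (DateType), not a string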
4. Which implementation correctly calculates the length of the strings in a Spark DataFrame column?
Options:
A) df.withColumn("length", spark.udf("len", StringType()))
B) spark.udf.register("stringLength", lambda s: len(s))
C) df.select(length(col("stringColumn")).alias("length"))
D) df.withColumn("length", udf(lambda s: len(s), StringType()))
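A minimal sketch, assuming a local SparkSession, contrasting the built-in length() column function with an equivalent Python UDF. The built-in executes inside the JVM with no serialization overhead; a hand-rolled UDF also needs the correct return type (IntegerType, since len() returns an integer, not a string).

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.createDataFrame([("spark",), ("databricks",)], ["stringColumn"])

# Built-in column function: no Python round trip.
df.select(length(col("stringColumn")).alias("length")).show()

# Equivalent Python UDF: works, but slower than the built-in.
string_length = udf(lambda s: len(s), IntegerType())
df.withColumn("length", string_length(col("stringColumn"))).show()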
5. Given this view definition:
df.createOrReplaceTempView("users_vw")
Which approach can be used to query the users_vw view after the session is terminated?
Options:
A) Save the users_vw definition and query using Spark
B) Query the users_vw using Spark
C) Recreate the users_vw and query the data using Spark
D) Persist the users_vw data as a table
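A minimal sketch, assuming a local SparkSession and a hypothetical table name users_tbl; persistence across sessions also assumes a durable metastore, as on Databricks. A temp view is scoped to the session that created it, while saveAsTable() writes the data into the catalog so a later session can still query it.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.createOrReplaceTempView("users_vw")               # session-scoped: gone when the session ends
df.write.mode("overwrite").saveAsTable("users_tbl")  # persisted as a managed table

spark.sql("SELECT * FROM users_tbl").show()          # still queryable from a new session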
Questions and answers:
Question #1 answer: D | Question #2 answers: B, E | Question #3 answer: C | Question #4 answer: C | Question #5 answer: D