Latest Databricks Certification Databricks-Certified-Data-Engineer-Professional free sample questions:
1. A large company seeks to implement a near real-time solution involving hundreds of pipelines that update many tables in parallel with extremely high-volume, high-velocity data.
Which of the following solutions would you implement to achieve this requirement?
A) Store all tables in a single database to ensure that the Databricks Catalyst Metastore can load balance overall throughput.
B) Use Databricks High Concurrency clusters, which leverage optimized cloud storage connections to maximize data throughput.
C) Partition ingestion tables by a small time duration to allow for many data files to be written in parallel.
D) Isolate Delta Lake tables in their own storage containers to avoid API limits imposed by cloud vendors.
E) Configure Databricks to save all data to attached SSD volumes instead of object storage, increasing file I/O significantly.
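For reference, here is a minimal PySpark sketch of the time-based partitioning idea mentioned in option C; the streaming DataFrame raw_events, its event_time column, and the table and checkpoint names are hypothetical and not part of the question.

    from pyspark.sql import functions as F

    # Derive small time-duration partition columns from the event timestamp
    # (raw_events and event_time are assumed names for illustration only).
    bronze = (raw_events
              .withColumn("ingest_date", F.to_date("event_time"))
              .withColumn("ingest_hour", F.hour("event_time")))

    # Write the stream as a Delta table partitioned by date and hour so that
    # many data files can be written in parallel across partitions.
    (bronze.writeStream
           .format("delta")
           .partitionBy("ingest_date", "ingest_hour")
           .option("checkpointLocation", "/tmp/checkpoints/bronze_events")
           .outputMode("append")
           .table("bronze_events"))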
2. A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings. The source data contains 100 unique fields in a highly nested JSON structure.
The silver_device_recordings table will be used downstream to power several production monitoring dashboards and a production model. At present, 45 of the 100 fields are being used in at least one of these applications.
The data engineer is trying to determine the best approach for dealing with schema declaration given the highly-nested structure of the data and the numerous fields.
Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?
A) Because Delta Lake uses Parquet for data storage, data types can be easily evolved by just modifying file footer information in place.
B) The Tungsten encoding used by Databricks is optimized for storing string data; newly-added native support for querying JSON strings means that string types are always most efficient.
C) Because Databricks will infer schema using types that allow all observed data to be processed, setting types manually provides greater assurance of data quality enforcement.
D) Schema inference and evolution on Databricks ensure that inferred types will always accurately match the data types used by downstream systems.
E) Human labor in writing code is the largest cost associated with data engineering workloads; as such, automating table declaration logic should be a priority in all migration workloads.
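As a point of reference for the schema question above, here is a minimal sketch contrasting schema inference with an explicit declaration for nested JSON; the field names and source path are assumptions for illustration only and do not come from the question.

    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    # Explicitly declared (partial) schema for a nested device recording;
    # all field names here are hypothetical.
    device_schema = StructType([
        StructField("device_id", StringType(), False),
        StructField("recorded_at", TimestampType(), True),
        StructField("reading", StructType([
            StructField("metric", StringType(), True),
            StructField("value", DoubleType(), True),
        ]), True),
    ])

    # Inferred types are chosen so that all sampled data can be processed ...
    inferred = spark.read.json("/mnt/raw/device_recordings/")
    # ... whereas a declared schema enforces the expected types on read.
    declared = spark.read.schema(device_schema).json("/mnt/raw/device_recordings/")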
3. A developer has successfully configured credentials for Databricks Repos and cloned a remote Git repository. They do not have privileges to make changes to the main branch, which is the only branch currently visible in their workspace.
Which approach allows the developer to commit and push their code changes, given that they cannot modify the main branch?
A) Use Repos to create a new branch, commit all changes, and push changes to the remote Git repository.
B) Use Repos to create a fork of the remote repository, commit all changes, and make a pull request on the source repository.
C) Use Repos to merge all differences and make a pull request back to the remote repository.
D) Use Repos to pull changes from the remote Git repository; commit and push changes to a branch that appeared as changes were pulled.
E) Use Repos to merge all differences and make a pull request back to the remote repository.
4. A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.
Which consideration will impact the decisions made by the engineer while migrating this workload?
A) Databricks supports Spark SQL and JDBC; all logic can be directly migrated from the source system without refactoring.
B) Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.
C) Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake's upsert functionality.
D) All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.
E) Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly-parallel writes.
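To illustrate the single-table transaction point in option D, here is a minimal sketch in which a former multi-table insert becomes two independent Delta writes with an application-level foreign key check; all DataFrame and table names are hypothetical and chosen only for this example.

    # Application-level "foreign key" check, since Databricks does not enforce
    # referential integrity: keep only orders whose customer_id exists in the
    # (hypothetical) dim_customers table.
    valid_orders = orders_updates.join(
        spark.table("dim_customers"), "customer_id", "left_semi")

    # Each write below is its own ACID transaction against a single Delta table.
    customer_updates.write.format("delta").mode("append").saveAsTable("dim_customers")
    valid_orders.write.format("delta").mode("append").saveAsTable("fact_orders")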
5. A data engineer is using Spark's MEMORY_ONLY storage level. Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?
A) Size on Disk is > 0
B) The RDD Block Name includes the '' annotation, signaling a failure to cache
C) On Heap Memory Usage is within 75% of Off Heap Memory Usage
D) Size on Disk is < Size in Memory
E) The number of Cached Partitions > the number of Spark Partitions
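For context on the Storage tab question, here is a minimal sketch that caches a table with MEMORY_ONLY so the Spark UI's Storage tab can be inspected; the table name is borrowed from question 2 and is hypothetical here.

    from pyspark import StorageLevel

    # Cache with MEMORY_ONLY and materialize the cache with an action so the
    # cached table appears in the Spark UI's Storage tab.
    df = spark.table("silver_device_recordings")  # hypothetical table name
    df.persist(StorageLevel.MEMORY_ONLY)
    df.count()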
Questions and Answers:
Question #1 Answer: B | Question #2 Answer: C | Question #3 Answer: A | Question #4 Answer: D | Question #5 Answer: A