Latest SnowPro Advanced DEA-C02 free sample questions:
1. You are tasked with implementing a data governance strategy in Snowflake for a large data warehouse. Your objective is to classify sensitive data columns, such as customer phone numbers and email addresses, using tags. You want to define a flexible tagging system that allows different levels of sensitivity (e.g., 'Confidential', 'Restricted') to be applied to various columns. Furthermore, you need to ensure that any data replicated to different regions maintains these classifications. Which of the following statements accurately describe best practices for implementing and maintaining data classification using tags in Snowflake, especially in a multi-region setup? Choose TWO. (An illustrative sketch follows the options.)
A) Tags and tag values must be uniquely defined across all schemas to avoid conflicts and ensure accurate data classification; Snowflake enforces uniqueness implicitly.
B) Create a scheduled task that automatically identifies sensitive data based on regular expressions and applies the appropriate tags. This automates the classification process.
C) When replicating data between regions, the tags are automatically replicated along with the data, provided that replication is configured using database replication or failover groups including the tagging schema.
D) Always grant the ACCOUNTADMIN role to users who need to apply tags. This simplifies the process and ensures they have all necessary privileges.
E) Define tag schemas at the account level and replicate them to all regions. This ensures consistency of tag definitions across the entire organization.
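For context on the correct answers (C and E), here is a minimal Snowpark Scala sketch of defining a tag with an allowed set of sensitivity values and applying it to a column. The tag, table, and column names and the connection properties file are illustrative placeholders, not part of the original question.

```scala
import com.snowflake.snowpark.Session

object TagClassificationSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection profile; replace with your own account settings.
    val session = Session.builder.configFile("profile.properties").create

    // Define a tag whose values are restricted to the two sensitivity levels.
    session.sql(
      """CREATE TAG IF NOT EXISTS governance.tags.sensitivity
        |  ALLOWED_VALUES 'Confidential', 'Restricted'""".stripMargin).collect()

    // Classify a sensitive column by attaching the tag to it.
    session.sql(
      """ALTER TABLE crm.public.customers
        |  MODIFY COLUMN phone_number
        |  SET TAG governance.tags.sensitivity = 'Confidential'""".stripMargin).collect()
  }
}
```

If the database holding the tag schema is included in database replication or a failover group, the tag definitions and their column associations travel with the replicated data, which is what options C and E rely on.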
2. A data engineer is using Snowpark Scala to create a UDF that calculates the distance between two geographical coordinates (latitude and longitude) using the Haversine formula. The function should accept four 'Double' values (lat1, lon1, lat2, lon2) and return the distance in kilometers as a 'Double'. The UDF must be named 'haversine_distance'. What is the correct Scala code to define and register this UDF with Snowflake, including the import statements required for using Snowpark functions? (An illustrative sketch follows the options.)
A) Option C
B) Option B
C) Option A
D) Option D
E) Option E
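The code listings for Options A-E were not reproduced in this extract, so the lettered choices above cannot be evaluated here. As a reference point only, and not necessarily matching the original Option B, the following is a minimal sketch of one way to define the Haversine computation and register it as a named UDF in Snowpark Scala; the connection profile is a placeholder.

```scala
import com.snowflake.snowpark.Session

object HaversineUdfSketch {
  def main(args: Array[String]): Unit = {
    val session = Session.builder.configFile("profile.properties").create // placeholder profile

    // Haversine distance in kilometers between two (lat, lon) points in degrees.
    val haversine = (lat1: Double, lon1: Double, lat2: Double, lon2: Double) => {
      val r = 6371.0 // mean Earth radius in km
      val dLat = math.toRadians(lat2 - lat1)
      val dLon = math.toRadians(lon2 - lon1)
      val a = math.pow(math.sin(dLat / 2), 2) +
        math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
          math.pow(math.sin(dLon / 2), 2)
      2 * r * math.asin(math.sqrt(a))
    }

    // Register as a named temporary UDF callable from SQL in this session.
    session.udf.registerTemporary("haversine_distance", haversine)

    // Example call: distance from San Francisco to New York, roughly 4130 km.
    session.sql(
      "SELECT haversine_distance(37.7749, -122.4194, 40.7128, -74.0060) AS km").show()
  }
}
```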
3. You have a 'WEB_EVENTS' table that stores user activity on a website. It includes columns like 'USER_ID', 'EVENT_TYPE', 'EVENT_TIMESTAMP', and 'PAGE_URL'. You need to create a materialized view that calculates the number of distinct users visiting each page daily. You are also tasked with minimizing the impact on the underlying 'WEB_EVENTS' table during materialized view refreshes, as other critical processes rely on it. Which of the following strategies would provide the MOST efficient solution, considering both performance and concurrency? (An illustrative sketch follows the options.)
A) Create a materialized view with a 'REFRESH COMPLETE' strategy to ensure full data consistency after each refresh, even though it may lock the underlying table.
B) Create a materialized view and schedule regular, small batch refreshes to minimize lock contention and resource consumption on the 'WEB_EVENTS' table.
C) Create a standard materialized view that calculates the distinct user count per page daily directly from the 'WEB_EVENTS' table without any special configuration.
D) Create a task that truncates and reloads the materialized view daily. This ensures data consistency and prevents incremental refresh issues.
E) Create a materialized view and configure it to incrementally refresh, leveraging Snowflake's automatic refresh capabilities without any explicit scheduling.
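For context on the correct answer (E): Snowflake materialized views are maintained incrementally and automatically by a background service, so no refresh schedule or task is declared. A minimal sketch follows. One assumption worth verifying against current documentation: exact COUNT(DISTINCT ...) is not accepted inside materialized views, so the sketch substitutes APPROX_COUNT_DISTINCT. The view name and connection profile are placeholders; 'WEB_EVENTS' and its columns come from the question.

```scala
import com.snowflake.snowpark.Session

object DailyDistinctUsersMvSketch {
  def main(args: Array[String]): Unit = {
    val session = Session.builder.configFile("profile.properties").create // placeholder profile

    // The view is refreshed incrementally by Snowflake's background service;
    // no task, schedule, or REFRESH clause is needed.
    session.sql(
      """CREATE MATERIALIZED VIEW IF NOT EXISTS daily_page_visitors AS
        |SELECT
        |  PAGE_URL,
        |  DATE(EVENT_TIMESTAMP) AS visit_date,
        |  APPROX_COUNT_DISTINCT(USER_ID) AS distinct_users
        |FROM WEB_EVENTS
        |GROUP BY PAGE_URL, DATE(EVENT_TIMESTAMP)""".stripMargin).collect()
  }
}
```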
4. A data engineer notices that a daily ETL job loading data into a Snowflake table 'TRANSACTIONS' is consistently taking longer than expected. The table is append-only and partitioned by 'TRANSACTION_DATE'. The engineer observes high 'Remote Spill' during the load process and suspects that micro-partition pruning isn't working effectively. Which of the following approaches would BEST address the performance issue, assuming you have already considered increasing warehouse size? (An illustrative sketch follows the options.)
A) Enable automatic clustering on the 'TRANSACTION_DATE' column of the 'TRANSACTIONS' table.
B) Implement data skipping by creating a masking policy on the 'TRANSACTION_DATE' column.
C) Partition the data in the source system by 'TRANSACTION_DATE' and load data in parallel corresponding to each partition.
D) Re-create the 'TRANSACTIONS' table with a larger virtual warehouse and re-load the entire dataset.
E) Examine the data load process to ensure the data is loaded in 'TRANSACTION_DATE' order. If not, sort the data by 'TRANSACTION_DATE' before loading.
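For context on the correct answers (A and E): the sketch below enables automatic clustering on the pruning column and then inspects clustering quality with the standard SYSTEM$CLUSTERING_INFORMATION function. Loading data pre-sorted by 'TRANSACTION_DATE' (answer E) keeps the micro-partitions well ordered in the first place, so the clustering service has less work to do. The connection profile is a placeholder.

```scala
import com.snowflake.snowpark.Session

object ClusteringSketch {
  def main(args: Array[String]): Unit = {
    val session = Session.builder.configFile("profile.properties").create // placeholder profile

    // Defining a clustering key enables Snowflake's automatic clustering
    // service, which keeps micro-partitions sorted on TRANSACTION_DATE.
    session.sql("ALTER TABLE TRANSACTIONS CLUSTER BY (TRANSACTION_DATE)").collect()

    // Check how much the micro-partitions overlap on the clustering key.
    session.sql(
      "SELECT SYSTEM$CLUSTERING_INFORMATION('TRANSACTIONS', '(TRANSACTION_DATE)')").show()
  }
}
```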
5. You are designing a system to monitor data access patterns in Snowflake. You want to capture detailed information about all queries executed, including the user, query text, execution time, and any potential data access violations based on security policies. Which of the following approaches, used in combination, would provide the MOST comprehensive and scalable solution for this monitoring requirement? (Select TWO; an illustrative sketch follows the options.)
A) Enable query tagging and insert custom tags into each SQL statement indicating sensitive data access. Then, query 'QUERY_HISTORY' filtering on these tags.
B) Create a stored procedure to intercept all SQL commands before execution, log them, and then execute them using 'EXECUTE IMMEDIATE'.
C) Implement Snowflake's Event Tables and configure them to capture security-related events, such as data access policy violations.
D) Enable the 'QUERY_HISTORY' view in the 'ACCOUNT_USAGE' schema and periodically query it using a scheduled task.
E) Configure the 'SNOWFLAKE database's audit logs and stream them to an external security information and event management (SIEM) system.
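For context on the correct answers (C and E): the sketch below creates an event table and designates it as the account's active event table, then pulls recent query metadata from ACCOUNT_USAGE.QUERY_HISTORY of the kind that would be forwarded to a SIEM. The database and schema names and the connection profile are placeholders; ALTER ACCOUNT requires the ACCOUNTADMIN role.

```scala
import com.snowflake.snowpark.Session

object AccessMonitoringSketch {
  def main(args: Array[String]): Unit = {
    val session = Session.builder.configFile("profile.properties").create // placeholder profile

    // Event table for capturing security-related events (answer C).
    session.sql(
      "CREATE EVENT TABLE IF NOT EXISTS monitoring.public.security_events").collect()
    // Requires ACCOUNTADMIN: make it the account's active event table.
    session.sql(
      "ALTER ACCOUNT SET EVENT_TABLE = monitoring.public.security_events").collect()

    // Recent query metadata of the kind streamed to a SIEM (answer E).
    session.sql(
      """SELECT user_name, query_text, total_elapsed_time, start_time
        |FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
        |WHERE start_time > DATEADD('day', -1, CURRENT_TIMESTAMP())""".stripMargin).show()
  }
}
```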
Questions and Answers:
Question #1 Answer: C, E | Question #2 Answer: B | Question #3 Answer: E | Question #4 Answer: A, E | Question #5 Answer: C, E