Latest SnowPro Advanced DSA-C03 Free Sample Questions:
1. You have implemented a Python UDTF in Snowflake to train a machine learning model incrementally on incoming data. The UDTF performs well initially, but as the volume of data processed increases significantly, you observe a noticeable degradation in performance and an increase in query execution time. You suspect that the bottleneck is related to the way the model is being updated and persisted within the UDTF. Which of the following optimization strategies, or combination of strategies, would be MOST effective in addressing this performance issue?
A) Instead of updating the model incrementally within the UDTF for each row, batch the incoming data into larger chunks and perform model updates only on these batches. Use Snowflake's VARIANT data type to store these batches temporarily.
B) Leverage Snowflake's external functions and a cloud-based ML platform (e.g., SageMaker, Vertex AI) to offload the model training process. The UDTF would then only be responsible for data preparation and calling the external function.
C) Use the 'cachetools' library within the UDTF to cache intermediate results and reduce redundant calculations during each function call. Configure the cache with a maximum size and eviction policy appropriate for the data volume.
D) Rewrite the UDTF in Java or Scala, as these languages generally offer better performance compared to Python for computationally intensive tasks. Use the same machine learning libraries that you used with Python.
E) Persist the trained model to a Snowflake stage after each batch update. Use a separate UDF (User-Defined Function) to load the model from the stage before processing new data. This decouples model training from inference.
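For reference, option A can be sketched as a UDTF handler that buffers rows and updates the model once per batch rather than once per row. This is a minimal sketch only; the handler name, batch size, and the use of 'SGDClassifier.partial_fit' are assumptions for illustration, not part of the question.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

class BatchedTrainer:
    BATCH_SIZE = 10_000  # illustrative; tune to the observed data volume

    def __init__(self):
        self.model = SGDClassifier(loss="log_loss")
        self.buffer = []
        self.rows_seen = 0

    def process(self, features: list, label: int):
        # Accumulate rows and update the model once per batch, not per row.
        self.buffer.append((features, label))
        self.rows_seen += 1
        if len(self.buffer) >= self.BATCH_SIZE:
            self._flush()
        return None  # emit no per-row output

    def _flush(self):
        X = np.array([f for f, _ in self.buffer], dtype=float)
        y = np.array([lbl for _, lbl in self.buffer])
        self.model.partial_fit(X, y, classes=np.array([0, 1]))
        self.buffer.clear()

    def end_partition(self):
        if self.buffer:
            self._flush()
        yield (self.rows_seen,)  # one summary row per partition
```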
2. A data science team is evaluating different methods for summarizing lengthy customer support tickets using Snowflake Cortex. The goal is to generate concise summaries that capture the key issues and resolutions. Which of the following approaches is/are appropriate for achieving this goal within Snowflake, considering the need for efficiency, cost-effectiveness, and scalability? (Select all that apply)
A) Developing a Python UDF that leverages a pre-trained summarization model from a library like 'transformers' and deploying it in Snowflake. Managing the model loading and inference within the UDF.
B) Creating a custom summarization model using a transformer-based architecture like BART or T5, training it on a large dataset of support tickets and summaries within Snowflake using Snowpark ML, and then deploying this custom model for generating summaries via a UDF.
C) Calling the Snowflake Cortex 'COMPLETE' endpoint with a detailed prompt that instructs the model to summarize the support ticket, explicitly specifying the desired summary length and format.
D) Employing a SQL-based approach using string manipulation functions and keyword extraction techniques to identify important sentences and concatenate them to form a summary.
E) Using the 'SNOWFLAKE.ML.PREDICT' function with a summarization task-specific model provided by Snowflake Cortex, passing the full ticket text as input to generate a summary.
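For option C, a minimal Snowpark sketch is shown below. The table name ('support_tickets'), column names, and model name ('mistral-large') are assumptions for illustration; the key point is that the prompt pins the summary length and format.

```python
from snowflake.snowpark import Session

def summarize_tickets(session: Session):
    # Push the summarization to Snowflake Cortex via SQL; nothing is pulled
    # to the client except the final summaries.
    return session.sql(
        """
        SELECT ticket_id,
               SNOWFLAKE.CORTEX.COMPLETE(
                   'mistral-large',
                   'Summarize the following support ticket in at most three '
                   || 'bullet points, covering the key issue and the resolution: '
                   || ticket_text
               ) AS summary
        FROM support_tickets
        """
    )
```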
3. You are tasked with building a data science pipeline in Snowflake to predict customer churn. You have trained a scikit-learn model and want to deploy it using a Python UDTF for real-time predictions. The model expects a specific feature vector format. You've defined a UDTF named 'PREDICT_CHURN' that loads the model and makes predictions. However, when you call the UDTF with data from a table, you encounter inconsistent prediction results across different rows, even when the input features seem identical. Which of the following are the most likely reasons for this behavior, and how would you address them?
A) The issue is related to the immutability of the Snowflake execution environment for UDTFs. To resolve this, cache the loaded model instance within the UDTF's constructor and reuse it for subsequent predictions. Using a global variable is also acceptable.
B) The UDTF is not partitioning data correctly. Ensure the UDTF utilizes the 'PARTITION BY' clause in your SQL query based on a relevant dimension (e.g., 'customer_id') to prevent state inconsistencies across partitions. This will isolate the impact of any statefulness within the function.
C) The input feature data types in the table do not match the data types expected by the scikit-learn model. Cast the input columns to the correct data types (e.g., FLOAT, INT) before passing them to the UDTF. Use explicit conversions such as 'TO_DOUBLE' or a cast to INTEGER in your SQL query.
D) The scikit-learn model was not properly serialized and deserialized within the UDTF. Ensure the model is saved using 'joblib' or 'pickle' with appropriate settings for cross-platform compatibility and loaded correctly within the UDTF's 'process' method. Verify serialization/deserialization by testing it independently from Snowflake first.
E) There may be an error in the model, where the 'predict' method produces different outputs for the same inputs. Retraining the model will resolve the issue.
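A minimal sketch of the fixes in options C and D is shown below, assuming the model was serialized with 'joblib' to a file shipped via the UDTF's imports; the file, class, and feature names are illustrative. On the SQL side, the input columns would likewise be cast explicitly (e.g., with 'TO_DOUBLE') before the UDTF call.

```python
import sys
import joblib

class PredictChurn:
    def __init__(self):
        # Load the joblib-serialized model once per handler instance from the
        # UDTF's import directory (assumed file name 'churn_model.joblib').
        import_dir = sys._xoptions["snowflake_import_directory"]
        self.model = joblib.load(import_dir + "churn_model.joblib")

    def process(self, tenure: float, monthly_charges: float, num_tickets: int):
        # Coerce inputs to the numeric types the model was trained on.
        features = [[float(tenure), float(monthly_charges), int(num_tickets)]]
        yield (int(self.model.predict(features)[0]),)
```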
4. You're developing a Python UDTF in Snowflake to perform sentiment analysis on customer reviews. The UDTF uses a pre-trained transformer model from Hugging Face. The code is as follows:
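(The listing below is a minimal sketch of the kind of handler the question describes; the handler class and model name are assumptions, as the original code is not reproduced in this sample.)

```python
# Hypothetical sentiment-analysis UDTF handler using a Hugging Face model.
from transformers import pipeline

class SentimentAnalyzer:
    def __init__(self):
        # Load the pre-trained model once per handler instance.
        self.classifier = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english",
        )

    def process(self, review_text: str):
        result = self.classifier(review_text)[0]
        yield (result["label"], float(result["score"]))
```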
When deploying this UDTF, you encounter a "ModuleNotFoundError: No module named 'transformers'" error. Considering best practices for managing dependencies in Snowflake UDTFs, what is the most effective way to resolve this issue?
A) Upload all the dependencies of Transformers (manually downloaded libraries) to the internal stage.
B) Create a Conda environment containing the 'transformers' library, package it into a zip file, upload it to a Snowflake stage, and specify the stage path in the 'imports' parameter when registering the UDTF.
C) Use the 'snowflake-ml-python' library and its dependency management features to automatically resolve and deploy the 'transformers' dependency.
D) Include the 'transformers' library in the same Python file as the UDTF definition. This is acceptable for smaller libraries.
E) Install the 'transformers' library directly on the Snowflake compute nodes using Snowpark's 'add_packages' method at the session level.
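As a reference for option B, a hedged Snowpark registration sketch is shown below; the stage path, zip file name, UDTF name, and output schema are assumptions, and the handler class is taken to be the one sketched above.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.types import StructType, StructField, StringType, FloatType

def register_sentiment_udtf(session: Session, handler_cls) -> None:
    # Register the UDTF with the packaged environment shipped from a stage.
    session.udtf.register(
        handler_cls,
        name="ANALYZE_SENTIMENT",
        output_schema=StructType([
            StructField("LABEL", StringType()),
            StructField("SCORE", FloatType()),
        ]),
        input_types=[StringType()],
        imports=["@ml_models_stage/transformers_env.zip"],  # assumed stage path
        is_permanent=True,
        stage_location="@ml_models_stage",
        replace=True,
    )
```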
5. You are developing a data transformation pipeline in Python that reads data from Snowflake, performs complex operations using Pandas DataFrames, and writes the transformed data back to Snowflake. You've implemented a function, 'transform_data(df)', which processes a Pandas DataFrame. You want to leverage Snowflake's compute resources for the DataFrame operations as much as possible, even for intermediate transformations before loading the final result. Which of the following strategies could you employ to optimize this process, assuming you have a configured Snowflake connection 'conn'?
A) Read the entire Snowflake table into a single Pandas DataFrame, apply 'transform_data' to it, and then write the entire transformed DataFrame back to Snowflake.
B) Use Snowpark Python DataFrame API to perform the transformation directly on Snowflake's compute and then load results into the same table. Call 'df_snowpark = session.create_dataframe(df)'.
C) Chunk the Snowflake table into smaller DataFrames using 'fetchmany()', apply 'transform_data' to each chunk, and then append each transformed chunk to a Snowflake table using multiple INSERT statements, constructing each chunk with 'pd.DataFrame(cur.fetchmany(chunk_size), columns=[col[0] for col in cur.description])'.
D) Create a series of Snowflake UDFs that perform the individual transformations within Snowflake, load the data into Pandas DataFrames, apply the UDFs on these DataFrames, and then upload the results back to Snowflake.
E) Use 'snowflake.connector.pandas_tools.write_pandas(conn, df, table_name, auto_create_table=True)' to write the transformed DataFrame to Snowflake and let Snowflake handle the transformations using SQL.
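For option B, a minimal Snowpark pushdown sketch is shown below; the table and column names ('RAW_ORDERS', 'CLEAN_ORDERS', 'AMOUNT', 'REGION') are assumptions. The transformation is expressed with the Snowpark DataFrame API so it executes on Snowflake's compute, and the result is written back without materializing the data on the client.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, upper

def transform_in_snowflake(session: Session) -> None:
    orders = session.table("RAW_ORDERS")                # assumed source table
    transformed = (
        orders
        .filter(col("AMOUNT") > 0)                      # pushed down as SQL
        .with_column("REGION", upper(col("REGION")))
    )
    # Write back entirely server-side.
    transformed.write.mode("overwrite").save_as_table("CLEAN_ORDERS")
```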
Questions and Answers:
Question #1 answer: A, B, E | Question #2 answer: C, E | Question #3 answer: C, D | Question #4 answer: B | Question #5 answer: B |