
[2023] DSA-C02 Exam Dumps, Test Engine Practice Test Questions
Pass DSA-C02 exam [Dec 05, 2023] Updated 67 Questions
NEW QUESTION # 22
Which of the following processes best covers all of these characteristics?
Collecting descriptive statistics like min, max, count and sum.
Collecting data types, length and recurring patterns.
Tagging data with keywords, descriptions or categories.
Performing data quality assessment, risk of performing joins on the data.
Discovering metadata and assessing its accuracy.
Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
- A. Data Profiling
- B. Data Visualization
- C. Data Collection
- D. Data Virtualization
Answer: A
Explanation:
Data processing and analysis cannot happen without data profiling, that is, reviewing source data for content and quality. As data gets bigger and infrastructure moves to the cloud, data profiling is increasingly important.
What is data profiling?
Data profiling is the process of reviewing source data, understanding structure, content and interrelationships, and identifying potential for data projects.
Data profiling is a crucial part of:
Data warehouse and business intelligence (DW/BI) projects: data profiling can uncover data quality issues in data sources, and what needs to be corrected in ETL.
Data conversion and migration projects: data profiling can identify data quality issues, which you can handle in scripts and data integration tools copying data from source to target. It can also uncover new requirements for the target system.
Source system data quality projects: data profiling can highlight data which suffers from serious or numerous quality issues, and the source of the issues (e.g. user inputs, errors in interfaces, data corruption).
Data profiling involves:
Collecting descriptive statistics like min, max, count and sum.
Collecting data types, length and recurring patterns.
Tagging data with keywords, descriptions or categories.
Performing data quality assessment, risk of performing joins on the data.
Discovering metadata and assessing its accuracy.
Identifying distributions, key candidates, foreign-key candidates, functional dependencies, embedded value dependencies, and performing inter-table analysis.
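A few of the profiling steps listed above can be sketched with pandas. This is only an illustration under stated assumptions: the `orders` table, its column names, and the `profile` helper are hypothetical, and the key-candidate check is a simple heuristic (no nulls and all values distinct), not a full dependency analysis.

```python
import pandas as pd

# Hypothetical sample table; the column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ann", "bob", "ann", None],
    "amount": [10.0, 25.5, 10.0, 7.25],
})

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of basic profiling facts per column."""
    rows = []
    for name in df.columns:
        col = df[name]
        numeric = pd.api.types.is_numeric_dtype(col)
        rows.append({
            "column": name,
            "dtype": str(col.dtype),
            "count": int(col.count()),       # non-null values
            "nulls": int(col.isna().sum()),
            "distinct": int(col.nunique()),  # nulls are not counted
            "min": col.min() if numeric else None,
            "max": col.max() if numeric else None,
            "sum": col.sum() if numeric else None,
            # Heuristic: no nulls and all values distinct -> key candidate.
            "key_candidate": bool(col.notna().all() and col.is_unique),
        })
    return pd.DataFrame(rows).set_index("column")

report = profile(orders)
print(report)
```

In this toy table only `order_id` is flagged as a key candidate: `customer` contains a null and a repeated value, and `amount` repeats 10.0.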
NEW QUESTION # 23
As a Data Scientist looking to use a reader account, which of the following are correct considerations about Reader Accounts for Third-Party Access?
- A. Users in a reader account can query data that has been shared with the reader account, but cannot perform any of the DML tasks that are allowed in a full account, such as data loading, insert, update, and similar data manipulation operations.
- B. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
- C. Data sharing is only possible between Snowflake accounts.
- D. Each reader account belongs to the provider account that created it.
Answer: A,B,D
Explanation:
Data sharing is only supported between Snowflake accounts. As a data provider, you might want to share data with a consumer who does not already have a Snowflake account or is not ready to become a licensed Snowflake customer.
To facilitate sharing data with these consumers, you can create reader accounts. Reader accounts (formerly known as "read-only accounts") provide a quick, easy, and cost-effective way to share data without requiring the consumer to become a Snowflake customer.
Each reader account belongs to the provider account that created it. As a provider, you use shares to share databases with reader accounts; however, a reader account can only consume data from the provider account that created it.
So a provider can still share data with a consumer who is not a Snowflake customer by creating a reader account for that consumer.
NEW QUESTION # 24
Which types of visualization are used for data exploration in data science?
- A. Feature Distribution by Class
- B. Newton AI
- C. 2D-Density Plots
- D. Sand Visualization
- E. Heat Maps
Answer: A,C,E
Explanation:
Types of visualization used for exploration:
Correlation heatmap
Class distributions by feature
Two-Dimensional density plots.
All the visualizations are interactive, as is standard for Plotly.
For more details, refer to the link below:
https://towardsdatascience.com/data-exploration-understanding-and-visualization-72657f5eac41
NEW QUESTION # 25
Which type of machine learning do data scientists generally use for solving classification and regression problems?
- A. Unsupervised
- B. Supervised
- C. Instructor Learning
- D. Reinforcement Learning
- E. Regression Learning
Answer: B
Explanation:
Supervised Learning
Overview:
Supervised learning is a type of machine learning that uses labeled data to train machine learning models. In labeled data, the output is already known. The model just needs to map the inputs to the respective outputs.
Algorithms:
Some of the most popularly used supervised learning algorithms are:
Linear Regression
Logistic Regression
Support Vector Machine
K Nearest Neighbor
Decision Tree
Random Forest
Naive Bayes
Working:
Supervised learning algorithms take labelled inputs and map them to the known outputs, which means you already know the target variable.
Supervised Learning methods need external supervision to train machine learning models. Hence, the name supervised. They need guidance and additional information to return the desired result.
Applications:
Supervised learning algorithms are generally used for solving classification and regression problems.
A few of the top supervised learning applications are weather prediction, sales forecasting, and stock price analysis.
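To make the idea of learning from labeled data concrete, here is a minimal sketch of one of the algorithms named above, K Nearest Neighbor, written in plain Python. The training points, the labels "small" and "large", and the `knn_predict` helper are made up for illustration; a real project would more likely use a library implementation.

```python
from collections import Counter
import math

# Labeled training data: (height_cm, weight_kg) -> label. Values are made up.
train = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((160.0, 58.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
    ((190.0, 95.0), "large"),
]

def knn_predict(point, examples, k=3):
    """Classify `point` by majority vote among its k nearest labeled examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((158.0, 54.0), train))
print(knn_predict((183.0, 88.0), train))
```

The known labels are the "supervision": the function never invents a class, it only maps a new input to one of the outputs it has already seen.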
NEW QUESTION # 26
Which is the visual depiction of data through the use of graphs, plots, and informational graphics?
- A. Data Mining
- B. Data visualization
- C. Data Virtualization
- D. Data Interpretation
Answer: B
Explanation:
Data visualization is the visual depiction of data through the use of graphs, plots, and informational graphics.
Its practitioners use statistics and data science to convey the meaning behind data in ethical and accurate ways.
NEW QUESTION # 27
Which one is the incorrect option to share data in Snowflake?
- A. a Direct Marketplace, in which you directly share specific database objects (a share) to another account in your region using Snowflake Marketplace.
- B. a Data Exchange, in which you set up and manage a group of accounts and offer a share to that group.
- C. a Direct Share, in which you directly share specific database objects (a share) to another account in your region.
- D. a Listing, in which you offer a share and additional metadata as a data product to one or more accounts.
Answer: A
Explanation:
Options for Sharing in Snowflake
You can share data in Snowflake using one of the following options:
a Listing, in which you offer a share and additional metadata as a data product to one or more accounts,
a Direct Share, in which you directly share specific database objects (a share) to another account in your region,
a Data Exchange, in which you set up and manage a group of accounts and offer a share to that group.
NEW QUESTION # 28
Which of the following is a useful tool for gaining insights into the relationship between features and predictions?
- A. numpy plots
- B. Partial dependence plots (PDP)
- C. sklearn plots
- D. FULL dependence plots (FDP)
Answer: B
Explanation:
Partial dependence plots (PDP) are a useful tool for gaining insights into the relationship between features and predictions. They help us understand how different values of a particular feature impact a model's predictions.
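The numbers behind a partial dependence plot can be computed by hand, which may help clarify what the plot shows. The sketch below assumes a toy `model` function and a made-up dataset; `partial_dependence` is a hypothetical helper, not a library call. For each grid value it fixes one feature at that value in every row and averages the predictions.

```python
# Toy "model": any function mapping a feature row to a prediction would do.
def model(row):
    size, age = row
    return 3.0 * size - 0.5 * age

# Hypothetical dataset of (size, age) rows.
data = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]

def partial_dependence(predict, rows, feature_index, grid):
    """For each grid value, fix the feature at that value in every row
    and average the model's predictions over the dataset."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

pdp_size = partial_dependence(model, data, feature_index=0, grid=[0.0, 1.0, 2.0])
print(pdp_size)  # [-10.0, -7.0, -4.0]
```

The curve rises by 3 for each unit of `size`, which matches the coefficient in the toy model; plotting `grid` against `pdp_size` would give the partial dependence plot for that feature.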
NEW QUESTION # 29
Which Python (pandas) method can a data scientist use to remove duplicates?
- A. duplicates()
- B. remove_duplicates()
- C. clean_duplicates()
- D. drop_duplicates()
Answer: D
Explanation:
The drop_duplicates() method removes duplicate rows.
dataframe.drop_duplicates(subset, keep, inplace, ignore_index)
Remove duplicate rows from the DataFrame:
import pandas as pd

data = {
    "name": ["Peter", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 40],
    "qualified": [True, False, False, False],
}

df = pd.DataFrame(data)
# The two "Mary" rows are identical, so only the first one is kept.
newdf = df.drop_duplicates()
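The `subset`, `keep`, and `ignore_index` parameters in the signature above change which rows count as duplicates and which copy survives. The following sketch uses a made-up frame in which the two "Mary" rows differ in age, so they are duplicates only when compared on `name`.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Peter", "Mary", "John", "Mary"],
    "age": [50, 40, 30, 41],
})

# Compare only the "name" column and keep the last occurrence of each name.
by_name = df.drop_duplicates(subset=["name"], keep="last")

# keep=False drops every row that has a duplicate, keeping none of them.
no_repeats = df.drop_duplicates(subset=["name"], keep=False)

# ignore_index=True renumbers the result 0..n-1 instead of keeping old labels.
renumbered = df.drop_duplicates(subset=["name"], ignore_index=True)

print(by_name)
print(no_repeats)
print(renumbered)
```

Without `subset`, none of these rows would be dropped, because no two rows match on every column.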
NEW QUESTION # 30
Mark the incorrect statements about Streams.
- A. Streams can track changes in materialized views.
- B. Streams do not support repeatable read isolation.
- C. A stream itself does not contain any table data.
- D. Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views.
Answer: A,B
Explanation:
Streams on views support both local views and views shared using Snowflake Secure Data Sharing, including secure views. Currently, streams cannot track changes in materialized views.
A stream itself does not contain any table data. A stream only stores an offset for the source object and returns CDC records by leveraging the versioning history for the source object. When the first stream for a table is created, several hidden columns are added to the source table and begin storing change tracking metadata.
These columns consume a small amount of storage. The CDC records returned when querying a stream rely on a combination of the offset stored in the stream and the change tracking metadata stored in the table. Note that for streams on views, change tracking must be enabled explicitly for the view and underlying tables to add the hidden columns to these tables.
Streams support repeatable read isolation. In repeatable read mode, multiple SQL statements within a transaction see the same set of records in a stream. This differs from the read committed mode supported for tables, in which statements see any changes made by previous statements executed within the same transaction, even though those changes are not yet committed.
The delta records returned by a stream in a transaction cover the range from the current position of the stream until the transaction start time. The stream position advances to the transaction start time if the transaction commits; otherwise it stays at the same position.
NEW QUESTION # 31
Which one is not a type of Feature Scaling?
- A. Standard Scaling
- B. Economy Scaling
- C. Min-Max Scaling
- D. Robust Scaling
Answer: B
Explanation:
Feature Scaling
Feature Scaling is the process of transforming the features so that they have a similar scale. This is important in machine learning because the scale of the features can affect the performance of the model.
Types of Feature Scaling:
Min-Max Scaling: Rescaling the features to a specific range, such as between 0 and 1, by subtracting the minimum value and dividing by the range.
Standard Scaling: Rescaling the features to have a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation.
Robust Scaling: Rescaling the features to be robust to outliers, typically by subtracting the median and dividing by the interquartile range.
Benefits of Feature Scaling:
Improves Model Performance: By transforming the features to have a similar scale, the model can learn from all features equally and avoid being dominated by a few large features.
Increases Model Robustness: By transforming the features to be robust to outliers, the model can become more robust to anomalies.
Improves Computational Efficiency: Many machine learning algorithms, such as k-nearest neighbors, are sensitive to the scale of the features and perform better with scaled features.
Improves Model Interpretability: By transforming the features to have a similar scale, it can be easier to understand the model's predictions.
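The three scaling types described above can be sketched with the standard library alone. The `values` list is made up and contains one deliberate outlier. Two choices here are assumptions rather than something the text prescribes: the population standard deviation is used for standard scaling, and robust scaling centers on the median before dividing by the interquartile range, which is the common convention.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up feature with one outlier

def min_max_scale(xs):
    """Rescale to [0, 1]: subtract the minimum, divide by the range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standard_scale(xs):
    """Rescale to mean 0 and standard deviation 1."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)  # population standard deviation (assumption)
    return [(x - mean) / std for x in xs]

def robust_scale(xs):
    """Center on the median (assumed convention), divide by the IQR."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    median = statistics.median(xs)
    return [(x - median) / (q3 - q1) for x in xs]

print(min_max_scale(values))
print(standard_scale(values))
print(robust_scale(values))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

With min-max scaling the outlier squeezes the four ordinary values into the bottom few percent of the range, while robust scaling keeps them evenly spread between -1 and 0.5 and leaves only the outlier far away.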
NEW QUESTION # 32
Consider a data frame df with columns ['A', 'B', 'C', 'D'] and rows ['r1', 'r2', 'r3']. What does the expression df[lambda x : x.index.str.endswith('3')] do?
- A. Returns the row name r3
- B. Results in Error
- C. Returns the third column
- D. Filters the row labelled r3
Answer: D
Explanation:
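The lambda receives the DataFrame and returns a boolean mask over its index labels, so the expression filters the DataFrame down to the row labelled r3.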
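The behaviour can be checked with a small frame shaped like the one in the question; the cell values below are arbitrary placeholders.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=["r1", "r2", "r3"],
    columns=["A", "B", "C", "D"],
)

# The callable receives df and returns a boolean mask over the index:
# [False, False, True]. Indexing with that mask keeps only matching rows.
result = df[lambda x: x.index.str.endswith("3")]
print(result)
```

The result is still a DataFrame with all four columns and the single row r3, not just the row name.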
NEW QUESTION # 33
Which ones are the correct rules while using a data science model created via External function in Snowflake?
- A. An external function can appear in any clause of a SQL statement in which other types of UDF can appear.
- B. External functions return a value. The returned value can be a compound value, such as a VARIANT that contains JSON.
- C. External functions can accept Model parameters.
- D. External functions can be overloaded.
Answer: A,B,C,D
Explanation:
From the perspective of a user running a SQL statement, an external function behaves like any other UDF.
External functions follow these rules:
External functions return a value.
External functions can accept parameters.
An external function can appear in any clause of a SQL statement in which other types of UDF can appear. For example:
select my_external_function_2(column_1, column_2)
from table_1;

select col1
from table_1
where my_external_function_3(col2) < 0;

create view view1 (col1) as
select my_external_function_5(col1)
from table9;
An external function can be part of a more complex expression:
select upper(zipcode_to_city_external_function(zipcode))
from address_table;
The returned value can be a compound value, such as a VARIANT that contains JSON.
External functions can be overloaded; two different functions can have the same name but different signatures (different numbers or data types of input parameters).
NEW QUESTION # 34
Skewness of Normal distribution is ___________
- A. Negative
- B. Undefined
- C. 0
- D. Positive
Answer: C
Explanation:
Since the normal curve is symmetric about its mean, its skewness is zero. This is the intuitive argument; for a formal mathematical proof, refer to a statistics textbook.
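A quick numerical check can make the symmetry argument tangible. The sketch below assumes the usual definition of skewness as the third standardized moment; `sample_skewness` is a hypothetical helper, and the exponential sample is included only as a contrasting right-skewed case.

```python
import random
import statistics

def sample_skewness(xs):
    """Third standardized moment: mean of ((x - mean) / std) ** 3."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return statistics.fmean([((x - mean) / std) ** 3 for x in xs])

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(100_000)]

print(sample_skewness(normal_sample))  # close to 0
print(sample_skewness(skewed_sample))  # clearly positive
```

A finite sample never gives exactly zero, but the estimate for the normal sample stays near zero, while the exponential sample has a clearly positive value.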
NEW QUESTION # 35
In which of the following ways can a Data Scientist query, process, and transform data using Snowpark Python? [Select 2]
- A. Snowpark currently does not support writing UDTFs.
- B. Query and process data with a DataFrame object.
- C. Transform Data using DataIKY tool with SnowPark API.
- D. Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
Answer: B,D
Explanation:
Query and process data with a DataFrame object. Refer to Working with DataFrames in Snowpark Python.
Convert custom lambdas and functions to user-defined functions (UDFs) that you can call to process data.
Write a user-defined tabular function (UDTF) that processes data and returns data in a set of rows with one or more columns.
Write a stored procedure that you can call to process data, or automate with a task to build a data pipeline.
NEW QUESTION # 36
In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, by how much will the output variable change?
- A. by 1
- B. by intercept
- C. by its slope
- D. no change
Answer: C
Explanation:
What is linear regression?
Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. For example, a modeler might want to relate the weights of individuals to their heights using a linear regression model.
A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0).
For linear regression, Y = a + bx + error.
If we neglect the error term, then Y = a + bx. If x increases by 1, the new value is a + b(x+1) = a + bx + b, which is the old value plus b. So Y increases by its slope.
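The same conclusion can be checked numerically. The sketch below fits a line by ordinary least squares to made-up points that lie exactly on y = 1 + 2x; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x

a, b = fit_line(xs, ys)

def predict(x):
    return a + b * x

print(a, b)                         # 1.0 2.0
print(predict(5.0) - predict(4.0))  # 2.0, the slope
```

Whatever starting x is chosen, moving it by one unit changes the prediction by b; the intercept a cancels out in the difference.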
NEW QUESTION # 37
......

