Free Microsoft DP-100 Practice Test Questions MCQs
Stop wondering if you're ready. Our Microsoft DP-100 practice test is designed to identify your exact knowledge gaps. Validate your skills with Designing and Implementing a Data Science Solution on Azure Exam questions that mirror the real exam's format and difficulty. Build a personalized study plan based on your performance on these free DP-100 MCQ exam questions, focusing your effort where it matters most.
Targeted practice like this helps candidates feel significantly more prepared for Designing and Implementing a Data Science Solution on Azure Exam exam day.
Updated On: 3-Mar-2026 | 50 Questions
Designing and Implementing a Data Science Solution on Azure Exam
Topic 1, Case Study 1
Overview
You are a data scientist in a company that provides data science services for professional sporting events. Models will
use global and local market data to meet the following business goals:
•Understand sentiment of mobile device users at sporting events based on audio from crowd reactions.
•Assess a user's tendency to respond to an advertisement.
•Customize styles of ads served on mobile devices.
•Use video to detect penalty events.
Current environment
Requirements
• Media used for penalty event detection will be provided by consumer devices. Media may include images
and videos captured during the sporting event and shared using social media. The images and videos will have
varying sizes and formats.
• The data available for model building comprises seven years of sporting event media. The sporting event
media includes recorded videos, transcripts of radio commentary, and logs from related social media feeds
captured during the sporting events.
•Crowd sentiment will include audio recordings submitted by event attendees in both mono and stereo
formats.
Advertisements
• Ad response models must be trained at the beginning of each event and applied during the sporting event.
• Market segmentation models must optimize for similar ad response history.
• Sampling must guarantee mutually and collectively exclusive local and global segmentation models that share
the same features.
• Local market segmentation models will be applied before determining a user’s propensity to respond to an
advertisement.
• Data scientists must be able to detect model degradation and decay.
• Ad response models must support non-linear boundary features.
• The ad propensity model uses a cut threshold of 0.45, and retraining occurs if the weighted Kappa deviates from 0.1 by +/- 5%.
• The ad propensity model uses cost factors shown in the following diagram:

Penalty detection and sentiment
Findings
•Data scientists must build an intelligent solution by using multiple machine learning models for penalty event
detection.
•Data scientists must build notebooks in a local environment using automatic feature engineering and model
building in machine learning pipelines.
•Notebooks must be deployed to retrain by using Spark instances with dynamic worker allocation.
•Notebooks must execute with the same code on new Spark instances, recoding only the source of the data.
•Global penalty detection models must be trained by using dynamic runtime graph computation during
training.
•Local penalty detection models must be written by using BrainScript.
• Experiments for local crowd sentiment models must combine local penalty detection data.
• Crowd sentiment models must identify known sounds such as cheers and known catch phrases. Individual
crowd sentiment models will detect similar sounds.
• All shared features for local models are continuous variables.
• Shared features must use double precision. Subsequent layers must have aggregate running mean and
standard deviation metrics available.
During the initial weeks in production, the following was observed:
•Ad response rates declined.
•Drops were not consistent across ad styles.
•The distribution of features across training and production data is not consistent.
Analysis shows that of the 100 numeric features on user location and behavior, the 47 features that come from
location sources are being used as raw features. A suggested experiment to remedy the bias and variance issue
is to engineer 10 linearly uncorrelated features.
Penalty detection and sentiment
•Initial data discovery shows a wide range of densities of target states in training data used for crowd
sentiment models.
•All penalty detection models show inference phases using Stochastic Gradient Descent (SGD) that are running
too slow.
•Audio samples show that the length of a catch phrase varies between 25%-47%, depending on region.
•The performance of the global penalty detection models shows lower variance but higher bias when comparing
training and validation sets. Before implementing any feature changes, you must confirm the bias and variance
using all training and validation cases.
You are a data scientist creating a linear regression model.
You need to determine how closely the data fits the regression line.
Which metric should you review?
A. Coefficient of determination
B. Recall
C. Precision
D. Mean absolute error
E. Root Mean Square Error
Explanation
The question asks for a metric to determine how closely the data fits the regression line. This refers to "goodness of fit," which measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s). While several metrics can evaluate a regression model's performance, only one directly quantifies the proportion of variance explained by the model relative to the total variance.
Correct Option
A. Coefficient of determination
The Coefficient of determination, commonly denoted as R-squared (R²), is the statistical metric that specifically measures how close the data are to the fitted regression line.
It represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). An R² of 100% indicates that all changes in the dependent variable are completely explained by changes in the independent variable(s).
It is the direct answer to determining the "goodness of fit" for a regression model.
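As a minimal sketch (with hypothetical values, not from the exam), R² can be computed directly from its definition, R² = 1 − SS_res / SS_tot:

```python
# Illustrative only: computing the coefficient of determination (R-squared)
# from first principles, where SS_res is the sum of squared residuals and
# SS_tot is the total sum of squares around the mean of the actual values.

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.1, 7.2, 8.9]  # hypothetical predictions from a fitted line
print(r_squared(y_true, y_pred))  # close to 1.0, since the fit is tight
```

A value near 1.0 indicates the regression line explains nearly all the variance in the data.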
Incorrect Option
B. Recall
Recall is a classification metric, not a regression metric. It measures the proportion of actual positives that were correctly identified by the model (True Positives / (True Positives + False Negatives)).
It is used in scenarios like binary classification (e.g., spam detection, disease diagnosis) and is irrelevant for evaluating linear regression fits.
C. Precision
Precision is also a classification metric. It measures the proportion of positive identifications that were actually correct (True Positives / (True Positives + False Positives)).
Like recall, it is used to evaluate classification models and does not apply to the context of a regression line.
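Both precision and recall come from confusion-matrix counts, which is why they do not apply to a regression line. A minimal sketch with hypothetical counts:

```python
# Illustrative only: precision and recall are ratios of classification counts
# (true positives, false positives, false negatives), not regression residuals.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Hypothetical counts: 80 true positives, 20 false positives, 10 false negatives.
print(precision(80, 20))  # 0.8
print(recall(80, 10))     # about 0.889
```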
D. Mean absolute error
Mean Absolute Error (MAE) is a valid regression metric, but it measures the average magnitude of the errors in a set of predictions, without considering their direction. It calculates the average absolute difference between the predicted and actual values.
While it tells you how wrong the predictions are on average, it does not quantify how well the data fits the line (the proportion of variance explained).
E. Root Mean Square Error
Root Mean Square Error (RMSE) is another standard regression metric. It measures the average magnitude of the error by calculating the square root of the average of squared differences between prediction and actual observation.
RMSE gives a relatively high weight to large errors. However, like MAE, it measures prediction error magnitude, not the proportion of variance explained by the model fit.
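The difference between MAE and RMSE can be seen in a short sketch (hypothetical values): both report error magnitude, but RMSE's squaring weights the largest miss more heavily, and neither says anything about variance explained:

```python
import math

# Illustrative only: MAE averages absolute errors, while RMSE squares errors
# before averaging, so a single large miss inflates RMSE more than MAE.

def mae(y_true, y_pred):
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 19.0, 26.0]  # errors of 2, 1, and 4 units
print(mae(y_true, y_pred))
print(rmse(y_true, y_pred))  # larger than MAE because of the 4-unit miss
```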
Reference
Microsoft Learn: Train and evaluate regression models

You need to record the row count as a metric named row_count that can be returned using
the get_metrics method of the Run object after the experiment run completes. Which code should you use?
A. run.upload_file('row_count', './data.csv')
B. run.log('row_count', rows)
C. run.tag('row_count', rows)
D. run.log_table('row_count', rows)
E. run.log_row('row_count', rows)
Explanation
The question requires recording a simple numeric value (the row count) as a metric during an Azure Machine Learning experiment run. The metric must be retrievable after the run using the get_metrics() method. Azure ML provides different logging methods for different data types, and for a single numeric value, the appropriate method is the one that logs a key-value pair to the run's metrics record.
Correct Option
B. run.log('row_count', rows)
The log() function is specifically designed to record a single numeric value as a metric in an Azure ML run.
It creates a key-value pair where 'row_count' is the metric name, and the variable rows (containing the integer value) is the metric value.
After execution, this value can be retrieved using run.get_metrics()['row_count'] and will appear in the Azure ML studio under the run's metrics tab.
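The log()/get_metrics() contract can be illustrated with a minimal stand-in class (this is not the Azure ML SDK, just a sketch of the semantics described above):

```python
# Minimal stand-in (NOT the azureml SDK) illustrating the contract:
# log() records a named scalar metric against the run, and get_metrics()
# later returns all logged metrics as a name-to-value mapping.
class FakeRun:
    def __init__(self):
        self._metrics = {}

    def log(self, name, value):
        # Mirrors Run.log: store a single named value for later retrieval.
        self._metrics[name] = value

    def get_metrics(self):
        return dict(self._metrics)

run = FakeRun()
rows = 1500  # e.g. the row count of a loaded dataframe
run.log('row_count', rows)
print(run.get_metrics()['row_count'])  # 1500
```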
Incorrect Option
A. run.upload_file('row_count', './data.csv')
The upload_file() method is used to upload files (like models or datasets) to the run's output storage, not to log metrics.
This would upload the entire CSV file to a location named 'row_count', which is not the intended behavior for tracking a simple row count metric.
Uploaded files are accessible through the "Outputs + logs" section, not through the get_metrics() method.
C. run.tag('row_count', rows)
The tag() method is used to add metadata tags to the run itself for organization and searchability, not for logging experiment metrics.
Tags are typically used for filtering and grouping runs (e.g., 'experiment_type': 'regression'), not for tracking numerical results.
Tags cannot be retrieved using the get_metrics() method.
D. run.log_table('row_count', rows)
The log_table() method is used to log dictionary-like objects or tables where the data has multiple columns and rows.
The rows variable in this context is a single integer (the length of the DataFrame), not a table structure.
Using this for a scalar value would either fail or store it incorrectly as a table with unexpected format.
E. run.log_row('row_count', rows)
The log_row() method is used to log a metric row by row, typically when you want to create a table with multiple columns and append multiple rows over time.
It expects column names and values as key-value pairs (e.g., run.log_row("MyTable", column1=value1, column2=value2)).
Passing a single value rows without a column name specification would result in an error.
Reference
Microsoft Learn: Log metrics in Azure ML experiments
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You train and register a machine learning model.
You plan to deploy the model as a real-time web service. Applications must use key-based authentication to use the model.
You need to deploy the web service.
Solution:
Create an AksWebservice instance.
Set the value of the auth_enabled property to False.
Set the value of the token_auth_enabled property to True.
Deploy the model to the service.
Does the solution meet the goal?
A. Yes
B. No
Explanation
The goal requires deploying a model as a real-time web service with key-based authentication. Key-based authentication in Azure Machine Learning endpoints uses primary and secondary keys. The solution incorrectly configures the authentication properties, which determines whether the deployed service will meet the requirement.
Correct Option
B. No
The solution does not meet the goal because it configures conflicting authentication settings.
Setting auth_enabled=False disables key-based authentication entirely, which directly contradicts the requirement for key-based authentication.
Setting token_auth_enabled=True enables token-based authentication (Azure Active Directory tokens), which is a different authentication method than key-based.
For key-based authentication, you must set auth_enabled=True (which is the default value) and ensure token authentication is disabled.
The correct configuration would be to leave auth_enabled as True (or explicitly set it to True) and set token_auth_enabled=False.
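A hedged configuration sketch of the correct setup, assuming the azureml-core SDK's AksWebservice class (the compute target name is hypothetical):

```python
# Hedged sketch (assumes azureml-core): enabling key-based authentication
# on an AKS deployment. auth_enabled=True turns on key auth (the default
# on AKS); token_auth_enabled must remain False, since token (Azure AD)
# authentication is a different, mutually exclusive mechanism.
from azureml.core.webservice import AksWebservice

deployment_config = AksWebservice.deploy_configuration(
    auth_enabled=True,         # key-based authentication, as required
    token_auth_enabled=False,  # do NOT enable Azure AD token auth
)
# The config is then passed to Model.deploy(...) with the registered model.
```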
Reference
Microsoft Learn: Authentication for Azure ML web services
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure:
At the end of each month, a new folder with that month’s sales file is added to the sales folder.
You plan to use the sales data to train a machine learning model based on the following requirements:
You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.
You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.
You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace.
What should you do?
A. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
B. Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
C. Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
D. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
Explanation
The scenario requires loading all sales data to date while maintaining the ability to filter data by time. The folder structure uses wildcard-compatible patterns with month/year folders. The key requirements are loading all data easily, filtering by specific months, and registering the minimum number of datasets. Understanding Azure ML dataset path patterns and versioning strategies is essential to meet these requirements.
Correct Option
B. Create a tabular dataset that references the datastore and specifies the path 'sales/*/sales.csv', register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
The path pattern 'sales/*/sales.csv' uses a wildcard (*) to include all month-year subfolders, automatically incorporating new monthly data as it is added.
This satisfies loading all sales data to date with a single dataset registration, meeting the minimum dataset requirement.
Since the dataset loads all available data, filtering for experiments that require data before a specific month can be done in code after loading the dataframe, using the month tags from folder names or date columns in the data.
Registering with a month tag provides metadata about when the dataset was last updated, though it is not strictly necessary for the filtering requirement.
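How the wildcard pattern picks up each monthly folder can be illustrated with Python's fnmatch as a stand-in for the datastore's path resolution (the folder names below are hypothetical):

```python
import fnmatch

# Illustrative only: the 'sales/*/sales.csv' pattern from option B matches the
# sales.csv file inside every month-year subfolder, so newly added months are
# included without changing the dataset definition.
paths = [
    'sales/01-2023/sales.csv',
    'sales/02-2023/sales.csv',
    'sales/archive/readme.txt',  # not a sales.csv file, so not matched
]
matched = [p for p in paths if fnmatch.fnmatch(p, 'sales/*/sales.csv')]
print(matched)  # only the two monthly sales files
```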
Incorrect Option
A. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
This approach requires manually updating the dataset definition every month to include the new file path.
Replacing the existing dataset each month loses the ability to reference previous dataset states, making it impossible to reproduce experiments that used earlier data without manual intervention.
This violates the requirement to use data from before a specific previous month, as the overwritten dataset only contains the latest configuration.
C. Create a new tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
This creates a new dataset each month with a month-specific name, resulting in many registered datasets over time.
While this allows precise filtering by selecting the appropriate month-specific dataset, it violates the requirement to register the minimum number of datasets possible.
It also requires manual creation and management of multiple datasets instead of leveraging path patterns.
D. Create a tabular dataset that references the datastore and explicitly specifies each 'sales/mm-yyyy/sales.csv' file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.
This approach requires updating the dataset definition each month to add the new file path and creating a new version.
While versioning preserves history, it still requires manual updates and creates multiple versions, though they share the same name.
The requirement to filter by specific months would require selecting the correct version based on tags, but each version contains all files up to that point, not just a specific month, making month-specific filtering impossible without post-load filtering.
Reference
Microsoft Learn: Create Azure Machine Learning datasets
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns. You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Calculate the column median value and use the median value as the replacement for any missing value in the column.
Does the solution meet the goal?
A. Yes
B. No
Explanation
The goal requires cleaning missing values without affecting the dimensionality of the feature set. Dimensionality refers to the number of features (columns) in the dataset. The solution must include all values in the final dataset. The proposed median imputation addresses missing values but fails to meet the requirement of analyzing a full dataset to include all values.
Correct Option
B. No
The solution does not meet the goal because the question specifies the need to "analyze a full dataset to include all values," which means examining the data before cleaning to understand the pattern of missing values.
Using median imputation without first analyzing the missing data ignores important information about why values are missing and their distribution.
A full analysis should include visualization of missing data patterns, calculating the percentage of missing values per column, and determining if missingness is random or systematic.
The solution jumps directly to imputation without performing the required analysis step, which violates the requirement to analyze the full dataset including missing values.
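For reference, the mechanics of median imputation itself (which do preserve dimensionality, since values are replaced in place rather than rows or columns dropped) can be sketched with hypothetical data:

```python
import statistics

# Illustrative only: median imputation replaces each missing entry with the
# median of the observed values in the same column, keeping the column length
# (and the feature count) unchanged.
column = [4.0, None, 10.0, 6.0, None, 8.0]
median = statistics.median(v for v in column if v is not None)
cleaned = [median if v is None else v for v in column]
print(median)       # 7.0
print(len(cleaned)) # 6, same length as the original column
```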
Reference
Microsoft Learn: Handle missing data in Azure Machine Learning
You are building a regression model for estimating the number of calls during an event.
You need to determine whether the feature values achieve the conditions to build a Poisson regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. The label data must be a negative value.
B. The label data can be positive or negative.
C. The label data must be a positive value.
D. The label data must be non-discrete.
E. The data must be whole numbers.
Explanation
Poisson regression is specifically designed for modeling count data, which has distinct mathematical requirements. The question asks for conditions that the feature set must contain, but actually focuses on the properties of the label (target) variable since Poisson regression assumptions primarily concern the dependent variable. Understanding these assumptions is critical for correctly applying this regression technique.
Correct Option
C. The label data must be a positive value
Poisson regression models count data, which by definition cannot be negative. Counts represent the number of occurrences of an event (e.g., number of calls, number of customers).
The Poisson distribution is defined only for non-negative integers, and the model uses a log link function that requires the mean to be positive.
If the label contains negative values, Poisson regression cannot be applied, and alternative models like linear regression or data transformation would be necessary.
E. The data must be whole numbers
Poisson regression requires the dependent variable to consist of non-negative integers (whole numbers: 0, 1, 2, 3, ...).
This is because the Poisson distribution models the probability of a given number of events occurring in a fixed interval, which is inherently discrete.
Continuous values or decimals would violate the fundamental assumption of the Poisson distribution and would require different modeling approaches such as Gaussian regression or Gamma regression.
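The two conditions together can be checked with a small sketch (hypothetical label values):

```python
# Illustrative only: validating the two Poisson-regression label conditions —
# every value must be non-negative AND a whole number (a count).
def valid_poisson_label(values):
    return all(v >= 0 and float(v).is_integer() for v in values)

print(valid_poisson_label([0, 3, 12, 7]))  # True: non-negative integer counts
print(valid_poisson_label([2, -1, 5]))     # False: contains a negative value
print(valid_poisson_label([1.5, 2, 3]))    # False: 1.5 is not a whole number
```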
Incorrect Option
A. The label data must be a negative value
This is completely incorrect as Poisson regression cannot handle negative values. The Poisson distribution is defined only for non-negative integers.
Negative counts are impossible in real-world count data scenarios like number of calls, accidents, or customers.
B. The label data can be positive or negative
This is incorrect because Poisson regression cannot accommodate negative values. If the label contains negative values, the model assumptions are violated.
While positive values are acceptable, the presence of any negative values makes Poisson regression invalid.
D. The label data must be non-discrete
This is incorrect as Poisson regression specifically requires discrete data (counts). Non-discrete (continuous) data would violate the Poisson distribution assumption.
For continuous positive data, other regression techniques like Gamma regression or log-transformed linear regression would be more appropriate.
Reference
Microsoft Learn: Poisson regression in Azure Machine Learning
You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
Which module should you use?
A. Partition and Sample
B. Assign Data to Clusters
C. Group Data into Bins
D. Test Hypothesis Using t-Test
Explanation
In Azure Machine Learning Studio (classic) or Azure Machine Learning designer, dividing data into two distinct datasets is a common data preparation task, typically for creating training and testing sets. The question asks for the specific module that performs this operation. Understanding the function of each module helps in selecting the correct one.
Correct Option
A. Partition and Sample
The Partition and Sample module is specifically designed to divide a dataset into two or more distinct subsets based on various strategies.
It supports splitting data using techniques like "Split into partitions" where you can specify a fraction of data for the first partition (e.g., 0.7 for training) and the remaining for the second partition (e.g., 0.3 for testing).
It also allows stratified splitting to maintain class distributions, which is essential for imbalanced datasets.
This module directly fulfills the requirement to create two distinct datasets.
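The kind of split the module performs can be sketched in plain Python (a shuffled 0.7/0.3 split; the module itself offers more strategies, such as stratification):

```python
import random

# Illustrative only: dividing a dataset into two distinct subsets, analogous
# to the Partition and Sample module's 0.7 training / 0.3 testing split.
def split(rows, fraction, seed=0):
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the original order is preserved
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))
train, test = split(data, 0.7)
print(len(train), len(test))  # 7 3 — two distinct, non-overlapping subsets
```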
Incorrect Option
B. Assign Data to Clusters
The Assign Data to Clusters module is used after training a clustering model (like K-Means) to assign new data points to existing clusters based on their distances to cluster centroids.
It is not designed for splitting a single dataset into two distinct datasets but rather for classification or assignment tasks based on an already trained model.
C. Group Data into Bins
The Group Data into Bins module (also known as discretization) is used to convert continuous numerical data into categorical bins or intervals (e.g., grouping ages into 0-18, 19-35, etc.).
Its purpose is data transformation, not data splitting. It changes the values within a column rather than creating separate datasets.
D. Test Hypothesis Using t-Test
The Test Hypothesis Using t-Test module is a statistical analysis tool used to determine if there is a significant difference between the means of two groups.
It is for analysis and inference, not for data preparation or splitting datasets into two parts.
Reference
Microsoft Learn: Partition and Sample module
You are a data scientist working for a bank and have used Azure ML to train and register a machine learning model that predicts whether a customer is likely to repay a loan.
You want to understand how your model is making selections and must be sure that the model does not violate government regulations such as denying loans based on where an applicant lives.
You need to determine the extent to which each feature in the customer data is influencing predictions.
What should you do?
A. Enable data drift monitoring for the model and its training dataset.
B. Score the model against some test data with known label values and use the results to calculate a confusion matrix.
C. Use the Hyperdrive library to test the model with multiple hyperparameter values.
D. Use the interpretability package to generate an explainer for the model.
E. Add tags to the model registration indicating the names of the features in the training dataset.
Explanation
The question requires understanding how the model makes predictions and specifically checking if features like geographic location (where an applicant lives) influence predictions, which could violate regulations. This is a model interpretability and explainability problem. Azure Machine Learning provides tools to explain feature importance and model behavior to ensure fairness and regulatory compliance.
Correct Option
D. Use the interpretability package to generate an explainer for the model.
The Azure Machine Learning interpretability package provides tools to explain model predictions by calculating feature importance scores at both global and local levels.
By generating an explainer, you can determine the extent to which each feature (including location-based features) influences predictions, directly addressing the regulatory concern.
It supports various explanation techniques like SHAP (SHapley Additive exPlanations), Mimic Explainer, and Permutation Feature Importance.
The results can show whether protected attributes like geographic location are having an inappropriate impact on loan decisions.
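The idea behind permutation feature importance (one of the techniques listed above) can be shown with a toy model; everything here is hypothetical, not the Azure ML package itself:

```python
import random

# Illustrative only: permutation feature importance. Shuffling a feature that
# the model actually relies on degrades its predictions, while shuffling an
# irrelevant feature (e.g. a protected attribute the model ignores) does not.
def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_idx, seed=0):
    rng = random.Random(seed)
    col = [row[feature_idx] for row in X]
    rng.shuffle(col)  # break the feature's relationship to the target
    X_perm = [row[:feature_idx] + [v] + row[feature_idx + 1:]
              for row, v in zip(X, col)]
    return mse(model, X_perm, y) - mse(model, X, y)  # error increase

# Toy model that uses only feature 0 (e.g. income) and ignores feature 1
# (e.g. an encoded location attribute).
model = lambda row: 2.0 * row[0]
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * row[0] for row in X]
print(permutation_importance(model, X, y, 0))  # positive: feature 0 matters
print(permutation_importance(model, X, y, 1))  # 0.0: feature 1 is ignored
```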
Incorrect Option
A. Enable data drift monitoring for the model and its training dataset.
Data drift monitoring detects when the statistical properties of the input data change over time compared to the training data.
While important for maintaining model performance, it does not explain feature influence or help determine if specific features are improperly affecting predictions.
This addresses model performance degradation, not interpretability or fairness.
B. Score the model against some test data with known label values and use the results to calculate a confusion matrix.
Scoring and confusion matrices evaluate model performance metrics like accuracy, precision, recall, and F1 score.
These metrics measure how well the model predicts but provide no insight into which features drive those predictions.
This approach cannot identify whether location or other protected attributes influence decisions.
C. Use the Hyperdrive library to test the model with multiple hyperparameter values.
Hyperdrive is used for hyperparameter tuning to find the optimal model configuration that maximizes performance.
It optimizes model accuracy but does not provide explanations about feature importance or model behavior.
This addresses model optimization, not interpretability or regulatory compliance checking.
E. Add tags to the model registration indicating the names of the features in the training dataset.
Adding tags to model registration is useful for metadata management, organization, and tracking.
While documenting feature names is good practice, tags themselves do not provide any analysis of feature influence on predictions.
This is metadata storage, not model explanation.
Reference
Microsoft Learn: Model interpretability in Azure Machine Learning
Note: This question is part of a series of questions that present the same scenario.
Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You create a model to forecast weather conditions based on historical data.
You need to create a pipeline that runs a processing script to load data from a datastore and pass the processed data to a machine learning model training script.
Solution: Run the following code:
Does the solution meet the goal?
A. Yes
B. No
Explanation
The goal requires creating a pipeline where a processing script loads data from a datastore and passes the processed data to a training script. The provided code attempts to define a two-step pipeline with process_step and train_step. The key requirement is correct data flow between steps, where the processed data from the first step becomes the input to the second step.
Correct Option
B. No
The solution fails because the data flow between steps is incorrectly configured. In the code, process_step takes data_input as an argument and produces data_output.
However, train_step incorrectly uses data_input as its data source instead of using the data_output from process_step. This means the training script receives unprocessed raw data directly from the datastore, not the processed data.
The inputs=[data_output] parameter in train_step specifies that data_output is an input, but the script arguments still reference data_input.
For the pipeline to work correctly, the train_step arguments should reference data_output instead of data_input, ensuring the processed data flows properly between steps.
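A hedged sketch of the corrected wiring, assuming the azureml pipeline SDK (script names, argument flags, and the compute target name are hypothetical):

```python
# Hedged sketch (assumes azureml-core and azureml-pipeline-core): the
# processing step's output is a PipelineData object that the training step
# then declares as its input, so processed data flows between the steps.
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = ws.get_default_datastore()
data_output = PipelineData('processed_data', datastore=datastore)

process_step = PythonScriptStep(
    script_name='process.py',
    arguments=['--output', data_output],
    outputs=[data_output],       # processing step PRODUCES the processed data
    compute_target='cpu-cluster'
)
train_step = PythonScriptStep(
    script_name='train.py',
    arguments=['--input', data_output],
    inputs=[data_output],        # training step CONSUMES it, not the raw data
    compute_target='cpu-cluster'
)
pipeline = Pipeline(workspace=ws, steps=[process_step, train_step])
```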
Reference
Microsoft Learn: Build Azure Machine Learning pipelines
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?
A. scatter plot
B. coefficient of determination
C. Receiver Operating Characteristic (ROC) curve
D. Gradient descent
Explanation
Precision is a metric used to evaluate binary classification models. The question asks which visualization should be used when precision is the evaluation metric. However, precision itself is not a visualization but a calculated metric. Among the options, only one visualization is commonly associated with binary classification evaluation and can help understand the trade-off between precision and other metrics.
Correct Option
C. Receiver Operating Characteristic (ROC) curve
The ROC curve is a fundamental visualization tool for binary classification evaluation that plots the True Positive Rate (Sensitivity) against the False Positive Rate (1-Specificity) at various threshold settings.
While the ROC curve does not directly display precision, it is closely related to the overall classification performance. The Area Under the Curve (AUC) derived from the ROC curve is a summary metric of model performance.
Precision can be calculated from the confusion matrix, and there are other curves like Precision-Recall curves that are more directly related to precision, especially for imbalanced datasets.
Among the given options, the ROC curve is the only visualization that is standard for binary classification evaluation.
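Each point on a ROC curve is a (FPR, TPR) pair computed at one score threshold, and precision comes from the same confusion-matrix counts; a minimal sketch with hypothetical scores:

```python
# Illustrative only: computing one ROC-curve point (FPR, TPR) at a chosen
# threshold, plus the precision at that same threshold, from raw scores.
def roc_point(scores, labels, threshold):
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    tn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 0)
    tpr = tp / (tp + fn)                       # true positive rate (y-axis)
    fpr = fp / (fp + tn)                       # false positive rate (x-axis)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return fpr, fpr and tpr or tpr, precision

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]  # hypothetical model scores
labels = [1, 1, 0, 1, 0, 0]              # ground-truth classes
print(roc_point(scores, labels, 0.5))
```

Sweeping the threshold from 1.0 down to 0.0 and plotting these (FPR, TPR) pairs traces out the full ROC curve.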
Incorrect Option
A. scatter plot
Scatter plots are used to visualize the relationship between two continuous variables, showing individual data points as dots in a Cartesian coordinate system.
They are not used for evaluating classification model performance or visualizing metrics like precision.
B. coefficient of determination
The coefficient of determination (R²) is a metric used for regression models, not classification models. It measures the proportion of variance in the dependent variable explained by the independent variables.
This is not a visualization and is irrelevant for binary classification evaluation.
D. Gradient descent
Gradient descent is an optimization algorithm used to minimize the loss function during model training by iteratively updating model parameters.
It is not a visualization or evaluation metric for completed models, but rather a training technique.
Reference
Microsoft Learn: Evaluate binary classification models