Exam DP-203: Data Engineering on Microsoft Azure
Exam Number: DP-203 | Length of test: 120 mins
Exam Name: Data Engineering on Microsoft Azure | Number of questions in the actual exam: 40-60
Format: PDF, VPLUS | Passing Score: 700/1000
Total Questions: 335 | $30 Premium PDF file | 2 months of updates | Last updated: November 2024
Total Questions: 335 | FREE Premium VPLUS file | Last updated: November 2024
Download practice test questions
| Title | Size | Hits | Download |
|---|---|---|---|
| Microsof.DP-203.vDec-2024.by.Danie.173q | 15.42 MB | 3 | Download |
| Microsof.DP-203.vDec-2024.by.Danie.173q | 15.29 MB | 6 | Download |
| Microsof.DP-203.vJul-2024.by.Aien.166q | 22.16 MB | 55 | Download |
| Microsof.DP-203.vJul-2024.by.Aien.166q | 16.21 MB | 76 | Download |
| Microsof.DP-203.vMay-2024.by.ShinamotoHkato.139q | 17.05 MB | 77 | Download |
| Microsof.DP-203.vJan-2024.by.Vutana.134q | 33.97 MB | 69 | Download |
Study guide for Exam DP-203: Data Engineering on Microsoft Azure
Audience profile
As a candidate for this exam, you should have subject matter expertise in integrating, transforming, and consolidating data from various structured, unstructured, and streaming data systems into a suitable schema for building analytics solutions.
As an Azure data engineer, you help stakeholders understand the data through exploration, and build and maintain secure and compliant data processing pipelines by using different tools and techniques. You use various Azure data services and frameworks to store and produce cleansed and enhanced datasets for analysis. This data store can be designed with different architecture patterns based on business requirements, including:
- Modern data warehouse (MDW)
- Big data
- Lakehouse architecture
As an Azure data engineer, you also help to ensure that the operationalization of data pipelines and data stores is high-performing, efficient, organized, and reliable, given a set of business requirements and constraints. You help to identify and troubleshoot operational and data quality issues. You also design, implement, monitor, and optimize data platforms to meet the needs of the data pipelines.
As a candidate for this exam, you must have solid knowledge of data processing languages, including:
- SQL
- Python
- Scala
You need to understand parallel processing and data architecture patterns. You should be proficient in using the following to create data processing solutions:
- Azure Data Factory
- Azure Synapse Analytics
- Azure Stream Analytics
- Azure Event Hubs
- Azure Data Lake Storage
- Azure Databricks
Skills at a glance
Design and implement data storage (15–20%)
- Implement a partition strategy
- Design and implement the data exploration layer
Develop data processing (40–45%)
- Ingest and transform data
- Develop a batch processing solution
- Develop a stream processing solution
- Manage batches and pipelines
Secure, monitor, and optimize data storage and data processing (30–35%)
- Implement data security
- Monitor data storage and data processing
- Optimize and troubleshoot data storage and data processing
Some new sample questions:
Question:
You have an Azure subscription that contains an Azure Synapse Analytics account and a Microsoft Purview account.
You create a pipeline named Pipeline1 for data ingestion to a dedicated SQL pool.
You need to generate data lineage from Pipeline1 to Microsoft Purview.
Which two activities generate data lineage? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.
A. Web
B. Copy
C. WebHook
D. Dataflow
E. Validation
Question:
You have an Azure Blob storage account named storage1 and an Azure Synapse Analytics serverless SQL pool named Pool1. From Pool1, you plan to run ad-hoc queries that target storage1.
You need to ensure that you can use shared access signature (SAS) authorization without defining a data source. What should you create first?
A. a stored access policy
B. a server-level credential
C. a managed identity
D. a database scoped credential
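For context: serverless SQL pools support SAS-based authorization for ad-hoc OPENROWSET queries through credentials. A minimal T-SQL sketch of a server-level credential, created in the master database, whose name matches the targeted storage path (the URL and SAS token are placeholders):

```sql
-- Run in the master database of the serverless SQL pool.
-- The credential name must match the storage URL the ad-hoc query targets.
CREATE CREDENTIAL [https://storage1.blob.core.windows.net/container1]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2022-11-02&ss=b&srt=co&sp=rl&sig=...';  -- placeholder SAS token
```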
Question:
HOTSPOT
You have an Azure Data Lake Storage account that contains one CSV file per hour for January 1, 2020, through January 31, 2023. The files are partitioned by using the following folder structure.
csv/system1/{year}/{month}/{filename}.csv
You need to query the files by using an Azure Synapse Analytics serverless SQL pool. The solution must return the row count of each file created during the last three months of 2022.
How should you complete the query?
……..
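The elided completion typically relies on the serverless filepath() function, which returns the value matched by each wildcard in the BULK path. A minimal sketch, assuming a pre-created external data source named lake_ds and numeric month folders (both are assumptions, not given in the question):

```sql
-- Count rows per file for October-December 2022.
-- lake_ds is a hypothetical external data source pointing at the account.
SELECT
    r.filepath(1) AS [year],
    r.filepath(2) AS [month],
    r.filepath(3) AS [filename],
    COUNT(*)      AS row_count
FROM OPENROWSET(
        BULK 'csv/system1/*/*/*.csv',
        DATA_SOURCE = 'lake_ds',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
     ) AS r
WHERE r.filepath(1) = '2022'
  AND r.filepath(2) IN ('10', '11', '12')
GROUP BY r.filepath(1), r.filepath(2), r.filepath(3);
```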
Some new questions:
Q
You have an Azure subscription that contains an Azure Synapse Analytics account. The account is integrated with an Azure Repos repository named Repo1 and contains a pipeline named Pipeline1. Repo1 contains the branches shown in the following table.
From featuredev, you develop and test changes to Pipeline1. You need to publish the changes. What should you do first?
A. From featuredev, create a pull request.
B. From main, create a pull request.
C. Add a Publish_config.json file to the root folder of the collaboration branch.
D. Switch to live mode.
Q
You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1. You have the queries shown in the following table.
You are evaluating whether to enable result set caching for Pool1. Which query results will be cached if result set caching is enabled?
A. Query1 only
B. Query2 only
C. Query1 and Query2 only
D. Query1 and Query3 only
E. Query1, Query2, and Query3 only
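For context: in a dedicated SQL pool, result set caching is enabled with ALTER DATABASE (run from master) or per session, and results of queries that use nondeterministic functions are never cached. A minimal sketch:

```sql
-- Run in the master database to enable caching for the pool:
ALTER DATABASE [Pool1] SET RESULT_SET_CACHING ON;

-- Or toggle per session:
SET RESULT_SET_CACHING ON;

-- Note: results of queries that use nondeterministic functions
-- (for example, GETDATE()) are not cached.
```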
Q
You use Azure Stream Analytics to receive Twitter data from Azure Event Hubs and to output the data to an Azure Blob storage account.
You need to output the count of tweets during the last five minutes every five minutes. Each tweet must only be counted once.
Which windowing function should you use?
A. a five-minute Tumbling window
B. a five-minute Sliding window
C. a five-minute Hopping window that has a one-minute hop
D. a five-minute Session window
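For context: a tumbling window divides the stream into contiguous, non-overlapping intervals, so each event lands in exactly one window. A minimal Stream Analytics query sketch, with hypothetical input, field, and output names (TwitterStream, CreatedAt, BlobOutput):

```sql
-- Count tweets in contiguous five-minute windows; each event is counted once.
SELECT
    System.Timestamp() AS WindowEnd,
    COUNT(*)           AS TweetCount
INTO BlobOutput
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY TumblingWindow(minute, 5)
```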
Q
You have an Azure subscription that contains the resources shown in the following table.
Diagnostic logs from ADF1 are sent to LA1. ADF1 contains a pipeline named Pipeline that copies data from DB1 to DW1. You need to perform the following actions:
* Create an action group named AG1.
* Configure an alert in ADF1 to use AG1.
In which resource group should you create AG1?
A. RG1
B. RG2
C. RG3
D. RG4
…………..
Some new questions:
Q
You have an Azure Stream Analytics job that reads data from an Azure event hub.
You need to evaluate whether the job processes data as quickly as the data arrives or cannot keep up.
Which metric should you review?
A. InputEventLastPunctuationTime
B. Input Sources Received
C. Late Input Events
D. Backlogged Input Events
Q
You have an Azure Synapse Analytics dedicated SQL pool.
You plan to create a fact table named Table1 that will contain a clustered columnstore index.
You need to optimize data compression and query performance for Table1.
What is the minimum number of rows that Table1 should contain before you create partitions?
A. 100,000
B. 600,000
C. 1 million
D. 60 million
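For context: a dedicated SQL pool spreads every table across 60 distributions, and clustered columnstore rowgroups compress best at roughly 1 million rows each, so partition sizing should account for that multiplication. A DDL sketch with hypothetical column names, distribution key, and boundary values:

```sql
-- Hypothetical fact table: hash-distributed, clustered columnstore,
-- partitioned by date key. With 60 distributions and ~1M rows per
-- rowgroup, each partition needs on the order of 60 million rows
-- before partitioning helps compression and query performance.
CREATE TABLE dbo.Table1
(
    SaleKey      BIGINT        NOT NULL,
    OrderDateKey INT           NOT NULL,
    Amount       DECIMAL(18,2) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH(SaleKey),
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20220101, 20220401, 20220701))
);
```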
Q
You have an Azure subscription that contains an Azure Data Factory data pipeline named Pipeline1, a Log Analytics workspace named LA1, and a storage account named account1.
You need to retain pipeline-run data for 90 days. The solution must meet the following requirements:
* The pipeline-run data must be removed automatically after 90 days.
* Ongoing costs must be minimized.
Which two actions should you perform? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.
A. Configure Pipeline1 to send logs to LA1.
B. From the Diagnostic settings (classic) of account1, set the retention period to 90 days.
C. Configure Pipeline1 to send logs to account1.
D. From the Data Retention settings of LA1, set the data retention period to 90 days.
Q
HOTSPOT
You have an Azure Synapse Analytics dedicated SQL pool that hosts a database named DB1. You need to ensure that DB1 meets the following security requirements:
* When credit card numbers show in applications, only the last four digits must be visible.
* Tax numbers must be visible only to specific users.
What should you use for each requirement?
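For context: the two requirements map naturally onto dynamic data masking and column-level permissions in T-SQL. A minimal sketch with hypothetical table, column, and principal names:

```sql
-- Dynamic data masking: expose only the last four digits of the card number.
ALTER TABLE dbo.Customers
    ALTER COLUMN CreditCardNumber ADD MASKED
    WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)');

-- Column-level security: only a specific user may read the tax number.
DENY  SELECT ON dbo.Customers (TaxNumber) TO AppUsers;    -- hypothetical role
GRANT SELECT ON dbo.Customers (TaxNumber) TO TaxAuditor;  -- hypothetical user
```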
Q
You have a Microsoft Entra tenant.
The tenant contains an Azure Data Lake Storage Gen2 account named storage1 that has two containers named fs1 and fs2. You have a Microsoft Entra group named DepartmentA.
You need to meet the following requirements:
* DepartmentA must be able to read, write, and list all the files in fs1.
* DepartmentA must be prevented from accessing any files in fs2.
* The solution must use the principle of least privilege.
Which role should you assign to DepartmentA?
A. Contributor for fs1
B. Storage Blob Data Owner for fs1
C. Storage Blob Data Contributor for storage1
D. Storage Blob Data Contributor for fs1
………..