Some new questions:
Q
You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?
A. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances.
B. Move your data to Cloud Storage. Run your jobs on Dataproc.
C. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach.
D. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow.
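If you take the Dataproc route (option B), job submission stays very close to a plain spark-submit. Below is a minimal sketch using the google-cloud-dataproc Python client; the project, region, cluster name, main class, and the jar path in Cloud Storage are all illustrative placeholders, not values from the question.

```python
# Sketch: submit an existing Spark job, unchanged, to a Dataproc cluster.
# Assumes google-cloud-dataproc is installed and the jar was copied to
# Cloud Storage; every name below is a placeholder.
from google.cloud import dataproc_v1

project_id = "my-project"          # placeholder
region = "us-central1"             # placeholder
cluster_name = "spark-migration"   # placeholder

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster_name},
    "spark_job": {
        # Same main class and jar you ran on the on-premises Hadoop cluster.
        "main_class": "com.example.etl.DailyJob",               # placeholder
        "jar_file_uris": ["gs://my-bucket/jars/daily-job.jar"],  # placeholder
        "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
    },
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
result = operation.result()  # blocks until the job finishes
print(f"Job finished with state: {result.status.state.name}")
```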
Q
You work for a farming company. You have one BigQuery table named sensors, which is about 500 MB and contains the list of your 5000 sensors, with columns for id, name, and location. This table is updated every hour. Each sensor generates one metric every 30 seconds along with a timestamp, which you want to store in BigQuery. You want to run an analytical query on the data once a week for monitoring purposes. You also want to minimize costs. What data model should you use?
A. 1. Create a metrics column in the sensors table. 2. Set RECORD type and REPEATED mode for the metrics column. 3. Use an UPDATE statement every 30 seconds to add new metrics.
B. 1. Create a metrics column in the sensors table. 2. Set RECORD type and REPEATED mode for the metrics column. 3. Use an INSERT statement every 30 seconds to add new metrics.
C. 1. Create a metrics table partitioned by timestamp. 2. Create a sensorId column in the metrics table that points to the id column in the sensors table. 3. Use an INSERT statement every 30 seconds to append new metrics to the metrics table. 4. Join the two tables, if needed, when running the analytical query.
D. 1. Create a metrics table partitioned by timestamp. 2. Create a sensorId column in the metrics table that points to the id column in the sensors table. 3. Use an UPDATE statement every 30 seconds to append new metrics to the metrics table. 4. Join the two tables, if needed, when running the analytical query.
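To make the partitioned-table design in option C concrete, the DDL and the weekly join could look roughly like the sketch below, run through the BigQuery Python client. The dataset, table, and column names are assumptions for illustration only.

```python
# Sketch: partitioned metrics table plus the weekly analytical join.
# Dataset/table/column names are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

# One-time DDL: a metrics table partitioned by the metric timestamp.
client.query(
    """
    CREATE TABLE IF NOT EXISTS farming.metrics (
      sensorId INT64,
      metric   FLOAT64,
      ts       TIMESTAMP
    )
    PARTITION BY DATE(ts)
    """
).result()

# Weekly monitoring query: join metrics with the sensors table and
# restrict the scan to the last 7 days of partitions to keep costs down.
weekly = client.query(
    """
    SELECT s.name, s.location, AVG(m.metric) AS avg_metric
    FROM farming.metrics AS m
    JOIN farming.sensors AS s ON s.id = m.sensorId
    WHERE m.ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY s.name, s.location
    """
)
for row in weekly.result():
    print(row.name, row.location, row.avg_metric)
```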
Q
Your infrastructure team has set up an interconnect link between Google Cloud and the on-premises network. You are designing a high-throughput streaming pipeline to ingest streaming data from an Apache Kafka cluster hosted on-premises. You want to store the data in BigQuery with as little latency as possible. What should you do?
A. Use a proxy host in the VPC in Google Cloud connecting to Kafka. Write a Dataflow pipeline, read data from the proxy host, and write the data to BigQuery.
B. Set up a Kafka Connect bridge between Kafka and Pub/Sub. Use a Google-provided Dataflow template to read the data from Pub/Sub, and write the data to BigQuery.
C. Set up a Kafka Connect bridge between Kafka and Pub/Sub. Write a Dataflow pipeline, read the data from Pub/Sub, and write the data to BigQuery.
D. Use Dataflow, write a pipeline that reads the data from Kafka, and writes the data to BigQuery.
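If you read from Kafka directly with Dataflow (option D), a Beam Python pipeline could be sketched roughly as below. ReadFromKafka is a cross-language transform, and the broker address, topic, table, and schema shown here are placeholders rather than anything stated in the question.

```python
# Sketch: Beam pipeline reading from on-premises Kafka over the
# interconnect and streaming rows into BigQuery. Broker address,
# topic, table, and schema are illustrative assumptions.
import json

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # plus your Dataflow runner options

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "10.0.0.10:9092"},  # placeholder
            topics=["sensor-events"],                                  # placeholder
        )
        # Kafka records arrive as (key, value) byte pairs; decode the value.
        | "Decode" >> beam.Map(lambda kv: json.loads(kv[1].decode("utf-8")))
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:events.sensor_events",                         # placeholder
            schema="sensor_id:INTEGER,metric:FLOAT,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```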
Q
You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data scientist team needs to process the data by using Apache Spark and SQL. Security policies need to be enforced at the column level. You need a cost-effective solution that can scale into a data mesh. What should you do?
A. 1. Deploy a long-lived Dataproc cluster with Apache Hive and Ranger enabled. 2. Configure Ranger for column-level security. 3. Process with Dataproc Spark or Hive SQL.
B. 1. Define a BigLake table. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
C. 1. Load the data into BigQuery tables. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
D. 1. Apply an Identity and Access Management (IAM) policy at the file level in Cloud Storage. 2. Define a BigQuery external table for SQL processing. 3. Use Dataproc Spark to process the Cloud Storage files.
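As a rough illustration of the BigLake approach in option B: once the BigLake table exists and its columns carry policy tags, the Spark read path is a one-liner through the Spark-BigQuery connector. The sketch below uses PySpark; the project, dataset, table, and column names are assumptions made for the example.

```python
# Sketch: reading a policy-tag-protected BigLake table through the
# Spark-BigQuery connector from a Dataproc job.
# Project, dataset, table, and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("biglake-read").getOrCreate()

df = (
    spark.read.format("bigquery")
    .load("my-project.lake.customer_orders")   # placeholder BigLake table
)

# Column-level security is enforced by BigQuery, not by Spark: selecting a
# column the job's service account has no fine-grained read access to is
# rejected at read time, so the job only sees the columns it is entitled to.
df.select("order_id", "order_total").show()
```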
Q
You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?
A. Use the BigQuery Data Transfer Service to schedule your migration. After the data is populated in BigQuery, use the connection to the Cloud Data Loss Prevention (Cloud DLP) API to de-identify the necessary data.
B. Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
C. Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.
D. Set up Datastream to replicate your on-premises data to BigQuery.
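As a rough illustration of the Dataflow approach in option B, the masking step can be a DoFn that calls the Cloud DLP deidentify_content API before rows reach the BigQuery sink. The project ID, info type, and field names below are placeholders; per-row DLP calls are shown only for clarity, and batching them would reduce cost.

```python
# Sketch: a Beam DoFn that masks sensitive values with Cloud DLP before
# the rows are written to BigQuery. Project ID, info types, and field
# names are illustrative assumptions.
import apache_beam as beam
import google.cloud.dlp_v2 as dlp_v2


class DeidentifyFn(beam.DoFn):
    def __init__(self, project_id):
        self.project_id = project_id

    def setup(self):
        # One DLP client per worker, created when the worker starts.
        self.dlp = dlp_v2.DlpServiceClient()

    def process(self, row):
        item = {"value": row["email"]}  # placeholder sensitive field
        response = self.dlp.deidentify_content(
            request={
                "parent": f"projects/{self.project_id}",
                "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
                "deidentify_config": {
                    "info_type_transformations": {
                        "transformations": [
                            {"primitive_transformation": {"replace_with_info_type_config": {}}}
                        ]
                    }
                },
                "item": item,
            }
        )
        yield {**row, "email": response.item.value}


# In the pipeline: ... | beam.ParDo(DeidentifyFn("my-project"))
#                      | beam.io.WriteToBigQuery("my-project:dataset.table", ...)
```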
Q
You have a Standard Tier Memorystore for Redis instance deployed in a production environment. You need to simulate a Redis instance failover as accurately as possible for disaster recovery testing, while ensuring that the failover has no impact on production data. What should you do?
A. Create a Standard Tier Memorystore for Redis instance in a development environment. Initiate a manual failover by using the force-data-loss data protection mode.
B. Initiate a manual failover by using the limited-data-loss data protection mode on the Memorystore for Redis instance in the production environment.
C. Add a replica to the Redis instance in the production environment. Initiate a manual failover by using the force-data-loss data protection mode.
D. Create a Standard Tier Memorystore for Redis instance in the development environment. Initiate a manual failover by using the limited-data-loss data protection mode.
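Whichever environment you run it against, the manual failover itself can be triggered programmatically. Below is a minimal sketch with the google-cloud-redis client using the force-data-loss protection mode; the project, region, and instance ID are placeholders.

```python
# Sketch: trigger a manual failover on a Standard Tier Memorystore for
# Redis instance using the force-data-loss data protection mode.
# Project, region, and instance names are illustrative assumptions.
from google.cloud import redis_v1

client = redis_v1.CloudRedisClient()

name = "projects/my-project/locations/us-central1/instances/redis-dr-test"  # placeholder

operation = client.failover_instance(
    request={
        "name": name,
        "data_protection_mode": redis_v1.FailoverInstanceRequest.DataProtectionMode.FORCE_DATA_LOSS,
    }
)
operation.result()  # wait for the failover to complete
print("Failover finished")
```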
………………