Some new questions:
Q
You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run your jobs instead of maintaining a long-lived Hadoop cluster yourself. You have a tight timeline and want to keep code changes to a minimum. What should you do?
A. Copy your data to Compute Engine disks. Manage and run your jobs directly on those instances.
B. Move your data to Cloud Storage. Run your jobs on Dataproc.
C. Move your data to BigQuery. Convert your Spark scripts to a SQL-based processing approach.
D. Rewrite your jobs in Apache Beam. Run your jobs in Dataflow.
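If you take the Dataproc route (option B), job submission stays very close to a plain spark-submit. Below is a minimal sketch using the google-cloud-dataproc Python client; the project, region, cluster name, main class, and the jar path in Cloud Storage are all illustrative placeholders, not values from the question.

```python
# Sketch: submit an existing Spark job, unchanged, to a Dataproc cluster.
# Assumes google-cloud-dataproc is installed and the jar was copied to
# Cloud Storage; every name below is a placeholder.
from google.cloud import dataproc_v1

project_id = "my-project"          # placeholder
region = "us-central1"             # placeholder
cluster_name = "spark-migration"   # placeholder

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": cluster_name},
    "spark_job": {
        # Same main class and jar you ran on the on-premises Hadoop cluster.
        "main_class": "com.example.etl.DailyJob",               # placeholder
        "jar_file_uris": ["gs://my-bucket/jars/daily-job.jar"],  # placeholder
        "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
    },
}

operation = job_client.submit_job_as_operation(
    request={"project_id": project_id, "region": region, "job": job}
)
result = operation.result()  # blocks until the job finishes
print(f"Job finished with state: {result.status.state.name}")
```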
Q
You work for a farming company. You have one BigQuery table named sensors, which is about 500 MB and contains the list of your 5000 sensors, with columns for id, name, and location. This table is updated every hour. Each sensor generates one metric every 30 seconds along with a timestamp, which you want to store in BigQuery. You want to run an analytical query on the data once a week for monitoring purposes. You also want to minimize costs. What data model should you use?
A. 1. Create a metrics column in the sensors table. 2. Set RECORD type and REPEATED mode for the metrics column. 3. Use an UPDATE statement every 30 seconds to add new metrics.
B. 1. Create a metrics column in the sensors table. 2. Set RECORD type and REPEATED mode for the metrics column. 3. Use an INSERT statement every 30 seconds to add new metrics.
C. 1. Create a metrics table partitioned by timestamp. 2. Create a sensorId column in the metrics table that points to the id column in the sensors table. 3. Use an INSERT statement every 30 seconds to append new metrics to the metrics table. 4. Join the two tables, if needed, when running the analytical query.
D. 1. Create a metrics table partitioned by timestamp. 2. Create a sensorId column in the metrics table that points to the id column in the sensors table. 3. Use an UPDATE statement every 30 seconds to append new metrics to the metrics table. 4. Join the two tables, if needed, when running the analytical query.
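To make the partitioned-table design in option C concrete, the DDL and the weekly join could look roughly like the sketch below, run through the BigQuery Python client. The dataset, table, and column names are assumptions for illustration only.

```python
# Sketch: partitioned metrics table plus the weekly analytical join.
# Dataset/table/column names are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client()

# One-time DDL: a metrics table partitioned by the metric timestamp.
client.query(
    """
    CREATE TABLE IF NOT EXISTS farming.metrics (
      sensorId INT64,
      metric   FLOAT64,
      ts       TIMESTAMP
    )
    PARTITION BY DATE(ts)
    """
).result()

# Weekly monitoring query: join metrics with the sensors table and
# restrict the scan to the last 7 days of partitions to keep costs down.
weekly = client.query(
    """
    SELECT s.name, s.location, AVG(m.metric) AS avg_metric
    FROM farming.metrics AS m
    JOIN farming.sensors AS s ON s.id = m.sensorId
    WHERE m.ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY s.name, s.location
    """
)
for row in weekly.result():
    print(row.name, row.location, row.avg_metric)
```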
Q
Your infrastructure team has set up an interconnect link between Google Cloud and the on-premises network. You are designing a high-throughput streaming pipeline to ingest streaming data from an Apache Kafka cluster hosted on-premises. You want to store the data in BigQuery with as little latency as possible. What should you do?
A. Use a proxy host in the VPC in Google Cloud connecting to Kafka. Write a Dataflow pipeline, read data from the proxy host, and write the data to BigQuery.
B. Set up a Kafka Connect bridge between Kafka and Pub/Sub. Use a Google-provided Dataflow template to read the data from Pub/Sub, and write the data to BigQuery.
C. Set up a Kafka Connect bridge between Kafka and Pub/Sub. Write a Dataflow pipeline, read the data from Pub/Sub, and write the data to BigQuery.
D. Use Dataflow, write a pipeline that reads the data from Kafka, and writes the data to BigQuery.
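If you read from Kafka directly with Dataflow (option D), a Beam Python pipeline could be sketched roughly as below. ReadFromKafka is a cross-language transform, and the broker address, topic, table, and schema shown here are placeholders rather than anything stated in the question.

```python
# Sketch: Beam pipeline reading from on-premises Kafka over the
# interconnect and streaming rows into BigQuery. Broker address,
# topic, table, and schema are illustrative assumptions.
import json

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # plus your Dataflow runner options

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "10.0.0.10:9092"},  # placeholder
            topics=["sensor-events"],                                  # placeholder
        )
        # Kafka records arrive as (key, value) byte pairs; decode the value.
        | "Decode" >> beam.Map(lambda kv: json.loads(kv[1].decode("utf-8")))
        | "WriteBQ" >> beam.io.WriteToBigQuery(
            "my-project:events.sensor_events",                         # placeholder
            schema="sensor_id:INTEGER,metric:FLOAT,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )
```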
Q
You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data scientist team needs to process the data by using Apache Spark and SQL. Security policies need to be enforced at the column level. You need a cost-effective solution that can scale into a data mesh. What should you do?
A. 1. Deploy a long-lived Dataproc cluster with Apache Hive and Ranger enabled. 2. Configure Ranger for column-level security. 3. Process with Dataproc Spark or Hive SQL.
B. 1. Define a BigLake table. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
C. 1. Load the data into BigQuery tables. 2. Create a taxonomy of policy tags in Data Catalog. 3. Add policy tags to columns. 4. Process with the Spark-BigQuery connector or BigQuery SQL.
D. 1. Apply an Identity and Access Management (IAM) policy at the file level in Cloud Storage. 2. Define a BigQuery external table for SQL processing. 3. Use Dataproc Spark to process the Cloud Storage files.
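As a rough illustration of the BigLake approach in option B: once the BigLake table exists and its columns carry policy tags, the Spark read path is a one-liner through the Spark-BigQuery connector. The sketch below uses PySpark; the project, dataset, table, and column names are assumptions made for the example.

```python
# Sketch: reading a policy-tag-protected BigLake table through the
# Spark-BigQuery connector from a Dataproc job.
# Project, dataset, table, and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("biglake-read").getOrCreate()

df = (
    spark.read.format("bigquery")
    .load("my-project.lake.customer_orders")   # placeholder BigLake table
)

# Column-level security is enforced by BigQuery, not by Spark: selecting a
# column the job's service account has no fine-grained read access to is
# rejected at read time, so the job only sees the columns it is entitled to.
df.select("order_id", "order_total").show()
```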
Q
You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?
A. Use the BigQuery Data Transfer Service to schedule your migration. After the data is populated in BigQuery, use the connection to the Cloud Data Loss Prevention (Cloud DLP) API to de-identify the necessary data.
B. Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.
C. Use Cloud Data Fusion to design your pipeline, use the Cloud DLP plug-in to de-identify data within your pipeline, and then move the data into BigQuery.
D. Set up Datastream to replicate your on-premises data to BigQuery.
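As a rough illustration of the Dataflow approach in option B, the masking step can be a DoFn that calls the Cloud DLP deidentify_content API before rows reach the BigQuery sink. The project ID, info type, and field names below are placeholders; per-row DLP calls are shown only for clarity, and batching them would reduce cost.

```python
# Sketch: a Beam DoFn that masks sensitive values with Cloud DLP before
# the rows are written to BigQuery. Project ID, info types, and field
# names are illustrative assumptions.
import apache_beam as beam
import google.cloud.dlp_v2 as dlp_v2


class DeidentifyFn(beam.DoFn):
    def __init__(self, project_id):
        self.project_id = project_id

    def setup(self):
        # One DLP client per worker, created when the worker starts.
        self.dlp = dlp_v2.DlpServiceClient()

    def process(self, row):
        item = {"value": row["email"]}  # placeholder sensitive field
        response = self.dlp.deidentify_content(
            request={
                "parent": f"projects/{self.project_id}",
                "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
                "deidentify_config": {
                    "info_type_transformations": {
                        "transformations": [
                            {"primitive_transformation": {"replace_with_info_type_config": {}}}
                        ]
                    }
                },
                "item": item,
            }
        )
        yield {**row, "email": response.item.value}


# In the pipeline: ... | beam.ParDo(DeidentifyFn("my-project"))
#                      | beam.io.WriteToBigQuery("my-project:dataset.table", ...)
```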
Q
You have a Standard Tier Memorystore for Redis instance deployed in a production environment. You need to simulate a Redis instance failover as accurately as possible for disaster recovery testing, while ensuring that the failover has no impact on production data. What should you do?
A. Create a Standard Tier Memorystore for Redis instance in a development environment. Initiate a manual failover by using the force-data-loss data protection mode.
B. Initiate a manual failover by using the limited-data-loss data protection mode on the Memorystore for Redis instance in the production environment.
C. Add a replica to the Redis instance in the production environment. Initiate a manual failover by using the force-data-loss data protection mode.
D. Create a Standard Tier Memorystore for Redis instance in the development environment. Initiate a manual failover by using the limited-data-loss data protection mode.
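Whichever environment you run it against, the manual failover itself can be triggered programmatically. Below is a minimal sketch with the google-cloud-redis client using the force-data-loss protection mode; the project, region, and instance ID are placeholders.

```python
# Sketch: trigger a manual failover on a Standard Tier Memorystore for
# Redis instance using the force-data-loss data protection mode.
# Project, region, and instance names are illustrative assumptions.
from google.cloud import redis_v1

client = redis_v1.CloudRedisClient()

name = "projects/my-project/locations/us-central1/instances/redis-dr-test"  # placeholder

operation = client.failover_instance(
    request={
        "name": name,
        "data_protection_mode": redis_v1.FailoverInstanceRequest.DataProtectionMode.FORCE_DATA_LOSS,
    }
)
operation.result()  # wait for the failover to complete
print("Failover finished")
```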
………………