
Professional-Data-Engineer: Google Professional Data Engineer Exam Practice Questions | Test Your Knowledge for Free

Exams4sure Dumps

Professional-Data-Engineer Practice Questions

Google Professional Data Engineer Exam

Last Update: 1 day ago
Total Questions: 400

Dive into our fully updated and stable Professional-Data-Engineer practice test platform, featuring all the latest Google Cloud Certified exam questions added this week. Our preparation tool is more than just a Google study aid; it's a strategic advantage.

Our free Google Cloud Certified practice questions are crafted to reflect the domains and difficulty of the actual exam. The detailed rationales explain the 'why' behind each answer, reinforcing key Professional-Data-Engineer concepts. Use this test to pinpoint the areas where you need to focus your study.

Professional-Data-Engineer PDF (Printable)
$43.75 (was $124.99)

Professional-Data-Engineer Testing Engine
$50.75 (was $144.99)

Professional-Data-Engineer PDF + Testing Engine
$63.70 (was $181.99)
Question # 31

You have a variety of files in Cloud Storage that your data science team wants to use in their models. Currently, users do not have a method to explore, cleanse, and validate the data in Cloud Storage. You are looking for a low-code solution that can be used by your data science team to quickly cleanse and explore data within Cloud Storage. What should you do?

Options:

A.  

Load the data into BigQuery and use SQL to transform the data as necessary. Provide the data science team access to staging tables to explore the raw data.

B.  

Provide the data science team access to Dataflow to create a pipeline to prepare and validate the raw data and load data into BigQuery for data exploration.

C.  

Provide the data science team access to Dataprep to prepare, validate, and explore the data within Cloud Storage.

D.  

Create an external table in BigQuery and use SQL to transform the data as necessary. Provide the data science team access to the external tables to explore the raw data.

Question # 32

You are administering a BigQuery dataset that uses a customer-managed encryption key (CMEK). You need to share the dataset with a partner organization that does not have access to your CMEK. What should you do?

Options:

A.  

Create an authorized view that contains the CMEK to decrypt the data when accessed.

B.  

Provide the partner organization a copy of your CMEKs to decrypt the data.

C.  

Copy the tables you need to share to a dataset without CMEKs. Create an Analytics Hub listing for this dataset.

D.  

Export the tables as Parquet files to a Cloud Storage bucket and grant the storageinsights.viewer role on the bucket to the partner organization.

Question # 33

You need to create a SQL pipeline. The pipeline runs an aggregate SQL transformation on a BigQuery table every two hours and appends the result to another existing BigQuery table. You need to configure the pipeline to retry if errors occur. You want the pipeline to send an email notification after three consecutive failures. What should you do?

Options:

A.  

Create a BigQuery scheduled query to run the SQL transformation with schedule options that repeat every two hours, and enable email notifications.

B.  

Use the BigQueryUpsertTableOperator in Cloud Composer, set the retry parameter to three, and set the email_on_failure parameter to true.

C.  

Use the BigQueryInsertJobOperator in Cloud Composer, set the retry parameter to three, and set the email_on_failure parameter to true.

D.  

Create a BigQuery scheduled query to run the SQL transformation with schedule options that repeat every two hours, and enable notification to a Pub/Sub topic. Use Pub/Sub and Cloud Functions to send an email after three failed executions.

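The "email after three consecutive failures" behavior described in option D is what the Cloud Function subscribed to the scheduled query's Pub/Sub notifications would have to implement. A minimal sketch of that failure-counting logic, with illustrative names (FAILURE_THRESHOLD, should_notify) that are not a real Google Cloud API:

```python
# Hypothetical sketch: a Cloud Function receiving scheduled-query run
# notifications tracks the run states and emails only once the most
# recent runs end in three consecutive failures.

FAILURE_THRESHOLD = 3

def should_notify(run_states):
    """Return True when the run history ends in three consecutive failures."""
    streak = 0
    for state in run_states:              # oldest -> newest
        streak = streak + 1 if state == "FAILED" else 0
    return streak >= FAILURE_THRESHOLD

print(should_notify(["SUCCEEDED", "FAILED", "FAILED", "FAILED"]))  # True
print(should_notify(["FAILED", "FAILED", "SUCCEEDED"]))            # False
```

In a real deployment the run history would be read from the Pub/Sub messages or a small state store, not passed in as a list.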
Question # 34

You are building a new application that you need to collect data from in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:

Decoupling producer from consumer

Space and cost-efficient storage of the raw ingested data, which is to be stored indefinitely

Near real-time SQL query

Maintain at least 2 years of historical data, which will be queried with SQL

Which pipeline should you use to meet these requirements?

Options:

A.  

Create an application that provides an API. Write a tool to poll the API and write data to Cloud Storage as gzipped JSON files.

B.  

Create an application that writes to a Cloud SQL database to store the data. Set up periodic exports of the database to write to Cloud Storage and load into BigQuery.

C.  

Create an application that publishes events to Cloud Pub/Sub, and create Spark jobs on Cloud Dataproc to convert the JSON data to Avro format, stored on HDFS on Persistent Disk.

D.  

Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.

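The core of the pipeline in option D is a per-event transform: parse each JSON payload arriving from Pub/Sub and emit a flat record suitable for an Avro file in Cloud Storage or a BigQuery row. A sketch of that transform in plain Python (the field names are hypothetical; the real schema depends on the application):

```python
import json

# Illustrative per-event transform a Dataflow pipeline would apply:
# decode the Pub/Sub message body, parse the JSON, and emit a flat record.

def to_record(payload: bytes) -> dict:
    event = json.loads(payload.decode("utf-8"))
    return {
        "event_id": event["id"],
        "event_type": event.get("type", "unknown"),   # tolerate missing type
        "ts": event["timestamp"],
    }

record = to_record(b'{"id": "e1", "type": "click", "timestamp": "2024-01-01T00:00:00Z"}')
print(record["event_type"])  # click
```

In Beam this function would sit inside a ParDo between the Pub/Sub read and the Cloud Storage/BigQuery writes.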
Question # 35

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?

Options:

A.  

Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.

B.  

Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.

C.  

Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.

D.  

Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.

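Option C amounts to defining a permanent external table in BigQuery over the CSV files, so users query the data in place without loading or duplicating the 20 TB. A sketch of the DDL, held in a Python string for illustration (bucket, dataset, and table names are placeholders):

```python
# Hypothetical DDL for a permanent BigQuery external table over CSV files
# in Cloud Storage; the files stay in the bucket and remain readable by
# other engines.

DDL = """
CREATE EXTERNAL TABLE mydataset.sales_csv
OPTIONS (
  format = 'CSV',
  uris = ['gs://my-bucket/sales/*.csv'],
  skip_leading_rows = 1
)
"""
```

A permanent (rather than temporary) external table is defined once and shared by all users, which is what keeps per-query setup cost down for multiple users.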
Question # 36

A TensorFlow machine learning model on Compute Engine virtual machines (n2-standard-32) takes two days to complete training. The model has custom TensorFlow operations that must run partially on a CPU. You want to reduce the training time in a cost-effective manner. What should you do?

Options:

A.  

Change the VM type to n2-highmem-32

B.  

Change the VM type to e2-standard-32

C.  

Train the model using a VM with a GPU hardware accelerator

D.  

Train the model using a VM with a TPU hardware accelerator

Question # 37

You are designing a stateful data processing pipeline that reads data from a Cloud Storage bucket and writes transformed data to a BigQuery table. The pipeline must be highly available and resilient to zonal failures within the us-central1 region. You need to configure the Dataflow pipeline to ensure minimal disruption during a zonal outage. What should you do?

Options:

A.  

Deploy the Dataflow job to a single zone within us-central1 and configure it to use a regional persistent disk to store its state.

B.  

Launch the Dataflow job with the --region us-central1 parameter.

C.  

Deploy the Dataflow job to a single zone within us-central1 and use a multi-regional Cloud Storage bucket to store its state.

D.  

Launch the Dataflow job with the --zone us-central1-a parameter.

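The distinction between options B and D is in the launch flags: specifying a region (and no zone) lets the Dataflow service place, and if necessary move, workers across zones within us-central1, which is what survives a zonal outage. A sketch of a regional launch command, as an argument list (job and bucket names are placeholders; --runner, --region, and --temp_location are standard Dataflow pipeline options):

```python
# Hypothetical launch command for a Beam pipeline on Dataflow with regional
# placement: note there is no --zone flag pinning workers to one zone.

cmd = [
    "python", "my_pipeline.py",
    "--runner", "DataflowRunner",
    "--project", "my-project",
    "--region", "us-central1",
    "--temp_location", "gs://my-bucket/tmp",
]
```

Passing --zone us-central1-a instead (option D) would pin all workers to a single zone and lose exactly the resilience the question asks for.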
Question # 38

You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select only a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

Options:

A.  

Create a separate table for each ID.

B.  

Use the LIMIT keyword to reduce the number of rows returned.

C.  

Recreate the table with a partitioning column and clustering column.

D.  

Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.

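Option C means recreating the table partitioned on the timestamp column and clustered on the ID column, so the existing WHERE filters prune partitions and clustered blocks instead of triggering a full scan. A sketch of the DDL, held in a Python string for illustration (table and column names are placeholders):

```python
# Hypothetical DDL: recreate the table partitioned by the timestamp column
# and clustered by the ID column; existing queries filtering on these
# columns need no SQL changes to benefit from pruning.

DDL = """
CREATE TABLE mydataset.events_new
PARTITION BY DATE(event_ts)
CLUSTER BY user_id
AS SELECT * FROM mydataset.events
"""
```

LIMIT (option B) does not reduce bytes scanned in BigQuery, and --maximum_bytes_billed (option D) caps cost by failing the query rather than reducing the scan.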
Question # 39

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

Options:

A.  

Use a standard Dataflow pipeline to store the raw data in BigQuery, and then transform the format later when the data is used.

B.  

Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source.

C.  

Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format

D.  

Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format

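The heart of the custom connector in option D is decoding the proprietary records into rows the pipeline can serialize (e.g., as Avro) for BigQuery. A sketch of that decoding step in plain Python; the fixed 12-byte record layout (int32 flight_id, float altitude, float speed) is invented purely for illustration:

```python
import struct

# Hypothetical decoder for a proprietary fixed-width binary format:
# each 12-byte record holds an int32 flight id and two float32 readings.

RECORD = struct.Struct("<iff")

def parse_records(blob: bytes):
    """Yield one dict per fixed-width record in the blob."""
    for offset in range(0, len(blob), RECORD.size):
        flight_id, altitude, speed = RECORD.unpack_from(blob, offset)
        yield {"flight_id": flight_id, "altitude": altitude, "speed": speed}

blob = RECORD.pack(7, 10000.0, 480.5) + RECORD.pack(8, 9500.0, 455.0)
rows = list(parse_records(blob))
print(len(rows))  # 2
```

In a Beam pipeline this decoder would live inside the custom source or a ParDo, with the resulting dicts streamed into BigQuery.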
Question # 40

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

Options:

A.  

Subsample your test dataset.

B.  

Subsample your training dataset.

C.  

Increase the number of input features to your model.

D.  

Increase the number of layers in your neural network.
