
Professional-Data-Engineer Google Professional Data Engineer Exam Practice Questions | Test Your Knowledge for Free


Professional-Data-Engineer Practice Questions

Google Professional Data Engineer Exam

Last Update: 1 day ago
Total Questions: 400

Dive into our fully updated and stable Professional-Data-Engineer practice test platform, featuring all the latest Google Cloud Certified exam questions added this week. Our preparation tool is more than just a Google study aid; it's a strategic advantage.

Our free Google Cloud Certified practice questions are crafted to reflect the domains and difficulty of the actual exam. The detailed rationales explain the 'why' behind each answer, reinforcing key Professional-Data-Engineer concepts. Use this test to pinpoint the areas where you need to focus your study.

Question # 11

Your company is using wildcard tables to query data across multiple tables with similar names. The SQL statement is currently failing with the following error:

# Syntax error : Expected end of statement but got "-" at [4:11]

SELECT age
FROM
  bigquery-public-data.noaa_gsod.gsod
WHERE
  age != 99
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  age DESC

Which table name will make the SQL statement work correctly?

Options:

A.  

'bigquery-public-data.noaa_gsod.gsod'

B.  

bigquery-public-data.noaa_gsod.gsod*

C.  

'bigquery-public-data.noaa_gsod.gsod'*

D.  

`bigquery-public-data.noaa_gsod.gsod*`
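For context on the error: in BigQuery standard SQL, identifiers containing hyphens (such as the bigquery-public-data project) must be enclosed in backticks, with any wildcard character placed inside the backticks. A sketch of the query with the table name backtick-quoted:

```sql
SELECT age
FROM
  `bigquery-public-data.noaa_gsod.gsod*`
WHERE
  age != 99
  AND _TABLE_SUFFIX = '1929'
ORDER BY
  age DESC
```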

Question # 12

Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use Hadoop jobs they have already created and minimize the management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster. What should you do?

Options:

A.  

Create a Google Cloud Dataflow job to process the data.

B.  

Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.

C.  

Create a Hadoop cluster on Google Compute Engine that uses persistent disks.

D.  

Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.

E.  

Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.

Question # 13

You are building a model to make clothing recommendations. You know a user’s fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

Options:

A.  

Continuously retrain the model on just the new data.

B.  

Continuously retrain the model on a combination of existing data and the new data.

C.  

Train on the existing data while using the new data as your test set.

D.  

Train on the new data while using the existing data as your test set.

Question # 14

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

Options:

A.  

Disable caching by editing the report settings.

B.  

Disable caching in BigQuery by editing table details.

C.  

Refresh your browser tab showing the visualizations.

D.  

Clear your browser history for the past hour, then reload the tab showing the visualizations.

Question # 15

You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.

Which Google database service should you use?

Options:

A.  

Cloud SQL

B.  

BigQuery

C.  

Cloud Bigtable

D.  

Cloud Datastore

Question # 16

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.

Which approach should you take?

Options:

A.  

Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received.

B.  

Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Cloud Pub/Sub.

C.  

Use the NOW() function in BigQuery to record the event's time.

D.  

Use the automatically generated timestamp from Cloud Pub/Sub to order the data.
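Whichever option you choose, the mechanics of stamping event time on the publisher side are simple. This pure-Python sketch (field names such as package_id and event_timestamp are illustrative, not from the exam) shows a device attaching its own timestamp and package ID to the outbound message before it is sent:

```python
import json
import time

def build_tracking_message(package_id, payload):
    # Hypothetical publisher-side helper: the device stamps event time
    # on the OUTBOUND message, so downstream analysis reflects when the
    # event happened rather than when Pub/Sub or the subscriber saw it.
    return json.dumps({
        "package_id": package_id,
        "event_timestamp": time.time(),  # stamped before publishing
        **payload,
    })

msg = json.loads(build_tracking_message("PKG-001", {"lat": 40.7, "lng": -74.0}))
print(sorted(msg))  # ['event_timestamp', 'lat', 'lng', 'package_id']
```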

Question # 17

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

Options:

A.  

Supervised learning to determine which transactions are most likely to be fraudulent.

B.  

Unsupervised learning to determine which transactions are most likely to be fraudulent.

C.  

Clustering to divide the transactions into N categories based on feature similarity.

D.  

Supervised learning to predict the location of a transaction.

E.  

Reinforcement learning to predict the location of a transaction.

F.  

Unsupervised learning to predict the location of a transaction.
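To make the clustering option concrete: clustering groups rows by feature similarity without any labels (it is unsupervised). A minimal 1-D k-means sketch over illustrative transaction amounts, in pure Python with no ML library:

```python
def kmeans_1d(values, k, iters=20):
    # Seed centers by sampling the sorted values, then alternate between
    # assigning each value to its nearest center and recomputing centers.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Illustrative transaction amounts: small purchases, mid-size, and large.
amounts = [5, 7, 6, 250, 240, 260, 9000, 8800]
centers, groups = kmeans_1d(amounts, k=3)
print(sorted(len(g) for g in groups))  # [2, 3, 3]
```

The three clusters recovered here correspond to the natural bands in the amounts, which is exactly the "divide the transactions into N categories based on feature similarity" idea.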

Question # 18

You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors from insufficient compute resources. How should you adjust the database design?

Options:

A.  

Add capacity (memory and disk space) to the database server by the order of 200.

B.  

Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.

C.  

Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.

D.  

Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.

Question # 19

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will be sent only once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

Options:

A.  

Include ORDER BY DESC on the timestamp column and LIMIT to 1.

B.  

Use GROUP BY on the unique ID column and timestamp column and SUM on the values.

C.  

Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.

D.  

Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.
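The ROW_NUMBER approach mentioned in the options can be tried locally. The sketch below substitutes SQLite (3.25+, which supports window functions) for BigQuery, with illustrative table and column names, to show how numbering rows within each unique ID and keeping only row 1 filters out duplicate deliveries:

```python
import sqlite3

# In-memory stand-in for the streamed table; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (unique_id TEXT, event_ts INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", 1, 10),
     ("a", 2, 11),   # duplicate delivery of row 'a', later timestamp
     ("b", 1, 20)],
)

# One row per unique_id: number rows within each ID by timestamp
# (latest first) and keep only row number 1.
rows = conn.execute("""
    SELECT unique_id, value FROM (
        SELECT unique_id, value,
               ROW_NUMBER() OVER (
                   PARTITION BY unique_id ORDER BY event_ts DESC
               ) AS rn
        FROM events
    )
    WHERE rn = 1
    ORDER BY unique_id
""").fetchall()
print(rows)  # [('a', 11), ('b', 20)]
```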

Question # 20

You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. You initially designed the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that queries run immediately after streaming inserts do not exhibit strong consistency, and reports from those queries might miss in-flight data. How can you adjust your application design?

Options:

A.  

Re-write the application to load accumulated data every 2 minutes.

B.  

Convert the streaming insert code to batch load for individual messages.

C.  

Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.

D.  

Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.
