

Exams4sure Dumps

Data-Engineer-Associate Practice Questions

AWS Certified Data Engineer - Associate (DEA-C01)

Last Update 4 days ago
Total Questions: 302

Dive into our fully updated and stable Data-Engineer-Associate practice test platform, featuring the latest AWS Certified Data Engineer exam questions added this week. Our preparation tool is more than just an Amazon Web Services study aid; it's a strategic advantage.

Our free AWS Certified Data Engineer practice questions are crafted to reflect the domains and difficulty of the actual exam. The detailed rationales explain the "why" behind each answer, reinforcing key concepts for the Data-Engineer-Associate exam. Use this test to pinpoint the areas you need to focus your study on.

Question # 41

A data engineer is building a new data pipeline that stores metadata in an Amazon DynamoDB table. The data engineer must ensure that all items that are older than a specified age are removed from the DynamoDB table daily.

Which solution will meet this requirement with the LEAST configuration effort?

Options:

A.  

Enable DynamoDB TTL on the DynamoDB table. Adjust the application source code to set the TTL attribute appropriately.

B.  

Create an Amazon EventBridge rule that uses a daily cron expression to trigger an AWS Lambda function to delete items that are older than the specified age.

C.  

Add a lifecycle configuration to the DynamoDB table that deletes items that are older than the specified age.

D.  

Create a DynamoDB stream that has an AWS Lambda function that reacts to data modifications. Configure the Lambda function to delete items that are older than the specified age.
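DynamoDB TTL (option A) is a table-level setting that names one Number attribute; each item stores its expiry time in that attribute as Unix epoch seconds, and DynamoDB removes expired items in the background. A minimal sketch, assuming hypothetical attribute names `pk` and `expires_at`, of how application code might compute that value from a creation time and a maximum age:

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch_seconds(created_at: datetime, max_age: timedelta) -> int:
    """Return the Unix epoch time, in whole seconds, at which an item expires."""
    if created_at.tzinfo is None:
        # timestamp() treats naive datetimes as local time; require aware input instead.
        raise ValueError("created_at must be timezone-aware")
    return int((created_at + max_age).timestamp())


def build_item(pk: str, created_at: datetime, max_age: timedelta) -> dict:
    """Build an item in DynamoDB's low-level attribute-value format.

    'pk' and 'expires_at' are hypothetical attribute names; TTL must be enabled
    on the table for whichever Number attribute holds the expiry time.
    """
    return {
        "pk": {"S": pk},
        "expires_at": {"N": str(ttl_epoch_seconds(created_at, max_age))},
    }
```

The returned dict has the shape the low-level boto3 client expects for the `Item` parameter of `put_item`.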

Question # 42

A company is planning to upgrade its Amazon Elastic Block Store (Amazon EBS) General Purpose SSD storage from gp2 to gp3. The company wants to prevent any interruptions in its Amazon EC2 instances that will cause data loss during the migration to the upgraded storage.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Create snapshots of the gp2 volumes. Create new gp3 volumes from the snapshots. Attach the new gp3 volumes to the EC2 instances.

B.  

Create new gp3 volumes. Gradually transfer the data to the new gp3 volumes. When the transfer is complete, mount the new gp3 volumes to the EC2 instances to replace the gp2 volumes.

C.  

Change the volume type of the existing gp2 volumes to gp3. Enter new values for volume size, IOPS, and throughput.

D.  

Use AWS DataSync to create new gp3 volumes. Transfer the data from the original gp2 volumes to the new gp3 volumes.
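Option C relies on Elastic Volumes, where the EC2 `ModifyVolume` API changes the type of a volume in place while it stays attached. A minimal sketch of the parameters such a call might take, assuming you want the gp3 volume to keep at least the gp2 baseline IOPS (3 IOPS per GiB, minimum 100, maximum 16,000) and to stay at the gp3 baseline throughput of 125 MiB/s; the helper names and volume ID are hypothetical:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline performance: 3 IOPS per GiB, at least 100 and at most 16,000."""
    return min(max(3 * size_gib, 100), 16_000)


def gp3_modify_volume_params(volume_id: str, size_gib: int) -> dict:
    """Build keyword arguments for an EC2 ModifyVolume call that converts gp2 to gp3.

    gp3 includes a baseline of 3,000 IOPS and 125 MiB/s. This sketch provisions
    the larger of that baseline and the old gp2 baseline IOPS. Keeping throughput
    at 125 MiB/s is an assumption to revisit for throughput-heavy volumes.
    """
    return {
        "VolumeId": volume_id,
        "VolumeType": "gp3",
        "Iops": max(3_000, gp2_baseline_iops(size_gib)),
        "Throughput": 125,  # MiB/s
    }
```

With boto3 this would be used as `boto3.client("ec2").modify_volume(**gp3_modify_volume_params("vol-0123456789abcdef0", 500))`.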

Question # 43

A data engineer needs to optimize the performance of a data pipeline that handles retail orders. Data about the orders is ingested daily into an Amazon S3 bucket.

The data engineer runs queries once each week to extract metrics from the orders data based on the order date for multiple date ranges. The data engineer needs an optimization solution that ensures the query performance will not degrade when the volume of data increases.

Options:

A.  

Partition the data based on order date. Use Amazon Athena to query the data.

B.  

Partition the data based on order date. Use Amazon Redshift to query the data.

C.  

Partition the data based on load date. Use Amazon EMR to query the data.

D.  

Partition the data based on load date. Use Amazon Aurora to query the data.
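Partitioning by order date (options A and B) lets a query engine skip partitions outside a query's date range, so the data scanned per query tracks the range queried rather than the total table size. A minimal sketch of Hive-style `order_date=YYYY-MM-DD` S3 prefixes and of the pruning step, assuming a hypothetical bucket `example-bucket`:

```python
from datetime import date, timedelta


def partition_prefix(order_date: date, table_root: str = "s3://example-bucket/orders") -> str:
    """Hive-style S3 prefix for one order_date partition (bucket and path are hypothetical)."""
    return f"{table_root}/order_date={order_date.isoformat()}/"


def prune_partitions(partitions: list[date], start: date, end: date) -> list[date]:
    """Keep only partitions whose order_date falls in [start, end], inclusive.

    This mimics partition pruning for a predicate such as
    WHERE order_date BETWEEN start AND end: partitions outside the range are skipped.
    """
    return [d for d in partitions if start <= d <= end]
```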

Question # 44

A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format:


Which solution will meet this requirement with the LEAST coding effort?

Options:

A.  

Use AWS Glue DataBrew to read the files. Use the NEST TO ARRAY transformation to create the new column.

B.  

Use AWS Glue DataBrew to read the files. Use the NEST TO MAP transformation to create the new column.

C.  

Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.

D.  

Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.
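The nesting transformations differ in the shape of the new column: nesting to an array keeps only the values in column order, while nesting to a map keeps each value under its column name. The following is a plain-Python illustration of those two shapes, not AWS Glue DataBrew code; the sample values and output column names are hypothetical:

```python
ADDRESS_COLUMNS = ["Door_No", "Street_Name", "City", "Zip_Code"]


def nest_to_array(row: dict, columns: list[str]) -> list:
    """Collect the column values, in order, into one list-valued column."""
    return [row[c] for c in columns]


def nest_to_map(row: dict, columns: list[str]) -> dict:
    """Collect the columns into one key-value column keyed by column name."""
    return {c: row[c] for c in columns}


row = {"Door_No": "12", "Street_Name": "Main St", "City": "Springfield", "Zip_Code": "12345"}
row["Address_Array"] = nest_to_array(row, ADDRESS_COLUMNS)
row["Address_Map"] = nest_to_map(row, ADDRESS_COLUMNS)
```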

Question # 45

A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies.

A data engineer wants to optimize the cost of the company's Amazon Athena usage without adding any infrastructure costs.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.

B.  

Use the query result reuse feature of Amazon Athena for the SQL queries.

C.  

Add an Amazon ElastiCache cluster between the BI application and Athena.

D.  

Change the format of the files that are in the dataset to Apache Parquet.
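Athena's query result reuse (option B) is enabled per query through the `ResultReuseConfiguration` parameter of `StartQueryExecution`; an identical earlier query whose stored result is younger than the maximum age can be answered without scanning the data again. A minimal sketch of the request parameters, assuming a workgroup that already defines a query result location and a maximum age that matches the 1-hour refresh policy:

```python
def start_query_kwargs(sql: str, workgroup: str, max_age_minutes: int = 60) -> dict:
    """Keyword arguments for Athena StartQueryExecution with query result reuse enabled."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,  # assumed to define the query result location
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                # Reuse a stored result only if it is at most this many minutes old.
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }
```

With boto3 this would be used as `boto3.client("athena").start_query_execution(**start_query_kwargs(sql, "bi-workgroup"))`, where `bi-workgroup` is a hypothetical workgroup name.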

Question # 46

A company needs a solution to store and query product data that has variable attributes. The solution must support unpredictable and high-volume queries with single-digit millisecond latency, even during sudden traffic spikes. The solution must retrieve items by a primary identifier named Product ID.

The solution must allow flexible queries by secondary attributes named Category and Brand.

Which solution will meet these requirements?

Options:

A.  

Use an Amazon DynamoDB table with on-demand capacity to store product data. Store products by primary key. Use global secondary indexes (GSIs) to store secondary attributes.

B.  

Use Amazon Aurora with a Multi-AZ deployment to store product data. Use read replicas. Create indexes for primary and secondary attributes.

C.  

Use an Amazon OpenSearch Serverless cluster with dynamic scaling to store product data. Index product data by primary and secondary attributes.

D.  

Use Amazon ElastiCache (Redis OSS) and Amazon S3 to store product data. Use Amazon Athena to run flexible secondary attribute queries.
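The design in option A can be expressed as DynamoDB `CreateTable` parameters. A minimal sketch, assuming hypothetical names `Products` and `ProductId` and string-typed keys: the table is keyed on the product identifier, each GSI is keyed on one secondary attribute, and `PAY_PER_REQUEST` selects on-demand capacity.

```python
def product_table_definition(table_name: str = "Products") -> dict:
    """CreateTable keyword arguments: on-demand capacity, ProductId key, GSIs on Category and Brand."""

    def gsi(attr: str) -> dict:
        # One index per secondary attribute; ALL projects every item attribute into the index.
        return {
            "IndexName": f"{attr}-index",
            "KeySchema": [{"AttributeName": attr, "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }

    return {
        "TableName": table_name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand; no ProvisionedThroughput needed
        "AttributeDefinitions": [
            # Only key attributes are declared; other product attributes stay schemaless.
            {"AttributeName": "ProductId", "AttributeType": "S"},
            {"AttributeName": "Category", "AttributeType": "S"},
            {"AttributeName": "Brand", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "ProductId", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [gsi("Category"), gsi("Brand")],
    }
```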

Question # 47

A data engineer is using an AWS Glue ETL job to remove outdated customer records from a table that contains customer account information. The data engineer is using the following SQL command:

MERGE INTO accounts t USING monthly_accounts_update s
ON t.customer = s.customer
WHEN MATCHED THEN DELETE

What will happen when the data engineer runs the SQL command?

Options:

A.  

All customer records that exist in both the customer accounts table and the monthly_accounts_update table will be deleted from the accounts table.

B.  

Only customer records that are present in both tables will be retained in the customer accounts table.

C.  

The monthly_accounts_update table will be deleted.

D.  

No records will be deleted because the command syntax is not valid in AWS Glue.
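The semantics of `MERGE ... WHEN MATCHED THEN DELETE` can be simulated on in-memory rows: every target row whose key matches a source row is removed, and the source is not modified. This is plain Python with made-up sample rows, not AWS Glue or Spark code:

```python
def merge_delete_matched(target: list[dict], source: list[dict], key: str) -> list[dict]:
    """Simulate MERGE INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE.

    Target rows whose key appears in the source are dropped; unmatched target
    rows are kept, and the source list is left unchanged.
    """
    source_keys = {row[key] for row in source}
    return [row for row in target if row[key] not in source_keys]


accounts = [{"customer": "a"}, {"customer": "b"}, {"customer": "c"}]
monthly_accounts_update = [{"customer": "b"}, {"customer": "d"}]
remaining = merge_delete_matched(accounts, monthly_accounts_update, "customer")
```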

Question # 48

A company processes a CSV file that contains millions of transaction records every day. The file is stored in Amazon S3. Each transaction must be validated before updating a database. The company needs a solution that will process the data in parallel. The solution must use error handling that stops the entire process if more than 15% of the records fail validation.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Create an AWS Batch job that processes chunks of the file in parallel with a custom error tracking mechanism.

B.  

Use AWS Step Functions Distributed Map state with the ToleratedFailurePercentage field set to 15%.

C.  

Deploy an Amazon EMR cluster with Spark to process the file. Configure a custom failure threshold to 15%.

D.  

Use AWS Lambda with S3 Batch Operations to process the file and track validation failures to be less than 15%.
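A Step Functions Distributed Map state (option B) declares its failure threshold directly in Amazon States Language. A minimal sketch of such a definition built as a Python dict, assuming a CSV file with a header row and a hypothetical validation Lambda function that raises an error for invalid records (a record counts as failed only if its child workflow fails); the bucket, key, and ARN values are placeholders:

```python
import json


def state_machine_definition(bucket: str, key: str, validator_arn: str,
                             tolerated_pct: float = 15) -> dict:
    """Amazon States Language for a Distributed Map that validates CSV rows from S3."""
    return {
        "StartAt": "ValidateTransactions",
        "States": {
            "ValidateTransactions": {
                "Type": "Map",
                # Read the CSV object from S3; each row becomes one item keyed by header names.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:getObject",
                    "ReaderConfig": {"InputType": "CSV", "CSVHeaderLocation": "FIRST_ROW"},
                    "Parameters": {"Bucket": bucket, "Key": key},
                },
                "ItemProcessor": {
                    # DISTRIBUTED mode runs items as parallel child workflow executions.
                    "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
                    "StartAt": "ValidateRecord",
                    "States": {
                        "ValidateRecord": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {"FunctionName": validator_arn, "Payload.$": "$"},
                            "End": True,
                        }
                    },
                },
                # The Map Run fails if the percentage of failed items exceeds this value.
                "ToleratedFailurePercentage": tolerated_pct,
                "End": True,
            }
        },
    }


definition_json = json.dumps(
    state_machine_definition(
        "example-bucket",
        "transactions/daily.csv",
        "arn:aws:lambda:us-east-1:123456789012:function:validate-transaction",
    ),
    indent=2,
)
```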

Question # 49

A company is building a data lake for a new analytics team. The company is using Amazon S3 for storage and Amazon Athena for query analysis. All data that is in Amazon S3 is in Apache Parquet format.

The company is running a new Oracle database as a source system in the company's data center. The company has 70 tables in the Oracle database. All the tables have primary keys. Data can occasionally change in the source system. The company wants to ingest the tables every day into the data lake.

Which solution will meet this requirement with the LEAST effort?

Options:

A.  

Create an Apache Sqoop job in Amazon EMR to read the data from the Oracle database. Configure the Sqoop job to write the data to Amazon S3 in Parquet format.

B.  

Create an AWS Glue connection to the Oracle database. Create an AWS Glue bookmark job to ingest the data incrementally and to write the data to Amazon S3 in Parquet format.

C.  

Create an AWS Database Migration Service (AWS DMS) task for ongoing replication. Set the Oracle database as the source. Set Amazon S3 as the target. Configure the task to write the data in Parquet format.

D.  

Create an Oracle database in Amazon RDS. Use AWS Database Migration Service (AWS DMS) to migrate the on-premises Oracle database to Amazon RDS. Configure triggers on the tables to invoke AWS Lambda functions to write changed records to Amazon S3 in Parquet format.
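For option C, Parquet output is a setting on the AWS DMS S3 target endpoint, and the set of tables to replicate is a JSON table mapping. A minimal sketch of both, assuming a hypothetical endpoint identifier, bucket, IAM role, and Oracle schema name:

```python
import json


def s3_parquet_target_endpoint(bucket: str, role_arn: str) -> dict:
    """Keyword arguments for DMS CreateEndpoint: an S3 target that writes Parquet."""
    return {
        "EndpointIdentifier": "datalake-s3-target",  # hypothetical name
        "EndpointType": "target",
        "EngineName": "s3",
        "S3Settings": {
            "BucketName": bucket,
            "ServiceAccessRoleArn": role_arn,  # role that lets DMS write to the bucket
            "DataFormat": "parquet",
            "ParquetVersion": "parquet-2-0",
        },
    }


def include_schema_mapping(schema: str) -> str:
    """Table-mapping JSON that selects every table ('%') in one source schema."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all-tables",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    })
```

The mapping string would go in the `TableMappings` parameter of the boto3 DMS client's `create_replication_task`, with `MigrationType="full-load-and-cdc"` for a full load followed by ongoing replication.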

Question # 50

A company needs to generate a one-time performance report by joining data that is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3. The company wants to avoid unnecessary data movement and to minimize query execution time.

Which solution will meet these requirements?

Options:

A.  

Capture data from DynamoDB by using DynamoDB Streams. Migrate data from Amazon RDS by using AWS DMS. Export Amazon Redshift data. Store all data in Amazon S3. Use Redshift Spectrum to run queries.

B.  

Set up an AWS Glue ETL pipeline to extract, transform, and centralize data in Amazon S3. Use Amazon Athena to run analytical queries.

C.  

Deploy an Amazon EMR cluster powered by Apache Spark to ingest, process, and merge datasets from multiple sources. Run analytical workloads on the merged data.

D.  

Use Amazon Athena Federated Query to perform one-time joins and analysis across DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.
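With Athena Federated Query (option D), each source is addressed as `catalog.database.table`: `awsdatacatalog` is the AWS Glue Data Catalog for data in Amazon S3, and the other sources are catalogs you register through Athena data source connectors under names you choose. A minimal sketch that assembles such a join in Python; every catalog, database, table, and column name is hypothetical:

```python
def qualified(catalog: str, database: str, table: str) -> str:
    """Build an Athena catalog.database.table reference with each part double-quoted."""
    return ".".join(f'"{part}"' for part in (catalog, database, table))


# All names below are hypothetical; "ddb", "rds_mysql", and "redshift" stand for
# catalogs registered through Athena data source connectors.
FEDERATED_JOIN_SQL = f"""
SELECT s.order_id, c.segment, s.revenue, r.target_revenue
FROM {qualified('awsdatacatalog', 'analytics', 'sales')} AS s
JOIN {qualified('ddb', 'default', 'orders')} AS o ON o.order_id = s.order_id
JOIN {qualified('rds_mysql', 'crm', 'customers')} AS c ON c.customer_id = o.customer_id
JOIN {qualified('redshift', 'public', 'targets')} AS r ON r.region = c.region
"""
```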
