
Data-Engineer-Associate: AWS Certified Data Engineer - Associate (DEA-C01) Practice Questions | Test Your Knowledge for Free


Data-Engineer-Associate Practice Questions

AWS Certified Data Engineer - Associate (DEA-C01)

Last Update: 4 days ago
Total Questions: 289

Dive into our fully updated and stable Data-Engineer-Associate practice test platform, featuring the latest AWS Certified Data Engineer exam questions added this week. Our preparation tool is more than just an Amazon Web Services study aid; it's a strategic advantage.

Our free AWS Certified Data Engineer practice questions are crafted to reflect the domains and difficulty of the actual exam. The detailed rationales explain the 'why' behind each answer, reinforcing key Data-Engineer-Associate concepts. Use this test to pinpoint the areas where you need to focus your study.

Data-Engineer-Associate PDF (Printable): $43.75 (regular price $124.99)
Data-Engineer-Associate Testing Engine: $50.75 (regular price $144.99)
Data-Engineer-Associate PDF + Testing Engine: $63.70 (regular price $181.99)
Question # 51

A company uses Amazon Redshift as its data warehouse service. A data engineer needs to design a physical data model.

The data engineer encounters a de-normalized table that is growing in size. The table does not have a suitable column to use as the distribution key.

Which distribution style should the data engineer use to meet these requirements with the LEAST maintenance overhead?

Options:

A. ALL distribution
B. EVEN distribution
C. AUTO distribution
D. KEY distribution
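
For reference, distribution style is declared in the table DDL. The sketch below, assuming a hypothetical sales_fact table and the Redshift Data API (cluster, database, and secret values are placeholders), shows how AUTO distribution is declared so that Redshift chooses and adjusts the style as the table grows:

```python
import boto3

redshift_data = boto3.client("redshift-data")

# With DISTSTYLE AUTO, Redshift manages the distribution style itself
# (typically starting at ALL and shifting as the table grows).
ddl = """
CREATE TABLE sales_fact (
    sale_id   BIGINT,
    sale_date DATE,
    amount    DECIMAL(12, 2)
)
DISTSTYLE AUTO;
"""

# ClusterIdentifier, Database, and SecretArn are placeholders.
redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    SecretArn="arn:aws:secretsmanager:...",
    Sql=ddl,
)
```
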
Question # 52

A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information.

The data engineer must identify and remove duplicate information from the legacy application data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A. Write a custom extract, transform, and load (ETL) job in Python. Import the pandas library, and use the DataFrame.drop_duplicates() function to perform data deduplication.
B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.
C. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
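
For context, a FindMatches ML transform runs inside a Glue ETL script against a transform that was previously created and trained in the Glue console. A minimal sketch, with the catalog names, transform ID, and output path all placeholders:

```python
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read the legacy data that a crawler cataloged (placeholder names).
frame = glue_context.create_dynamic_frame.from_catalog(
    database="legacy_db", table_name="legacy_records"
)

# Apply the pre-trained FindMatches transform; matched records share a match_id
# column that downstream logic can use to drop duplicates.
matched = FindMatches.apply(frame=frame, transformId="tfm-0123456789abcdef")

glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake/deduplicated/"},
    format="parquet",
)
```
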
Question # 53

A company uploads .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.

An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.

If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.

Which solution will meet these requirements?

Options:

A. Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.
B. Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.
C. Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.
D. Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.
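
For context, the staging-table pattern loads incoming rows into a temporary table and then merges them into the target, so a rerun replaces rows instead of duplicating them. A sketch of the SQL such a job might issue through the Redshift Data API (all table, key, and connection names are illustrative):

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Classic Redshift upsert: delete the rows that will be replaced,
# then insert everything from the staging table in one transaction.
merge_sql = """
BEGIN;
DELETE FROM target_table
USING staging_table
WHERE target_table.id = staging_table.id;
INSERT INTO target_table SELECT * FROM staging_table;
DROP TABLE staging_table;
END;
"""

redshift_data.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="dev",
    SecretArn="arn:aws:secretsmanager:...",
    Sql=merge_sql,
)
```
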
Question # 54

A company runs an extract, transform, and load (ETL) job in AWS Glue. The job processes personally identifiable information (PII) data and writes logs to an Amazon CloudWatch Logs log group. A data engineer needs to mask PII data in the CloudWatch Logs log group.

Which solution will meet these requirements?

Options:

A. Attach an AWS Glue security configuration to the ETL job.
B. Configure a data protection policy. Attach the policy to the CloudWatch log group.
C. Run an Amazon Macie sensitive data discovery job.
D. Call AWS Glue sensitive data detection APIs in the ETL job.
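
For reference, a CloudWatch Logs data protection policy is a JSON document attached to a log group; CloudWatch then masks matched sensitive data in the log events. A rough boto3 sketch, with the log group name and data identifiers chosen only as examples:

```python
import json

import boto3

logs = boto3.client("logs")

# A data protection policy needs both an Audit and a Deidentify statement.
policy = {
    "Name": "mask-pii",
    "Version": "2021-06-01",
    "Statement": [
        {
            "Sid": "audit",
            "DataIdentifier": ["arn:aws:dataprotection::aws:data-identifier/EmailAddress"],
            "Operation": {"Audit": {"FindingsDestination": {}}},
        },
        {
            "Sid": "redact",
            "DataIdentifier": ["arn:aws:dataprotection::aws:data-identifier/EmailAddress"],
            "Operation": {"Deidentify": {"MaskConfig": {}}},
        },
    ],
}

logs.put_data_protection_policy(
    logGroupIdentifier="/aws-glue/jobs/output",  # example log group name
    policyDocument=json.dumps(policy),
)
```
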
Question # 55

A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
B. Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
C. Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
D. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.
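
For reference, a suffix-filtered S3 event notification that invokes a Lambda function directly can be configured as sketched below; the bucket name and function ARN are placeholders, and the function's resource policy must already allow s3.amazonaws.com to invoke it:

```python
import boto3

s3 = boto3.client("s3")

# Fire only on object creation, and only for keys ending in .csv.
s3.put_bucket_notification_configuration(
    Bucket="my-upload-bucket",  # placeholder
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".csv"}]
                    }
                },
            }
        ]
    },
)
```
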
Question # 56

A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket.

The company wants to receive notifications when a user violates the data access policy. Each notification must include the username of the user who violated the policy.

Which solution will meet these requirements?

Options:

A. Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.
B. Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.
C. Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.
D. Use Amazon S3 server access logs to monitor access to the bucket. Forward the access logs to an Amazon CloudWatch log group. Use metric filters on the log group to set up CloudWatch alarms.
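
For context, once CloudTrail delivers object-level events to a CloudWatch log group, a metric filter can match the relevant API calls and drive an alarm, and the matched CloudTrail events carry the caller's identity. A sketch with placeholder names and a filter pattern chosen only for illustration:

```python
import boto3

logs = boto3.client("logs")

# Match denied S3 object access in CloudTrail events delivered to the log group.
# The event's userIdentity field identifies who made the call.
logs.put_metric_filter(
    logGroupName="cloudtrail-s3-data-events",  # placeholder
    filterName="s3-access-denied",
    filterPattern='{ ($.eventSource = "s3.amazonaws.com") && ($.errorCode = "AccessDenied") }',
    metricTransformations=[
        {
            "metricName": "S3AccessDenied",
            "metricNamespace": "DataAccessPolicy",
            "metricValue": "1",
        }
    ],
)
```
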
Question # 57

A company wants to use Apache Spark jobs that run on an Amazon EMR cluster to process streaming data. The Spark jobs will transform and store the data in an Amazon S3 bucket. The company will use Amazon Athena to perform analysis.

The company needs to optimize the data format for analytical queries.

Which solutions will meet these requirements with the SHORTEST query times? (Select TWO.)

Options:

A. Use Avro format. Use AWS Glue Data Catalog to track schema changes.
B. Use ORC format. Use AWS Glue Data Catalog to track schema changes.
C. Use Apache Parquet format. Use an external Amazon DynamoDB table to track schema changes.
D. Use Apache Parquet format. Use AWS Glue Data Catalog to track schema changes.
E. Use ORC format. Store schema definitions in separate files in Amazon S3.
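
For reference, Athena scans columnar formats such as Parquet and ORC far more efficiently than row-oriented formats. A minimal PySpark sketch that writes partitioned Parquet from an EMR job and registers the table in the metastore (on EMR this can be configured to be the AWS Glue Data Catalog); paths and names are placeholders:

```python
from pyspark.sql import SparkSession

# With the Glue Data Catalog as the Hive metastore, saveAsTable registers
# the schema so Athena can query the data immediately.
spark = (
    SparkSession.builder.appName("stream-to-parquet")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.read.json("s3://my-raw-zone/events/")  # placeholder source

(
    df.write.mode("append")
    .format("parquet")
    .partitionBy("event_date")  # assumes an event_date column exists
    .option("path", "s3://my-analytics-zone/events/")
    .saveAsTable("analytics_db.events")
)
```
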
Question # 58

A company stores details about transactions in an Amazon S3 bucket. The company wants to log all writes to the S3 bucket into another S3 bucket that is in the same AWS Region.

Which solution will meet this requirement with the LEAST operational effort?

Options:

A. Configure an S3 Event Notifications rule for all activities on the transactions S3 bucket to invoke an AWS Lambda function. Program the Lambda function to write the event to Amazon Kinesis Data Firehose. Configure Kinesis Data Firehose to write the event to the logs S3 bucket.
B. Create a trail of management events in AWS CloudTrail. Configure the trail to receive data from the transactions S3 bucket. Specify an empty prefix and write-only events. Specify the logs S3 bucket as the destination bucket.
C. Configure an S3 Event Notifications rule for all activities on the transactions S3 bucket to invoke an AWS Lambda function. Program the Lambda function to write the events to the logs S3 bucket.
D. Create a trail of data events in AWS CloudTrail. Configure the trail to receive data from the transactions S3 bucket. Specify an empty prefix and write-only events. Specify the logs S3 bucket as the destination bucket.
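
For context, CloudTrail data events capture object-level operations, while management events cover bucket-level API calls only. A boto3 sketch of a write-only data-event trail (bucket and trail names are placeholders, and the destination bucket needs a CloudTrail bucket policy):

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.create_trail(
    Name="s3-write-logging",
    S3BucketName="my-logs-bucket",  # destination bucket, placeholder
)

# Log only write data events for objects in the transactions bucket
# (an empty prefix, i.e. the whole bucket).
cloudtrail.put_event_selectors(
    TrailName="s3-write-logging",
    EventSelectors=[
        {
            "ReadWriteType": "WriteOnly",
            "IncludeManagementEvents": False,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    "Values": ["arn:aws:s3:::my-transactions-bucket/"],
                }
            ],
        }
    ],
)

cloudtrail.start_logging(Name="s3-write-logging")
```
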
Question # 59

A company uses an Amazon S3 bucket to integrate multiple data sources into a central data lake. The company needs to perform multiple transformations and data cleaning processes on the data to make the data accessible to business partners.

The company needs a solution that will give multiple business partners the ability to run SQL queries on the central data lake during normal business hours.

Which solution will meet these requirements MOST cost-effectively?

Options:

A. Use a provisioned Amazon EMR cluster after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.
B. Use an AWS Glue Flex job after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.
C. Use an AWS Lambda function after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.
D. Use an AWS Glue Flex job after normal business hours to process the previous day’s data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.
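
For reference, Flex execution is set on the Glue job itself: the job runs on spare AWS capacity at a reduced rate, which fits non-urgent, after-hours batch processing. A sketch of creating such a job with boto3, with the role, script location, and sizing values as placeholders:

```python
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="nightly-transform",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",  # placeholder
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-scripts/nightly_transform.py",
        "PythonVersion": "3",
    },
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,
    ExecutionClass="FLEX",  # run on spare capacity at the Flex price
)
```
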
Question # 60

A hotel management company receives daily data files from each of its hotels. The company wants to upload its data to AWS. The company plans to use Amazon Athena to access the files. The company needs to protect the files from accidental deletion. The company will develop an application on its on-premises servers to automatically forward the files to a fully managed AWS ingestion service.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A. Use AWS DataSync to replicate data from the on-premises servers to Amazon Elastic File System (Amazon EFS). Configure automatic backups in AWS Backup.
B. Use the Amazon Kinesis Agent on the on-premises servers to send data to Amazon Data Firehose. Store the data in an Amazon S3 bucket that has versioning enabled.
C. Use AWS Glue jobs to ingest data from the on-premises servers into Amazon RDS. Enable automated backups for data protection.
D. Use a self-managed Apache Kafka agent on the on-premises servers to stream data to Amazon Managed Streaming for Apache Kafka (Amazon MSK). Store the data in an Amazon S3 bucket with versioning enabled.
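
For context, S3 versioning is the piece that protects against accidental deletion: a delete request adds a delete marker while prior object versions remain recoverable. Enabling it is a single call (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# With versioning on, deletes create delete markers instead of removing data.
s3.put_bucket_versioning(
    Bucket="my-hotel-data-bucket",  # placeholder
    VersioningConfiguration={"Status": "Enabled"},
)
```
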
Get Data-Engineer-Associate dumps and pass your exam in 24 hours!
