
Data-Engineer-Associate AWS Certified Data Engineer - Associate (DEA-C01) practice material is now stable and has a strong pass record | Test Your Knowledge for Free

Data-Engineer-Associate Practice Questions

AWS Certified Data Engineer - Associate (DEA-C01)

Last Update: 2 hours ago
Total Questions: 241

Dive into our fully updated and stable Data-Engineer-Associate practice test platform, featuring the latest AWS Certified Data Engineer exam questions added this week. Our preparation tool is more than just an Amazon Web Services study aid; it's a strategic advantage.

Our free AWS Certified Data Engineer practice questions are crafted to reflect the domains and difficulty of the actual exam. The detailed rationales explain the 'why' behind each answer, reinforcing key concepts about Data-Engineer-Associate. Use this test to pinpoint the areas where you need to focus your study.

Data-Engineer-Associate PDF (Printable)
$43.75 (regular price $124.99)

Data-Engineer-Associate Testing Engine
$50.75 (regular price $144.99)

Data-Engineer-Associate PDF + Testing Engine
$63.70 (regular price $181.99)
Question # 1

A data engineer is launching an Amazon EMR cluster. The data that the data engineer needs to load into the new cluster is currently in an Amazon S3 bucket. The data engineer needs to ensure that data is encrypted both at rest and in transit.

The data that is in the S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The data engineer has an Amazon S3 path that has a Privacy Enhanced Mail (PEM) file.

Which solution will meet these requirements?

Options:

A.  

Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Create a second security configuration. Specify the Amazon S3 path of the PEM file for in-transit encryption. Create the EMR cluster, and attach both security configurations to the cluster.

B.  

Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for local disk encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation.

C.  

Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Use the security configuration during EMR cluster creation.

D.  

Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket. Specify the Amazon S3 path of the PEM file for in-transit encryption. Create the EMR cluster, and attach the security configuration to the cluster.

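For readers who want to see what a single security configuration covering both encryption modes looks like, here is a minimal boto3 sketch. The KMS key ARN, S3 path of the PEM archive, and configuration name are placeholders, not values from the question.

```python
import json
import boto3

emr = boto3.client("emr")

# Hypothetical identifiers for illustration only.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
PEM_CERTS_S3_PATH = "s3://example-bucket/certs/emr-certs.zip"  # archive containing the PEM files

security_configuration = {
    "EncryptionConfiguration": {
        "EnableAtRestEncryption": True,
        "EnableInTransitEncryption": True,
        # At-rest encryption for EMRFS data in Amazon S3 uses the KMS key.
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": KMS_KEY_ARN,
            }
        },
        # In-transit encryption references the PEM certificates stored in S3.
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": PEM_CERTS_S3_PATH,
            }
        },
    }
}

emr.create_security_configuration(
    Name="at-rest-and-in-transit-encryption",
    SecurityConfiguration=json.dumps(security_configuration),
)
```

The configuration is then referenced by name (the SecurityConfiguration parameter of run_job_flow) when the EMR cluster is created.
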
Question # 2

A company stores sales data in an Amazon RDS for MySQL database. The company needs to start a reporting process between 6:00 AM and 6:10 AM every Monday. The reporting process must generate a CSV file and store the file in an Amazon S3 bucket.

Which combination of steps will meet these requirements with the LEAST operational overhead? (Select TWO.)

Options:

A.  

Create an Amazon EventBridge rule to run every Monday at 6:00 AM.

B.  

Create an Amazon EventBridge Scheduler to run every Monday at 6:00 AM.

C.  

Create and invoke an AWS Batch job that runs a script in an Amazon Elastic Container Service (Amazon ECS) container. Configure the script to generate the report and to save it to the S3 bucket.

D.  

Create and invoke an AWS Glue ETL job to generate the report and to save it to the S3 bucket.

E.  

Create and invoke an Amazon EMR Serverless job to generate the report and to save it to the S3 bucket.

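As a reference for the scheduling half of this scenario, the sketch below creates an EventBridge Scheduler schedule that starts a hypothetical AWS Glue job every Monday at 6:00 AM. The job name, role ARN, and time zone are assumptions; the target uses Scheduler's universal target for the Glue StartJobRun API.

```python
import json
import boto3

scheduler = boto3.client("scheduler")

# Hypothetical names and ARNs for illustration only.
GLUE_JOB_NAME = "weekly-sales-report-to-s3"
SCHEDULER_ROLE_ARN = "arn:aws:iam::111122223333:role/example-scheduler-role"

scheduler.create_schedule(
    Name="weekly-sales-report",
    # Every Monday at 06:00 in the chosen time zone.
    ScheduleExpression="cron(0 6 ? * MON *)",
    ScheduleExpressionTimezone="America/New_York",
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        # Universal target that calls the Glue StartJobRun API.
        "Arn": "arn:aws:scheduler:::aws-sdk:glue:startJobRun",
        "RoleArn": SCHEDULER_ROLE_ARN,
        "Input": json.dumps({"JobName": GLUE_JOB_NAME}),
    },
)
```
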
Question # 3

A data engineer needs to onboard a new data producer into AWS. The data producer needs to migrate data products to AWS.

The data producer maintains many data pipelines that support a business application. Each pipeline must have service accounts and their corresponding credentials. The data engineer must establish a secure connection from the data producer's on-premises data center to AWS. The data engineer must not use the public internet to transfer data from an on-premises data center to AWS.

Which solution will meet these requirements?

Options:

A.  

Instruct the new data producer to create Amazon Machine Images (AMIs) on Amazon Elastic Container Service (Amazon ECS) to store the code base of the application. Create security groups in a public subnet that allow connections only to the on-premises data center.

B.  

Create an AWS Direct Connect connection to the on-premises data center. Store the service account credentials in AWS Secrets manager.

C.  

Create a security group in a public subnet. Configure the security group to allow only connections from the CIDR blocks that correspond to the data producer. Create Amazon S3 buckets that contain presigned URLs that have one-day expiration dates.

D.  

Create an AWS Direct Connect connection to the on-premises data center. Store the application keys in AWS Secrets Manager. Create Amazon S3 buckets that contain presigned URLs that have one-day expiration dates.

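The Direct Connect link itself is provisioned at the network level, but the credential-management part of this scenario can be sketched with AWS Secrets Manager. The secret name and values below are placeholders for one pipeline's service account.

```python
import json
import boto3

secrets = boto3.client("secretsmanager")

# Hypothetical secret name and values for illustration only.
secret_name = "data-producer/pipeline-a/service-account"

secrets.create_secret(
    Name=secret_name,
    Description="Service account credentials for pipeline A",
    SecretString=json.dumps({"username": "pipeline_a_svc", "password": "example-only"}),
)

# A pipeline running in AWS later reads the credentials at run time
# instead of keeping them in code or configuration files.
response = secrets.get_secret_value(SecretId=secret_name)
credentials = json.loads(response["SecretString"])
print(credentials["username"])
```
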
Question # 4

A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. The company merges with a second company. The original company modifies a query that aggregates sales revenue data to join sales tables from both companies.

The sales table for the first company is named Table S1 and contains 10 billion records. The sales table for the second company is named Table S2 and contains 900 million records. The query becomes slow after the modification.

A data engineer must improve the query performance.

Which solutions will meet these requirements? (Select TWO)

Options:

A.  

Use the KEY distribution style for both sales tables. Select a low-cardinality column to use for the join.

B.  

Use the KEY distribution style for both sales tables. Select a high-cardinality column to use for the join.

C.  

Use the EVEN distribution style for Table S1. Use the ALL distribution style for Table S2.

D.  

Use the Amazon Redshift query optimizer to review and select optimizations to implement.

E.  

Use Amazon Redshift Advisor to review and select optimizations to implement.

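To make the distribution-style trade-off concrete, the sketch below issues DDL through the Redshift Data API that distributes a sales table on a high-cardinality join column. The cluster, database, table, and column names are assumptions; defining the same DISTKEY on both tables keeps matching rows on the same slices so the join avoids data redistribution.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical cluster, database, and column names for illustration only.
ddl = """
CREATE TABLE sales_s1 (
    sale_id      BIGINT,
    customer_id  BIGINT,
    sale_amount  DECIMAL(18, 2)
)
DISTSTYLE KEY
DISTKEY (customer_id);
"""

redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="admin",
    Sql=ddl,
)
```
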
Question # 5

A retail company uses an Amazon Redshift data warehouse and an Amazon S3 bucket. The company ingests retail order data into the S3 bucket every day.

The company stores all order data at a single path within the S3 bucket. The data has more than 100 columns. The company ingests the order data from a third-party application that generates more than 30 files in CSV format every day. Each CSV file is between 50 and 70 MB in size.

The company uses Amazon Redshift Spectrum to run queries that select sets of columns. Users aggregate metrics based on daily orders. Recently, users have reported that the performance of the queries has degraded. A data engineer must resolve the performance issues for the queries.

Which combination of steps will meet this requirement with the LEAST development effort? (Select TWO.)

Options:

A.  

Configure the third-party application to create the files in a columnar format.

B.  

Develop an AWS Glue ETL job to convert the multiple daily CSV files to one file for each day.

C.  

Partition the order data in the S3 bucket based on order date.

D.  

Configure the third-party application to create the files in JSON format.

E.  

Load the JSON data into the Amazon Redshift table in a SUPER type column.

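For context on the Redshift Spectrum side of this scenario, the sketch below defines a partitioned, Parquet-backed external table and registers one day's partition through the Redshift Data API. The external schema name, columns, bucket layout, and cluster identifiers are assumptions, and the external schema is assumed to already exist.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical external schema, columns, and S3 layout for illustration only.
statements = [
    """
    CREATE EXTERNAL TABLE spectrum_schema.orders (
        order_id     BIGINT,
        customer_id  BIGINT,
        order_total  DECIMAL(18, 2)
    )
    PARTITIONED BY (order_date DATE)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/orders/';
    """,
    """
    ALTER TABLE spectrum_schema.orders
    ADD IF NOT EXISTS PARTITION (order_date = '2025-01-01')
    LOCATION 's3://example-bucket/orders/order_date=2025-01-01/';
    """,
]

for sql in statements:
    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```

With this layout, a query that filters on order_date and selects a few columns reads only the relevant Parquet partitions instead of scanning every daily CSV file.
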
Question # 6

A company needs to automate data workflows from multiple data sources to run both on schedules and in response to events from Amazon EventBridge. The data sources are Amazon RDS and Amazon S3. The company needs a single data pipeline that can be invoked both by scheduled events and near real-time EventBridge events.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Create an AWS Glue workflow. Use EventBridge to integrate the events and schedules.

B.  

Create an Amazon Managed Workflow for Apache Airflow (Amazon MWAA) workflow that uses a directed acyclic graph (DAG). Use EventBridge to integrate the events and schedules.

C.  

Create an AWS Step Functions state machine. Integrate the state machine with AWS Glue ETL jobs and EventBridge to orchestrate the pipeline based on events and schedules.

D.  

Create Amazon EMR Serverless jobs that are invoked by AWS Lambda functions. Use EventBridge events and schedules to orchestrate the EMR jobs.

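To illustrate the orchestration pattern, the sketch below creates a one-step Step Functions state machine that runs a hypothetical Glue ETL job synchronously; both a schedule and an EventBridge rule can then target the same state machine ARN. The job name and role ARN are placeholders.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical Glue job name and IAM role for illustration only.
definition = {
    "StartAt": "RunGlueEtlJob",
    "States": {
        "RunGlueEtlJob": {
            "Type": "Task",
            # The .sync integration waits for the Glue job run to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-etl-job"},
            "End": True,
        }
    },
}

sfn.create_state_machine(
    name="data-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/example-states-role",
)
```
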
Question # 7

A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy.

Which solution will meet these requirements with the LEAST management overhead?

Options:

A.  

Amazon Kinesis Data Streams

B.  

Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned cluster

C.  

Amazon Data Firehose

D.  

Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless

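One way to picture a replatform of a Kafka workload is that the existing producer code keeps working and only the bootstrap brokers change to the managed cluster's endpoints. The sketch below uses the kafka-python library with placeholder broker addresses and a placeholder topic; the security settings depend on how the cluster is configured.

```python
from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical MSK bootstrap brokers and topic for illustration only.
producer = KafkaProducer(
    bootstrap_servers=[
        "b-1.example-msk.abc123.c2.kafka.us-east-1.amazonaws.com:9094",
        "b-2.example-msk.abc123.c2.kafka.us-east-1.amazonaws.com:9094",
    ],
    security_protocol="SSL",  # TLS listener; adjust to the cluster's auth setup
)

# The same incremental updates the on-premises Oracle database emitted
# are published unchanged to the managed cluster.
producer.send("oracle-incremental-updates", value=b'{"op": "update", "id": 42}')
producer.flush()
```
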
Question # 8

A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column.

Which solution will MOST speed up the Athena query performance?

Options:

A.  

Change the data format from .csv to JSON format. Apply Snappy compression.

B.  

Compress the .csv files by using Snappy compression.

C.  

Change the data format from .csv to Apache Parquet. Apply Snappy compression.

D.  

Compress the .csv files by using gzip compression.

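As background on the column-oriented approach, the sketch below submits an Athena CTAS statement that rewrites a hypothetical CSV-backed table as Snappy-compressed Parquet. The database, table names, and S3 locations are placeholders.

```python
import boto3

athena = boto3.client("athena")

# Hypothetical database, tables, and output locations for illustration only.
ctas = """
CREATE TABLE analytics.orders_parquet
WITH (
    format = 'PARQUET',
    write_compression = 'SNAPPY',
    external_location = 's3://example-bucket/orders-parquet/'
) AS
SELECT *
FROM analytics.orders_csv;
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
```

Because Parquet is columnar, queries that select a specific column read only that column's data instead of every row of an uncompressed CSV file.
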
Question # 9

A company wants to ingest streaming data into an Amazon Redshift data warehouse from an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster. A data engineer needs to develop a solution that provides low data access time and that optimizes storage costs.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Create an external schema that maps to the MSK cluster. Create a materialized view that references the external schema to consume the streaming data from the MSK topic.

B.  

Develop an AWS Glue streaming extract, transform, and load (ETL) job to process the incoming data from Amazon MSK. Load the data into Amazon S3. Use Amazon Redshift Spectrum to read the data from Amazon S3.

C.  

Create an external schema that maps to the streaming data source. Create a new Amazon Redshift table that references the external schema.

D.  

Create an Amazon S3 bucket. Ingest the data from Amazon MSK. Create an event-driven AWS Lambda function to load the data from the S3 bucket to a new Amazon Redshift table.

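For reference, streaming ingestion in this pattern is expressed in SQL: an external schema that maps to the MSK cluster and a materialized view over the topic. The sketch below submits that SQL through the Redshift Data API; the ARNs, topic name, and object names are assumptions, and the exact external-schema options depend on the MSK authentication method.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical ARNs, topic, and names for illustration only.
statements = [
    """
    CREATE EXTERNAL SCHEMA msk_orders
    FROM MSK
    IAM_ROLE 'arn:aws:iam::111122223333:role/example-redshift-msk-role'
    AUTHENTICATION iam
    CLUSTER_ARN 'arn:aws:kafka:us-east-1:111122223333:cluster/example/abc123';
    """,
    """
    CREATE MATERIALIZED VIEW orders_stream
    AUTO REFRESH YES
    AS SELECT kafka_partition,
              kafka_offset,
              kafka_timestamp,
              JSON_PARSE(kafka_value) AS payload
       FROM msk_orders."orders-topic";
    """,
]

for sql in statements:
    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```
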
Question # 10

A company stores daily records of the financial performance of investment portfolios in .csv format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data.

The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog.

Which solution will meet these requirements?

Options:

A.  

Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Configure the output destination to a new path in the existing S3 bucket.

B.  

Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Specify a database name for the output.

C.  

Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Specify a database name for the output.

D.  

Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Configure the output destination to a new path in the existing S3 bucket.

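As a reference for the crawler setup itself, the sketch below creates a scheduled Glue crawler that writes its results to a Data Catalog database. The IAM role, bucket path, database name, and schedule are placeholders.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical role, path, and database names for illustration only.
glue.create_crawler(
    Name="daily-portfolio-csv-crawler",
    Role="arn:aws:iam::111122223333:role/example-glue-crawler-role",
    DatabaseName="portfolio_performance",
    Targets={"S3Targets": [{"Path": "s3://example-bucket/portfolio-performance/"}]},
    # Run once a day at 01:00 UTC.
    Schedule="cron(0 1 * * ? *)",
)
```
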
Get Data-Engineer-Associate dumps and pass your exam in 24 hours!
