
Exams4sure Dumps

Data-Engineer-Associate Practice Questions

AWS Certified Data Engineer - Associate (DEA-C01)

Last Update 4 days ago
Total Questions : 289

Dive into our fully updated and stable Data-Engineer-Associate practice test platform, featuring the latest AWS Certified Data Engineer exam questions added this week. Our preparation tool is more than just an Amazon Web Services study aid; it's a strategic advantage.

Our free AWS Certified Data Engineer practice questions are crafted to reflect the domains and difficulty of the actual exam. The detailed rationales explain the 'why' behind each answer, reinforcing key Data-Engineer-Associate concepts. Use this test to pinpoint the areas where you need to focus your study.

Question # 31

A company receives a data file from a partner each day in an Amazon S3 bucket. The company uses a daily AWS Glue extract, transform, and load (ETL) pipeline to clean and transform each data file. The output of the ETL pipeline is written to a CSV file named Daily.csv in a second S3 bucket.

Occasionally, the daily data file is empty or is missing values for required fields. When the file is missing data, the company can use the previous day's CSV file.

A data engineer needs to ensure that the previous day's data file is overwritten only if the new daily file is complete and valid.

Which solution will meet these requirements with the LEAST effort?

Options:

A.  

Invoke an AWS Lambda function to check the file for missing data and to fill in missing values in required fields.

B.  

Configure the AWS Glue ETL pipeline to use AWS Glue Data Quality rules. Develop rules in Data Quality Definition Language (DQDL) to check for missing values in required fields and empty files.

C.  

Use AWS Glue Studio to change the code in the ETL pipeline to fill in any missing values in the required fields with the most common values for each field.

D.  

Run a SQL query in Amazon Athena to read the CSV file and drop missing rows. Copy the corrected CSV file to the second S3 bucket.
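The requirement in this question (replace the previous day's output only when the new file is complete) can be illustrated outside AWS with a minimal Python sketch. This is not AWS Glue Data Quality or DQDL. The function names, the list of required fields, and the file paths are all hypothetical, and the check simply treats an empty file or a blank required field as invalid.

```python
import csv
import shutil

def is_complete(path, required_fields):
    """Return True if the CSV has at least one data row and no blank required field."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        row_count = 0
        for row in reader:
            row_count += 1
            for field in required_fields:
                value = row.get(field)
                # A missing column or an empty string both count as missing data.
                if value is None or value.strip() == "":
                    return False
        return row_count > 0

def publish_if_valid(new_path, published_path, required_fields):
    """Overwrite the published file only when the new file passes the check."""
    if not is_complete(new_path, required_fields):
        return False  # the previous day's file stays in place
    shutil.copyfile(new_path, published_path)
    return True
```

Calling `publish_if_valid("new.csv", "Daily.csv", ["id", "amount"])` (hypothetical names) leaves `Daily.csv` untouched whenever `new.csv` is empty or has a blank `id` or `amount`.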

Question # 32

A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table.

The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time.

Which solutions will meet these requirements? (Choose two.)

Options:

A.  

Create an AWS Glue partition index. Enable partition filtering.

B.  

Bucket the data based on a column that the data have in common in a WHERE clause of the user query.

C.  

Use Athena partition projection based on the S3 bucket prefix.

D.  

Transform the data that is in the S3 bucket to Apache Parquet format.

E.  

Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.
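The general idea of deriving partition locations from a known pattern, instead of looking each partition up in a catalog, can be sketched in plain Python. This is only a conceptual illustration and not how Athena is implemented. The bucket name, the prefix layout, and the function name are assumptions made for the example.

```python
from datetime import date, timedelta

def projected_partitions(template, start, end):
    """Yield one location per day in [start, end], computed from the template alone."""
    current = start
    while current <= end:
        yield template.format(year=current.year, month=current.month, day=current.day)
        current += timedelta(days=1)

# Hypothetical Hive-style layout; no catalog or S3 listing call is involved.
TEMPLATE = "s3://example-bucket/logs/year={year}/month={month:02d}/day={day:02d}/"
locations = list(projected_partitions(TEMPLATE, date(2023, 12, 30), date(2024, 1, 2)))
```

Because every location is computed from the date range, the cost of planning does not depend on how many partitions a metadata store would otherwise have to return.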

Question # 33

A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3.

Which actions will provide the FASTEST queries? (Choose two.)

Options:

A.  

Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.

B.  

Use a columnar storage file format.

C.  

Partition the data based on the most common query predicates.

D.  

Split the data into files that are less than 10 KB.

E.  

Use file formats that are not

Question # 34

A retail company stores data from a product lifecycle management (PLM) application in an on-premises MySQL database. The PLM application frequently updates the database when transactions occur.

The company wants to gather insights from the PLM application in near real time. The company wants to integrate the insights with other business datasets and to analyze the combined dataset by using an Amazon Redshift data warehouse.

The company has already established an AWS Direct Connect connection between the on-premises infrastructure and AWS.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.  

Run a scheduled AWS Glue extract, transform, and load (ETL) job to get the MySQL database updates by using a Java Database Connectivity (JDBC) connection. Set Amazon Redshift as the destination for the ETL job.

B.  

Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate the MySQL database changes. Set Amazon Redshift as the destination for the task.

C.  

Use the Amazon AppFlow SDK to build a custom connector for the MySQL database to continuously replicate the database changes. Set Amazon Redshift as the destination for the connector.

D.  

Run scheduled AWS DataSync tasks to synchronize data from the MySQL database. Set Amazon Redshift as the destination for the tasks.
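The "full load plus ongoing changes" replication pattern mentioned in the options can be illustrated with a small, self-contained Python sketch. It does not use AWS DMS or any database driver. The event shape (`op`, `id`, `row`) and the sample rows are invented for the example.

```python
def apply_changes(table, changes):
    """Apply ordered change events to a dict keyed by primary key."""
    for change in changes:
        op, key = change["op"], change["id"]
        if op in ("insert", "update"):
            # Merge the changed columns into the existing row, if any.
            table[key] = {**table.get(key, {}), **change["row"]}
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table

# Full load: a snapshot of the source table taken when replication starts.
target = {1: {"name": "widget", "stage": "design"}}

# Ongoing changes captured after the snapshot, applied in order.
events = [
    {"op": "update", "id": 1, "row": {"stage": "production"}},
    {"op": "insert", "id": 2, "row": {"name": "gadget", "stage": "design"}},
    {"op": "delete", "id": 1},
]
apply_changes(target, events)
```

Applying events in the order they were captured is what keeps the target consistent with the source; here the row with `id` 1 is updated and then deleted, so only the row with `id` 2 remains.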

Question # 35

A company uses an Amazon QuickSight dashboard to monitor usage of one of the company's applications. The company uses AWS Glue jobs to process data for the dashboard. The company stores the data in a single Amazon S3 bucket. The company adds new data every day.

A data engineer discovers that dashboard queries are becoming slower over time. The data engineer determines that the root cause of the slowing queries is long-running AWS Glue jobs.

Which actions should the data engineer take to improve the performance of the AWS Glue jobs? (Choose two.)

Options:

A.  

Partition the data that is in the S3 bucket. Organize the data by year, month, and day.

B.  

Increase the AWS Glue instance size by scaling up the worker type.

C.  

Convert the AWS Glue schema to the DynamicFrame schema class.

D.  

Adjust AWS Glue job scheduling frequency so the jobs run half as many times each day.

E.  

Modify the IAM role that grants access to AWS Glue to grant access to all S3 features.

Question # 36

A company needs to automate data workflows from multiple data sources to run both on schedules and in response to events from Amazon EventBridge. The data sources are Amazon RDS and Amazon S3. The company needs a single data pipeline that can be invoked both by scheduled events and near real-time EventBridge events.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Create an AWS Glue workflow. Use EventBridge to integrate the events and schedules.

B.  

Create an Amazon Managed Workflow for Apache Airflow (Amazon MWAA) workflow that uses a directed acyclic graph (DAG). Use EventBridge to integrate the events and schedules.

C.  

Create an AWS Step Functions state machine. Integrate the state machine with AWS Glue ETL jobs and EventBridge to orchestrate the pipeline based on events and schedules.

D.  

Create Amazon EMR Serverless jobs that are invoked by AWS Lambda functions. Use EventBridge events and schedules to orchestrate the EMR jobs.

Question # 37

A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data engineer writes a query to retrieve sales amounts for 2023 for several products from a table named sales_data. However, the query does not return results for all of the products that are in the sales_data table. The data engineer needs to troubleshoot the query to resolve the issue.

The data engineer's original query is as follows:

SELECT product_name, sum(sales_amount)
FROM sales_data
WHERE year = 2023
GROUP BY product_name

How should the data engineer modify the Athena query to meet these requirements?

Options:

A.  

Replace sum(sales_amount) with count(*) for the aggregation.

B.  

Change WHERE year = 2023 to WHERE extract(year FROM sales_data) = 2023.

C.  

Add HAVING sum(sales_amount) > 0 after the GROUP BY clause.

D.  

Remove the GROUP BY clause.
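To see how the WHERE filter and the GROUP BY clause in the original query interact, the query can be run against a tiny in-memory table. The sketch below uses SQLite through Python's standard library, which is not the Athena engine, and the sample rows are invented. An ORDER BY is added only to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (product_name TEXT, year INTEGER, sales_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?)",
    [
        ("apple", 2023, 10.0),
        ("apple", 2023, 5.0),
        ("pear", 2022, 7.0),   # no 2023 rows for this product
        ("plum", 2023, 2.5),
    ],
)

# Same shape as the question's query; rows are filtered before they are grouped.
rows = conn.execute(
    "SELECT product_name, sum(sales_amount) FROM sales_data "
    "WHERE year = 2023 GROUP BY product_name ORDER BY product_name"
).fetchall()
```

In this sample, `pear` has rows in the table but none for 2023, so it is filtered out before grouping and does not appear in `rows`.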

Question # 38

An application uses an AWS Lambda function that is configured with managed runtimes. The Lambda function successfully writes logs to the default Amazon CloudWatch Logs log group. A data engineer wants to modify the logging behavior to show only ERROR level logs for application logs and WARN level logs for system logs.

Which solution will meet these requirements?

Options:

A.  

Add additional permissions to the Lambda execution role.

B.  

Set the log level to ERROR in the Lambda function code.

C.  

Configure the Lambda function to use the JSON log format.

D.  

Configure the Lambda function to send logs to a custom log group.
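Filtering log output by level depends on each record carrying a machine-readable level. The sketch below shows that idea with Python's standard `logging` module and JSON-formatted records. It is not Lambda's logging configuration, and the logger name, formatter class, and messages are hypothetical.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as a JSON object with an explicit level field."""
    def format(self, record):
        return json.dumps({"level": record.levelname, "message": record.getMessage()})

def build_logger(stream, level):
    logger = logging.getLogger("app_sketch")
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False     # keep output confined to our handler
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)       # records below this level are dropped
    return logger

buffer = io.StringIO()
log = build_logger(buffer, logging.ERROR)
log.info("started")
log.warning("slow response")
log.error("write failed")
lines = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

Only the ERROR record reaches the stream, and because each line is a JSON object, a downstream consumer can filter on the `level` field without parsing free text.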

Question # 39

A company’s data processing pipeline uses AWS Glue jobs and AWS Glue Data Catalog. All AWS Glue jobs must run in a custom VPC inside a private subnet. The company uses a NAT gateway to support outbound connections.

A data engineer needs to use AWS Glue to migrate data from an on-premises PostgreSQL database to Amazon S3. There is no current network connection between AWS and the on-premises environment. However, the data engineer has updated the on-premises database to allow traffic from the custom VPC.

Which solution will meet these requirements?

Options:

A.  

Create a JDBC connection in AWS Glue with the database JDBC URL, username, and password.

B.  

Create a Simple Authentication and Security Layer (SASL) connection in AWS Glue to the on-premises database.

C.  

Create a JDBC connection in AWS Glue with a security group that allows TCP traffic to and from itself.

D.  

Create a JDBC connection in AWS Glue that uses a JDBC driver stored in Amazon S3. Retrieve the database URL, username, and password from AWS Secrets Manager.
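Several options refer to a database JDBC URL. For PostgreSQL, the URL takes the form `jdbc:postgresql://host:port/database`. The helper below is a minimal sketch that builds such a string; the function name, host, and database name are hypothetical, and 5432 is PostgreSQL's default port.

```python
def postgres_jdbc_url(host, database, port=5432):
    """Build a PostgreSQL JDBC URL from its parts."""
    return f"jdbc:postgresql://{host}:{port}/{database}"

# Hypothetical on-premises host and database name.
url = postgres_jdbc_url("db.example.internal", "plm")
```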

Question # 40

A global company currently uses Amazon Redshift to store data and Amazon Quick Suite (previously known as Amazon QuickSight) to generate reports.

A team of business analysts has varying levels of technical expertise. Some analysts lack SQL knowledge. All the analysts need to create new reports frequently. The company wants to use natural language queries to create dashboards and reports more efficiently.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.  

Use Quick Suite dashboards that have zero-ETL access to Amazon Redshift.

B.  

Enable Amazon Q in Quick Suite. Generate Quick Suite dashboards and reports.

C.  

Integrate Tableau with Amazon Redshift to give Tableau direct access to the data.

D.  

Use Quick Suite dashboards that have federated query access to Amazon Redshift.
