
Data-Engineer-Associate AWS Certified Data Engineer - Associate (DEA-C01) is now Stable and With Pass Result | Test Your Knowledge for Free

Exams4sure Dumps

Data-Engineer-Associate Practice Questions

AWS Certified Data Engineer - Associate (DEA-C01)

Last Update 4 days ago
Total Questions : 302

Dive into our fully updated and stable Data-Engineer-Associate practice test platform, featuring all the latest AWS Certified Data Engineer exam questions added this week. Our preparation tool is more than just an Amazon Web Services study aid; it's a strategic advantage.

Our free AWS Certified Data Engineer practice questions are crafted to reflect the domains and difficulty of the actual exam. The detailed rationales explain the 'why' behind each answer, reinforcing key concepts about Data-Engineer-Associate. Use this test to pinpoint the areas where you need to focus your study.

Question # 51

A company wants to migrate data from an Amazon RDS for PostgreSQL DB instance in the eu-east-1 Region of an AWS account named Account_A. The company will migrate the data to an Amazon Redshift cluster in the eu-west-1 Region of an AWS account named Account_B.

Which solution will give AWS Database Migration Service (AWS DMS) the ability to replicate data between two data stores?

Options:

A.  

Set up an AWS DMS replication instance in Account_B in eu-west-1.

B.  

Set up an AWS DMS replication instance in Account_B in eu-east-1.

C.  

Set up an AWS DMS replication instance in a new AWS account in eu-west-1.

D.  

Set up an AWS DMS replication instance in Account_A in eu-east-1.

Question # 52

A company needs to collect logs for an Amazon RDS for MySQL database and make the logs available for audits. The logs must track each user that modifies data in the database or makes changes to the database instance.

Which solution will meet these requirements?

Options:

A.  

Enable Amazon CloudWatch Logs. Create metric filters to monitor database changes and instance-level changes. Configure automated notification systems to send near real-time alerts for suspicious database operations.

B.  

Configure an Amazon EventBridge rule to monitor database activity. Create an AWS Lambda function to process EventBridge events and store them in Amazon OpenSearch Service.

C.  

Configure AWS CloudTrail to log API calls. Use Amazon CloudWatch Logs for basic monitoring. Use IAM policies to control access to the logs. Set up scheduled reporting for log audits.

D.  

Enable and configure native Amazon RDS database audit logging. Enable Amazon CloudWatch Logs. Configure metric filters and alarms. Configure AWS CloudTrail audit logging.

Question # 53

A company has three subsidiaries. Each subsidiary uses a different data warehousing solution. The first subsidiary hosts its data warehouse in Amazon Redshift. The second subsidiary uses Teradata Vantage on AWS. The third subsidiary uses Google BigQuery.

The company wants to aggregate all the data into a central Amazon S3 data lake. The company wants to use Apache Iceberg as the table format.

A data engineer needs to build a new pipeline to connect to all the data sources, run transformations by using each source engine, join the data, and write the data to Iceberg.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.  

Use native Amazon Redshift, Teradata, and BigQuery connectors to build the pipeline in AWS Glue. Use native AWS Glue transforms to join the data. Run a Merge operation on the data lake Iceberg table.

B.  

Use the Amazon Athena federated query connectors for Amazon Redshift, Teradata, and BigQuery to build the pipeline in Athena. Write a SQL query to read from all the data sources, join the data, and run a Merge operation on the data lake Iceberg table.

C.  

Use the native Amazon Redshift connector, the Java Database Connectivity (JDBC) connector for Teradata, and the open source Apache Spark BigQuery connector to build the pipeline in Amazon EMR. Write code in PySpark to join the data. Run a Merge operation on the data lake Iceberg table.

D.  

Use the native Amazon Redshift, Teradata, and BigQuery connectors in Amazon Appflow to write data to Amazon S3 and AWS Glue Data Catalog. Use Amazon Athena to join the data. Run a Merge operation on the data lake Iceberg table.

Question # 54

A data engineer is using an AWS Glue ETL job to remove outdated customer records from a table that contains customer account information. The data engineer is using the following SQL command to remove customers that exist in a table named monthly_accounts_update from the customer accounts table:

MERGE INTO accounts t USING monthly_accounts_update s ON t.customer = s.customer WHEN MATCHED THEN DELETE

What will happen when the data engineer runs the SQL command?

Options:

A.  

All customer records that exist in both the customer accounts table and the monthly_accounts_update table will be deleted from the accounts table.

B.  

Only customer records that are present in both tables will be retained in the customer accounts table.

C.  

The table will be deleted.

D.  

No records will be deleted because the command syntax is not valid in AWS Glue.
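
To make the command above concrete, here is a minimal pure-Python sketch of what a WHEN MATCHED THEN DELETE clause does under standard MERGE semantics (as implemented, for example, by Apache Iceberg in Spark SQL). It is an illustration only, not AWS Glue code; the sample rows and the helper name `merge_delete` are made up for this example.

```python
def merge_delete(target, source, key):
    """Emulate: MERGE INTO target t USING source s ON t.key = s.key
    WHEN MATCHED THEN DELETE."""
    source_keys = {row[key] for row in source}
    # Target rows whose key matches a source row are deleted;
    # unmatched target rows are left untouched.
    return [row for row in target if row[key] not in source_keys]


# Hypothetical sample data standing in for the two tables.
accounts = [
    {"customer": "alice", "balance": 120},
    {"customer": "bob", "balance": 75},
    {"customer": "carol", "balance": 300},
]
monthly_accounts_update = [{"customer": "bob"}, {"customer": "dave"}]

remaining = merge_delete(accounts, monthly_accounts_update, "customer")
print([row["customer"] for row in remaining])
```

Source rows with no match in the target (such as "dave" here) have no effect, because the statement has no WHEN NOT MATCHED clause.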

Question # 55

A data engineer is configuring an AWS Glue Apache Spark extract, transform, and load (ETL) job. The job contains a sort-merge join of two large and equally sized DataFrames.

The job is failing with the following error: No space left on device.

Which solution will resolve the error?

Options:

A.  

Use the AWS Glue Spark shuffle manager.

B.  

Deploy an Amazon Elastic Block Store (Amazon EBS) volume for the job to use.

C.  

Convert the sort-merge join in the job to be a broadcast join.

D.  

Convert the DataFrames to DynamicFrames, and perform a DynamicFrame join in the job.
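
For context on option A: the AWS Glue Spark shuffle manager writes shuffle data to Amazon S3 instead of the workers' local disks. The sketch below builds the job arguments that, to the best of this editor's knowledge, the AWS Glue documentation describes for that feature. Treat the exact parameter names as an assumption to check against the current documentation; the bucket name and the helper `glue_s3_shuffle_arguments` are hypothetical.

```python
def glue_s3_shuffle_arguments(shuffle_bucket):
    """Build AWS Glue job arguments that send Spark shuffle files to S3.

    Parameter names are assumed from the AWS Glue shuffle plugin docs.
    """
    return {
        # Ask Glue to write shuffle files to S3 rather than local disk.
        "--write-shuffle-files-to-s3": "true",
        # Tell Spark which S3 location to use for the shuffle data.
        "--conf": f"spark.shuffle.glue.s3ShuffleBucket=s3://{shuffle_bucket}/shuffle/",
    }


job_arguments = glue_s3_shuffle_arguments("example-shuffle-bucket")
for name, value in job_arguments.items():
    print(name, value)
```

A dictionary like this would typically be passed as the default arguments of the Glue job definition.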

Question # 56

A data engineer needs to build a data pipeline to process medical records from 50 hospitals. The pipeline must ingest 5 GB of data from each hospital and remove personally identifiable information (PII). The pipeline must then transform the data and save the data in a central store. The pipeline must automatically retry after transient failures without manual intervention.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.  

Store the data in Amazon S3. Use AWS Glue extract, transform, and load (ETL) jobs to process the data. Use AWS Glue DataBrew to remove the PII. Orchestrate the pipeline by using AWS Step Functions.

B.  

Deploy an Amazon EC2 instance to run a custom Python script to orchestrate the pipeline and remove the PII. Store the data in Amazon RDS. Use AWS Batch to process the data.

C.  

Store the data in Amazon S3. Create an AWS Lambda function to process the data and mask the PII. Configure Amazon EventBridge to orchestrate the pipeline.

D.  

Orchestrate the pipeline by using AWS Batch to remove the PII and transform the data. Store the data in Amazon S3.
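
The automatic-retry requirement maps onto the Retry field of an AWS Step Functions task state. The following is a minimal sketch of a state machine definition in Amazon States Language, built as a Python dictionary. The state names and job names are hypothetical, and both steps are modeled as AWS Glue jobs purely for simplicity.

```python
import json


def glue_task_with_retry(job_name, next_state=None):
    """Return a Task state that runs a Glue job and retries on failure."""
    state = {
        "Type": "Task",
        # .sync makes Step Functions wait for the Glue job run to finish.
        "Resource": "arn:aws:states:::glue:startJobRun.sync",
        "Parameters": {"JobName": job_name},
        "Retry": [
            {
                "ErrorEquals": ["States.ALL"],  # retry on any error
                "IntervalSeconds": 30,          # wait before the first retry
                "MaxAttempts": 3,               # give up after three retries
                "BackoffRate": 2.0,             # double the wait each time
            }
        ],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


definition = {
    "StartAt": "RemovePII",
    "States": {
        "RemovePII": glue_task_with_retry("remove-pii-job", "TransformRecords"),
        "TransformRecords": glue_task_with_retry("transform-records-job"),
    },
}
print(json.dumps(definition, indent=2))
```

In practice a narrower ErrorEquals list is often preferable to States.ALL, so that only transient errors are retried.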

Question # 57

A company needs to implement a workflow to process transactions. Each transaction goes through multiple levels of validation. Each validation level depends on the preceding validation level.

The workflow must either process or reject each transaction within 24 hours. The workflow must run for less than 24 hours total.

Which solution will meet these requirements with the LEAST operational cost?

Options:

A.  

Create a standard workflow in AWS Step Functions. Implement a Wait for Callback pattern to wait for the validation steps to finish.

B.  

Create an express workflow in AWS Step Functions. Implement a Wait for Callback pattern to wait for the validation steps to finish.

C.  

Use AWS Lambda functions to implement the workflow. Use Amazon EventBridge to invoke the validation steps.

D.  

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to implement the workflow.
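
The Wait for Callback pattern named in options A and B is expressed in Amazon States Language by appending .waitForTaskToken to the task resource. Below is a minimal sketch that chains validation levels so that each one runs only after the preceding level reports back. The Lambda function names and state names are hypothetical.

```python
import json


def validation_state(function_name, next_state=None):
    """Return a Task state that pauses until the validator calls back."""
    state = {
        "Type": "Task",
        # Execution pauses here until a worker calls SendTaskSuccess or
        # SendTaskFailure with the task token passed in the payload.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                "transaction.$": "$.transaction",
                "taskToken.$": "$$.Task.Token",
            },
        },
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_definition(function_names):
    """Chain the validation levels in order, one state per level."""
    states = {}
    for index, name in enumerate(function_names, start=1):
        is_last = index == len(function_names)
        next_state = None if is_last else f"ValidateLevel{index + 1}"
        states[f"ValidateLevel{index}"] = validation_state(name, next_state)
    return {
        "StartAt": "ValidateLevel1",
        "TimeoutSeconds": 86400,  # fail the whole execution after 24 hours
        "States": states,
    }


definition = build_definition(["validate-format", "validate-funds", "validate-fraud"])
print(json.dumps(definition, indent=2))
```

When weighing the options, note that AWS documents Express Workflows as limited to five minutes of run time and as not supporting the .waitForTaskToken integration pattern.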

Question # 58

A company stores CSV files in an Amazon S3 bucket. A data engineer needs to process the data in the CSV files and store the processed data in a new S3 bucket.

The process needs to rename a column, remove specific columns, ignore the second row of each file, create a new column based on the values of the first row of the data, and filter the results by a numeric value of a column.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.  

Use AWS Glue Python jobs to read and transform the CSV files.

B.  

Use an AWS Glue custom crawler to read and transform the CSV files.

C.  

Use an AWS Glue workflow to build a set of jobs to crawl and transform the CSV files.

D.  

Use AWS Glue DataBrew recipes to read and transform the CSV files.
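
To make the five required transformations concrete, here is a small standard-library sketch of them applied to an in-memory CSV file. It only illustrates what the steps do and is not meant to favor any option. The column names, the sample data, and the reading of "a new column based on the values of the first row of the data" as a baseline copied from the first data row are all assumptions.

```python
import csv
import io

# Hypothetical input: a header, a second row to ignore, then data rows.
RAW = """id,region,amount,notes
units,n/a,usd,n/a
1,north,250,ok
2,south,40,late
3,north,90,ok
"""


def transform(text, min_amount):
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    data_rows = rows[2:]  # skip rows[1], the second row of the file
    records = [dict(zip(header, row)) for row in data_rows]
    # New column derived from the first data row (assumed interpretation).
    baseline = records[0]["amount"] if records else None
    result = []
    for record in records:
        # Filter by the numeric value of a column.
        if float(record["amount"]) < min_amount:
            continue
        result.append({
            "order_id": record["id"],     # rename id -> order_id
            "region": record["region"],
            "amount": record["amount"],   # the notes column is dropped
            "baseline_amount": baseline,
        })
    return result


processed = transform(RAW, min_amount=50)
for row in processed:
    print(row)
```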

Question # 59

A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks.

The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team's BI cluster.

The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster.

Which solution will meet these requirements?

Options:

A.  

Set up the sales team BI cluster as a consumer of the ETL cluster by using Redshift data sharing.

B.  

Create materialized views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.

C.  

Create database views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.

D.  

Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week. Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.
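
For reference, Redshift data sharing (option A) is configured with a handful of SQL statements on the producer and consumer clusters. The sketch below only generates those statements as strings; the share name, schema, database name, and namespace identifiers are hypothetical placeholders, and the helper `datashare_statements` is not part of any AWS library.

```python
def datashare_statements(share, schema, consumer_namespace, producer_namespace, local_db):
    """Return the SQL to run on the producer and on the consumer cluster."""
    producer = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]
    consumer = [
        # The consumer queries shared data through a local database object.
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';",
    ]
    return {"producer": producer, "consumer": consumer}


statements = datashare_statements(
    share="etl_share",
    schema="public",
    consumer_namespace="consumer-namespace-id",  # placeholder value
    producer_namespace="producer-namespace-id",  # placeholder value
    local_db="etl_shared_db",
)
for role, sql_list in statements.items():
    for sql in sql_list:
        print(f"[{role}] {sql}")
```

Once the consumer database exists, queries on the BI cluster can join local tables with the shared ones, and those queries run on the consumer cluster's compute.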

Question # 60

A company aggregates high-frequency sensor telemetry into an Amazon S3 data lake. Each sensor stream emits structured records every hour. The records include metadata such as sensor category, unit ID, operational state, event timestamp, and site location. The data scales up to millions of records each day. The company runs complex queries each day to uncover performance insights specific to sensor categories.

Which solution will meet these requirements with the FASTEST query execution time?

Options:

A.  

Persist the data in Apache ORC format. Partition the data by date. Sort the data by sensor category.

B.  

Persist the data in CSV format. Partition the data by date. Sort the data by operational status.

C.  

Persist the data in Parquet format. Partition the data by sensor category. Sort the data by date.

D.  

Persist the data in CSV format. Partition the data by date. Sort the data by sensor category.
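
The effect of the partitioning choice can be sketched in a few lines of plain Python. In a real data lake each partition would be an S3 prefix such as sensor_category=thermal/, and the query engine would skip prefixes that the filter rules out; the dictionary below stands in for that layout. The records and helper name are made up for illustration.

```python
from collections import defaultdict


def partition_by(records, column):
    """Group records by a partition column and sort each group by timestamp."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[column]].append(record)
    for rows in partitions.values():
        rows.sort(key=lambda record: record["event_timestamp"])
    return dict(partitions)


# Hypothetical telemetry records.
records = [
    {"sensor_category": "thermal", "unit_id": "u1", "event_timestamp": "2024-05-01T02:00:00"},
    {"sensor_category": "pressure", "unit_id": "u2", "event_timestamp": "2024-05-01T01:00:00"},
    {"sensor_category": "thermal", "unit_id": "u3", "event_timestamp": "2024-05-01T01:00:00"},
    {"sensor_category": "vibration", "unit_id": "u4", "event_timestamp": "2024-05-01T03:00:00"},
]

partitions = partition_by(records, "sensor_category")

# A category-specific query only has to read one partition.
thermal_rows = partitions["thermal"]
print(f"scanned {len(thermal_rows)} of {len(records)} records")
```

A query that filters on a column other than the partition column would still have to read every partition, which is why the partition key should match the most common filter.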
