
[Mar-2023] Use Real DAS-C01 Dumps - 100% Free DAS-C01 Exam Dumps
NEW QUESTION 73
A banking company is currently using an Amazon Redshift cluster with dense storage (DS) nodes to store sensitive data. An audit found that the cluster is unencrypted. Compliance requirements state that a database with sensitive data must be encrypted through a hardware security module (HSM) with automated key rotation.
Which combination of steps is required to achieve compliance? (Choose two.)
- A. Enable HSM with key rotation through the AWS CLI.
- B. Enable Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) encryption in the HSM.
- C. Modify the cluster with an HSM encryption option and automatic key rotation.
- D. Set up a trusted connection with HSM using a client and server certificate with automatic key rotation.
- E. Create a new HSM-encrypted Amazon Redshift cluster and migrate the data to the new cluster.
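Answer: D,E
Explanation:
An existing unencrypted Amazon Redshift cluster cannot be modified to use HSM encryption. Amazon Redshift needs a trusted connection to the HSM, configured with client and server certificates, and the data must be migrated to a new HSM-encrypted cluster.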
NEW QUESTION 74
A real estate company has a mission-critical application using Apache HBase in Amazon EMR. Amazon EMR is configured with a single master node. The company has over 5 TB of data stored on a Hadoop Distributed File System (HDFS). The company wants a cost-effective solution to make its HBase data highly available.
Which architectural pattern meets the company's requirements?
- A. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Create a primary EMR HBase cluster with multiple master nodes. Create a secondary EMR HBase read-replica cluster in a separate Availability Zone. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.
- B. Use Spot Instances for core and task nodes and a Reserved Instance for the EMR master node. Configure the EMR cluster with multiple master nodes. Schedule automated snapshots using Amazon EventBridge.
- C. Store the data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent view. Run two separate EMR clusters in two different Availability Zones. Point both clusters to the same HBase root directory in the same Amazon S3 bucket.
- D. Store the data on an EMR File System (EMRFS) instead of HDFS. Enable EMRFS consistent view. Create an EMR HBase cluster with multiple master nodes. Point the HBase root directory to an Amazon S3 bucket.
Answer: C
NEW QUESTION 75
An operations team notices that a few AWS Glue jobs for a given ETL application are failing. The AWS Glue jobs read a large number of small JSON files from an Amazon S3 bucket and write the data to a different S3 bucket in Apache Parquet format with no major transformations. Upon initial investigation, a data engineer notices the following error message in the History tab on the AWS Glue console: "Command Failed with Exit Code 1." Upon further investigation, the data engineer notices that the driver memory profile of the failed jobs crosses the safe threshold of 50% usage quickly and reaches 90-95% soon after. The average memory usage across all executors continues to be less than 4%.
The data engineer also notices the following error while examining the related Amazon CloudWatch Logs.
What should the data engineer do to solve the failure in the MOST cost-effective way?
- A. Modify maximum capacity to increase the total maximum data processing units (DPUs) used.
- B. Change the worker type from Standard to G.2X.
- C. Increase the fetch size setting by using an AWS Glue dynamic frame.
- D. Modify the AWS Glue ETL code to use the 'groupFiles': 'inPartition' feature.
Answer: D
Explanation:
https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html#monitor-debug-oom-fix
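As a rough illustration of answer D, the sketch below builds the S3 connection options that request file grouping. The bucket path, the helper name `grouped_s3_options`, and the 1 MB group size are assumptions made for the example; only the `groupFiles` and `groupSize` option names come from AWS Glue.

```python
def grouped_s3_options(paths, group_size_bytes=1024 * 1024):
    """Build S3 connection options that ask AWS Glue to group small files.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    return {
        "paths": list(paths),
        "recurse": True,
        # Group files within each S3 partition so the driver tracks
        # far fewer input splits.
        "groupFiles": "inPartition",
        # Target group size in bytes, passed as a string.
        "groupSize": str(group_size_bytes),
    }


# Assumed example bucket; replace with the real input location.
options = grouped_s3_options(["s3://example-input-bucket/json/"])
print(options)

# Inside a Glue job the options would be passed roughly like this:
# glue_context.create_dynamic_frame.from_options(
#     connection_type="s3", connection_options=options, format="json")
```

Grouping needs no extra DPUs and no larger worker type, which is why it is the cheapest of the four options.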
NEW QUESTION 76
A retail company's data analytics team recently created multiple product sales analysis dashboards for the average selling price per product using Amazon QuickSight. The dashboards were created from .csv files uploaded to Amazon S3. The team is now planning to share the dashboards with the respective external product owners by creating individual users in Amazon QuickSight. For compliance and governance reasons, restricting access is a key requirement. The product owners should view only their respective product analysis in the dashboard reports.
Which approach should the data analytics team take to allow product owners to view only their products in the dashboard?
- A. Separate the data by product and use IAM policies for authorization.
- B. Separate the data by product and use S3 bucket policies for authorization.
- C. Create a manifest file with row-level security.
- D. Create dataset rules with row-level security.
Answer: D
Explanation:
https://docs.aws.amazon.com/quicksight/latest/user/restrict-access-to-a-data-set-using-row-level-security.html
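A minimal sketch of what a dataset rules file for answer D might look like. The user names, product values, and the `product` column name are assumptions; the column would have to match a field in the actual dashboard dataset.

```python
import csv
import io


def build_rls_rules(owner_to_products):
    """Return CSV text with one row per user and the products they may see."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # "UserName" identifies the QuickSight user; "product" is an assumed
    # dataset field used as the row filter.
    writer.writerow(["UserName", "product"])
    for user, products in sorted(owner_to_products.items()):
        # Several allowed values go into one comma-separated cell.
        writer.writerow([user, ",".join(products)])
    return buffer.getvalue()


rules = build_rls_rules({
    "owner_a": ["widget"],
    "owner_b": ["gadget", "gizmo"],
})
print(rules)
```

The resulting file would be uploaded as its own dataset and attached to the sales dataset as its row-level security rules.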
NEW QUESTION 77
An airline has .csv-formatted data stored in Amazon S3 with an AWS Glue Data Catalog. Data analysts want to join this data with call center data stored in Amazon Redshift as part of a daily batch process. The Amazon Redshift cluster is already under a heavy load. The solution must be managed, serverless, well-functioning, and minimize the load on the existing Amazon Redshift cluster. The solution should also require minimal effort and development activity.
Which solution meets these requirements?
- A. Export the call center data from Amazon Redshift to Amazon EMR using Apache Sqoop. Perform the join with Apache Hive.
- B. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift.
- C. Export the call center data from Amazon Redshift using a Python shell in AWS Glue. Perform the join with AWS Glue ETL scripts.
- D. Unload the call center data from Amazon Redshift to Amazon S3 using an AWS Lambda function. Perform the join with AWS Glue ETL scripts.
Answer: B
Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html
NEW QUESTION 78
A company is streaming its high-volume billing data (100 MBps) to Amazon Kinesis Data Streams. A data analyst partitioned the data on account_id to ensure that all records belonging to an account go to the same Kinesis shard and order is maintained. While building a custom consumer using the Kinesis Java SDK, the data analyst notices that, sometimes, the messages arrive out of order for account_id. Upon further investigation, the data analyst discovers the messages that are out of order seem to be arriving from different shards for the same account_id and are seen when a stream resize runs.
What is an explanation for this behavior and what is the solution?
- A. The hash key generation process for the records is not working correctly. The data analyst should generate an explicit hash key on the producer side so the records are directed to the appropriate shard accurately.
- B. There are multiple shards in a stream and order needs to be maintained in the shard. The data analyst needs to make sure there is only a single shard in the stream and no stream resize runs.
- C. The consumer is not processing the parent shard completely before processing the child shards after a stream resize. The data analyst should process the parent shard completely first before processing the child shards.
- D. The records are not being received by Kinesis Data Streams in order. The producer should use the PutRecords API call instead of the PutRecord API call with the SequenceNumberForOrdering parameter.
Answer: C
Explanation:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-after-resharding.html
The parent shards that remain after the reshard could still contain data that you haven't read yet that was added to the stream before the reshard. If you read data from the child shards before having read all data from the parent shards, you could read data for a particular hash key out of the order given by the data records' sequence numbers. Therefore, assuming that the order of the data is important, you should, after a reshard, always continue to read data from the parent shards until it is exhausted. Only then should you begin reading data from the child shards.
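The parent-before-child rule can be sketched as an ordering problem. The shard IDs and the `parent_first_order` helper below are hypothetical; a real consumer would take the parent relationships from the stream's shard list and would also have to drain each parent fully before moving on.

```python
def parent_first_order(shards):
    """Return shard IDs so that every parent precedes its children.

    `shards` maps a shard ID to the list of its parent shard IDs
    (an empty list for shards that existed before any reshard).
    """
    ordered = []
    visited = set()

    def visit(shard_id):
        if shard_id in visited:
            return
        visited.add(shard_id)
        for parent in shards.get(shard_id, []):
            # A parent may already have expired from the stream; skip it.
            if parent in shards:
                visit(parent)
        ordered.append(shard_id)

    for shard_id in shards:
        visit(shard_id)
    return ordered


# Assumed example: shard-0 was split into shard-2 and shard-3.
shards = {
    "shard-2": ["shard-0"],
    "shard-3": ["shard-0"],
    "shard-0": [],
    "shard-1": [],
}
order = parent_first_order(shards)
print(order)  # ['shard-0', 'shard-2', 'shard-3', 'shard-1']
```

Even though the children are listed first in the input, the parent `shard-0` is scheduled ahead of both of them.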
NEW QUESTION 79
A company is planning to create a data lake in Amazon S3. The company wants to create tiered storage based on access patterns and cost objectives. The solution must include support for JDBC connections from legacy clients, metadata management that allows federation for access control, and batch-based ETL using PySpark and Scala. Operational management should be limited.
Which combination of components can meet these requirements? (Choose three.)
- A. AWS Glue Data Catalog for metadata management
- B. Amazon Athena for querying data in Amazon S3 using JDBC drivers
- C. Amazon EMR with Apache Spark for ETL
- D. Amazon EMR with Apache Hive, using an Amazon RDS with MySQL-compatible backed metastore
- E. AWS Glue for Scala-based ETL
- F. Amazon EMR with Apache Hive for JDBC clients
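Answer: A,B,E
Explanation:
AWS Glue Data Catalog provides the metadata management, Amazon Athena serves JDBC clients, and AWS Glue runs PySpark and Scala ETL jobs. All three are serverless, which keeps operational management lower than running Amazon EMR clusters.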
NEW QUESTION 80
A media company has been performing analytics on log data generated by its applications. There has been a recent increase in the number of concurrent analytics jobs running, and the overall performance of existing jobs is decreasing as the number of new jobs is increasing. The partitioned data is stored in Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA) and the analytic processing is performed on Amazon EMR clusters using the EMR File System (EMRFS) with consistent view enabled. A data analyst has determined that it is taking longer for the EMR task nodes to list objects in Amazon S3.
Which action would MOST likely increase the performance of accessing log data in Amazon S3?
- A. Use a hash function to create a random string and add that to the beginning of the object prefixes when storing the log data in Amazon S3.
- B. Redeploy the EMR clusters that are running slowly to a different Availability Zone.
- C. Use a lifecycle policy to change the S3 storage class to S3 Standard for the log data.
- D. Increase the read capacity units (RCUs) for the shared Amazon DynamoDB table.
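Answer: D
Explanation:
EMRFS consistent view tracks Amazon S3 object metadata in an Amazon DynamoDB table. When many concurrent jobs list objects, reads against that table can be throttled, so increasing its read capacity is the most likely fix.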
NEW QUESTION 81
A power utility company is deploying thousands of smart meters to obtain real-time updates about power consumption. The company is using Amazon Kinesis Data Streams to collect the data streams from smart meters. The consumer application uses the Kinesis Client Library (KCL) to retrieve the stream data. The company has only one consumer application.
The company observes an average of 1 second of latency from the moment that a record is written to the stream until the record is read by a consumer application. The company must reduce this latency to 500 milliseconds.
Which solution meets these requirements?
- A. Increase the number of shards for the Kinesis data stream.
- B. Reduce the propagation delay by overriding the KCL default settings.
- C. Develop consumers by using Amazon Kinesis Data Firehose.
- D. Use enhanced fan-out in Kinesis Data Streams.
Answer: B
Explanation:
The KCL defaults are set to follow the best practice of polling every 1 second. This default results in average propagation delays that are typically below 1 second.
NEW QUESTION 82
A company is sending historical datasets to Amazon S3 for storage. A data engineer at the company wants to make these datasets available for analysis using Amazon Athena. The engineer also wants to encrypt the Athena query results in an S3 results location by using AWS solutions for encryption. The requirements for encrypting the query results are as follows:
Use custom keys for encryption of the primary dataset query results.
Use generic encryption for all other query results.
Provide an audit trail for the primary dataset queries that shows when the keys were used and by whom.
Which solution meets these requirements?
- A. Use server-side encryption with customer-provided encryption keys (SSE-C) for the primary dataset. Use server-side encryption with S3 managed encryption keys (SSE-S3) for the other datasets.
- B. Use server-side encryption with AWS KMS managed customer master keys (SSE-KMS CMKs) for the primary dataset. Use server-side encryption with S3 managed encryption keys (SSE-S3) for the other datasets.
- C. Use server-side encryption with S3 managed encryption keys (SSE-S3) for the primary dataset. Use SSE-S3 for the other datasets.
- D. Use client-side encryption with AWS Key Management Service (AWS KMS) customer managed keys for the primary dataset. Use S3 client-side encryption with client-side keys for the other datasets.
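Answer: B
Explanation:
SSE-KMS with a customer managed key satisfies the custom key requirement, and AWS CloudTrail records each use of the key, which provides the audit trail. SSE-S3 covers the generic encryption requirement for the other datasets.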
NEW QUESTION 83
A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data. The amount of data that is ingested into Amazon S3 has increased substantially over time, and the query latency also has increased.
Which solutions could the company implement to improve query performance? (Choose two.)
- A. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or ODBC connector. Run the query from MySQL Workbench instead of Athena directly.
- B. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.
- C. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.
- D. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the compressed data.
- E. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.
Answer: B,E
Explanation:
Reference:
https://aws.amazon.com/blogs/big-data/work-with-partitioned-data-in-aws-glue/
NEW QUESTION 84
A company uses Amazon Redshift for its data warehousing needs. ETL jobs run every night to load data, apply business rules, and create aggregate tables for reporting. The company's data analysis, data science, and business intelligence teams use the data warehouse during regular business hours. The workload management is set to auto, and separate queues exist for each team with the priority set to NORMAL.
Recently, a sudden spike of read queries from the data analysis team has occurred at least twice daily, and queries wait in line for cluster resources. The company needs a solution that enables the data analysis team to avoid query queuing without impacting latency and the query times of other teams.
Which solution meets these requirements?
- A. Use workload management query queue hopping to route the query to the next matching queue.
- B. Increase the query priority to HIGHEST for the data analysis queue.
- C. Create a query monitoring rule to add more cluster capacity for the data analysis queue when queries are waiting for resources.
- D. Configure the data analysis queue to enable concurrency scaling.
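Answer: D
Explanation:
Concurrency scaling adds transient cluster capacity for eligible queued read queries, so the data analysis queue stops waiting without taking resources away from the other teams' queues.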
NEW QUESTION 85
A media analytics company consumes a stream of social media posts. The posts are sent to an Amazon Kinesis data stream partitioned on user_id. An AWS Lambda function retrieves the records and validates the content before loading the posts into an Amazon Elasticsearch cluster. The validation process needs to receive the posts for a given user in the order they were received. A data analyst has noticed that, during peak hours, the social media platform posts take more than an hour to appear in the Elasticsearch cluster.
What should the data analyst do to reduce this latency?
- A. Migrate the validation process to Amazon Kinesis Data Firehose.
- B. Migrate the Lambda consumers from standard data stream iterators to an HTTP/2 stream consumer.
- C. Increase the number of shards in the stream.
- D. Configure multiple Lambda functions to process the stream.
Answer: C
NEW QUESTION 86
A financial company hosts a data lake in Amazon S3 and a data warehouse on an Amazon Redshift cluster. The company uses Amazon QuickSight to build dashboards and wants to secure access from its on-premises Active Directory to Amazon QuickSight.
How should the data be secured?
- A. Use an Active Directory connector and single sign-on (SSO) in a corporate network environment.
- B. Place Amazon QuickSight and Amazon Redshift in the security group and use an Amazon S3 endpoint to connect Amazon QuickSight to Amazon S3.
- C. Use a VPC endpoint to connect to Amazon S3 from Amazon QuickSight and an IAM role to authenticate Amazon Redshift.
- D. Establish a secure connection by creating an S3 endpoint to connect Amazon QuickSight and a VPC endpoint to connect to Amazon Redshift.
Answer: A
Explanation:
https://docs.aws.amazon.com/quicksight/latest/user/directory-integration.html
NEW QUESTION 87
A data analyst is using Amazon QuickSight for data visualization across multiple datasets generated by applications. Each application stores files within a separate Amazon S3 bucket. AWS Glue Data Catalog is used as a central catalog across all application data in Amazon S3. A new application stores its data within a separate S3 bucket. After updating the catalog to include the new application data source, the data analyst created a new Amazon QuickSight data source from an Amazon Athena table, but the import into SPICE failed.
How should the data analyst resolve the issue?
- A. Edit the permissions for the new S3 bucket from within the Amazon QuickSight console.
- B. Edit the permissions for the AWS Glue Data Catalog from within the AWS Glue console.
- C. Edit the permissions for the new S3 bucket from within the S3 console.
- D. Edit the permissions for the AWS Glue Data Catalog from within the Amazon QuickSight console.
Answer: A
NEW QUESTION 88
A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations.
Station A, which has 10 sensors
Station B, which has five sensors
These weather stations were placed by onsite subject-matter experts.
Each sensor has a unique ID. The data collected from each sensor will be collected using Amazon Kinesis Data Streams.
Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created. Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B.
Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput.
How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?
- A. Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.
- B. Reduce the number of sensors in Station A from 10 to 5 sensors.
- C. Modify the partition key to use the sensor ID instead of the station name.
- D. Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.
Answer: C
Explanation:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html
"Splitting increases the number of shards in your stream and therefore increases the data capacity of the stream. Because you are charged on a per-shard basis, splitting increases the cost of your stream"
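To see why answer C removes the bottleneck without adding shards, the sketch below imitates how Kinesis Data Streams maps a partition key to a shard: it takes the MD5 hash of the key as a 128-bit integer and looks up which shard's hash key range contains it. The station and sensor names, and the assumption that the two shards split the hash space evenly, are illustrative only.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 2
HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for(partition_key, shard_count=SHARD_COUNT):
    """Map a partition key to a shard index, assuming equal hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_key = int(digest, 16)
    return hash_key * shard_count // HASH_SPACE


# Assumed sensor IDs: ten sensors at Station A, five at Station B.
readings = [("StationA", f"A-{i}") for i in range(10)]
readings += [("StationB", f"B-{i}") for i in range(5)]

# Partitioning by station name: every Station A record shares one hash key.
by_station = Counter(shard_for(station) for station, _ in readings)

# Partitioning by sensor ID: fifteen distinct hash keys spread the load.
by_sensor = Counter(shard_for(sensor) for _, sensor in readings)

print("by station:", dict(by_station))
print("by sensor: ", dict(by_sensor))
```

With station names as keys, all ten Station A sensors are forced onto a single shard no matter how the hash falls, whereas sensor IDs give the stream fifteen keys to distribute across the existing two shards.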
NEW QUESTION 89
A company is building an analytical solution that includes Amazon S3 as data lake storage and Amazon Redshift for data warehousing. The company wants to use Amazon Redshift Spectrum to query the data that is stored in Amazon S3.
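Which steps should the company take to improve performance when the company uses Amazon Redshift Spectrum to query the S3 data files? (Choose three.)
- A. Use gzip compression with individual file sizes of 1-5 GB
- B. Keep all files about the same size.
- C. Use a columnar storage file format
- D. Split the data into KB-sized files.
- E. Use file formats that are not splittable
- F. Partition the data based on the most common query predicates
Answer: B,C,F
Explanation:
Redshift Spectrum scans less data with columnar formats and with partitions that match common query predicates, and it distributes work more evenly when files are about the same size. KB-sized files and non-splittable formats reduce parallelism.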
NEW QUESTION 90
A company's marketing team has asked for help in identifying a high performing long-term storage service for their data based on the following requirements:
* The data size is approximately 32 TB uncompressed.
* There is a low volume of single-row inserts each day.
* There is a high volume of aggregation queries each day.
* Multiple complex joins are performed.
* The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?
- A. Amazon Neptune
- B. Amazon Elasticsearch
- C. Amazon Redshift
- D. Amazon Aurora MySQL
Answer: C