Amazon EMR Migration Approaches

Amazon EMR – A Migration Plan

Amazon Web Services (AWS) offers Amazon Elastic MapReduce (EMR) for big data processing and analysis. The MapReduce software framework allows vast amounts of data to be processed quickly and cost-effectively. In addition, EMR securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics. This is accomplished by using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, and Presto, coupled with the dynamic scalability of Amazon EC2 and the scalable storage of Amazon S3. Whether you are running a single-purpose, short-lived cluster or a long-running, highly available cluster, Amazon EMR will give your organization the flexibility you have been looking for. Let’s explore further the benefits that Amazon EMR will provide to your business.

Getting Started – Amazon EMR Migration Approaches

When starting your organization’s journey to migrate your big data platform to the cloud, you must first decide how to approach the migration. There are three approaches:

1. Re-architect your platform to maximize the benefits of the cloud. This approach requires research, planning, experimentation, education, implementation, and deployment. These efforts cost resources and time but generally provide the greatest rate of return in the form of reduced hardware and storage costs, less operational maintenance, and the most flexibility to meet future business needs.

2. The lift and shift approach takes your existing architecture and completes a straight migration to the cloud. It is the ideal way of moving workloads from on-premises to the cloud when time is critical and ambiguity is high. In addition, there is less risk and a shorter time to market.

3. The hybrid approach blends lift and shift with re-architecture. It lets you experiment and gain experience with cloud technologies and paradigms before moving fully to the cloud.

Although there are pros and cons to each, it is imperative to agree on the migration approach your organization is taking before you move to the next step, prototyping.

Amazon EMR Prototyping

When moving to a new and unfamiliar product or service, there is always a period of learning. Usually, the best way to learn is to prototype and learn from doing, rather than researching alone, to help identify the unknowns early in the process so you can plan for them later. Make prototyping mandatory to challenge assumptions. Common assumptions when working with new products and services include the following:

1. A particular data format is the best data format for my use case.
2. A particular application is more performant than another application for processing a specific workflow.
3. A particular instance type is the most cost-effective way to run a specific workflow.
4. A particular application running on-premises should work identically in the cloud.

There are best practices for prototyping, and an AWS partner can help you work through them to ensure all assumptions are validated to a high degree of certainty.
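
As a rough illustration of challenging the third assumption above, the sketch below compares two prototype runs of the same workflow by total cost rather than by hourly price. The instance labels, prices, and runtimes are hypothetical placeholders, not AWS quotes or measured results.

```python
from dataclasses import dataclass


@dataclass
class PrototypeRun:
    """One prototype run of a workflow (all values are hypothetical)."""
    instance_type: str        # label only; nothing is looked up
    node_count: int
    hourly_price_usd: float   # assumed price per node-hour
    runtime_hours: float      # wall-clock time observed in the prototype

    def total_cost(self) -> float:
        # Cost of a run = nodes x price per node-hour x hours.
        return self.node_count * self.hourly_price_usd * self.runtime_hours


def cheapest(runs):
    """Return the run with the lowest total cost."""
    return min(runs, key=lambda run: run.total_cost())


runs = [
    PrototypeRun("type-a", node_count=10, hourly_price_usd=0.20, runtime_hours=3.0),
    PrototypeRun("type-b", node_count=10, hourly_price_usd=0.40, runtime_hours=1.0),
]
best = cheapest(runs)
print(best.instance_type, round(best.total_cost(), 2))
```

In this made-up case the instance with the higher hourly price finishes sooner and costs less overall, which is the kind of result only a prototype can surface.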

Choosing a Team

When starting a migration to the cloud, you must carefully choose your project team to research, design, implement, and maintain the new cloud system. We recommend that your team has individuals in the following roles with the understanding that a person can play multiple roles:

1. Project leader
2. Big data application engineer
3. Infrastructure engineer
4. Security engineer
5. Group of engineers

Getting started with your migration plan consists of determining your migration approach, prototyping, and choosing your team. Once these critical items are settled, your organization will be able to move to the next steps of the migration plan: gathering requirements, estimating costs, migrating the data, and providing ongoing support.

Cloud Rush is a certified AWS partner. We specialize in cloud assessments, strategy and planning, cloud migration, managed cloud services, and disaster recovery. Our “service that never sleeps” approach takes a hands-on, human approach to IT. Let Cloud Rush work with you to start your Amazon EMR migration journey.

Amazon Elastic MapReduce (EMR)

Amazon Web Services – Amazon EMR

Amazon Web Services Amazon EMR Benefits

There are many benefits to using AWS’s Amazon EMR. Here are the top five:

1. Ease of Use – Everybody wants easy, and that is what Amazon EMR provides. EMR launches clusters in minutes. There is no need to worry about node provisioning, infrastructure setup, Hadoop configuration, or cluster tuning. Amazon EMR takes care of these tasks so your team can focus on analysis. This allows your teams to collaborate and interactively explore, process, and visualize the data in an easy-to-use format.

2. Low Cost – Amazon EMR is a low-cost solution with predictable charges. It is billed at a per-second rate with a one-minute minimum charge. For example, you can launch a 10-node EMR cluster with applications such as Apache Spark and Apache Hive for as little as $0.15 per hour.

3. Reliable – Amazon EMR provides the reliability your team needs, so you can spend less time tuning and monitoring your cluster. EMR is tuned for the cloud and constantly monitors your cluster, retrying failed tasks and automatically replacing poorly performing instances. EMR provides the latest stable open source software releases, so you don’t have to manage updates and bug fixes, leading to fewer issues and less effort to maintain the environment. With multiple master nodes, clusters are highly available and automatically fail over in the event of a node failure.

4. Security – Security is the highest priority for Amazon EMR. It is a shared responsibility between AWS and your organization. A security plan will be put into place to ensure your data is secure.

5. Flexible – You have complete control over your cluster. You have root access to every instance, you can easily install additional applications, and customize every cluster with bootstrap actions. You can also launch EMR clusters with custom Amazon Linux AMIs and reconfigure running clusters live without the need to re-launch the cluster.
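
The per-second billing with a one-minute minimum described under Low Cost can be sketched as simple arithmetic. This is a minimal illustration that treats the article's $0.15 per hour example as a flat rate for the whole cluster, which is an assumption; the function names are hypothetical and real invoices include other line items.

```python
def billed_seconds(runtime_seconds: int, minimum_seconds: int = 60) -> int:
    """Per-second billing with a one-minute minimum charge."""
    return max(runtime_seconds, minimum_seconds)


def cluster_charge(runtime_seconds: int, hourly_rate_usd: float) -> float:
    """Charge in USD for one cluster run at a flat (assumed) hourly rate."""
    return billed_seconds(runtime_seconds) * hourly_rate_usd / 3600


EXAMPLE_RATE = 0.15  # USD per hour, the example figure quoted in the text

short_run = cluster_charge(30, EXAMPLE_RATE)       # billed as 60 seconds
long_run = cluster_charge(90 * 60, EXAMPLE_RATE)   # 90 minutes, billed per second
print(f"{short_run:.4f} {long_run:.4f}")
```

A 30-second run is charged as a full minute, while anything longer is charged for exactly the seconds used.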

AWS’s Amazon EMR software for big data processing and analysis is a must for your AWS strategy. The framework allows your developers to create programs that process immense amounts of data, and it gives them a service that is easy to use, low cost, reliable, secure, and flexible. Let’s talk about how Amazon EMR can work for your organization.

Cloud Rush specializes in cloud assessments, strategy and planning, cloud migration, managed cloud services, and disaster recovery. Our “service that never sleeps” approach takes a hands-on, human approach to IT. Partnering with best-in-class solutions, Cloud Rush wants to be your partner for your long- and short-term cloud goals.

Amazon Athena

Is Amazon Athena right for you?

Amazon Web Services (AWS) offers Amazon Athena as a service. Amazon Athena is a cost-effective interactive query service that will make your life easier and save you time and frustration. This easy-to-use, serverless service allows you to quickly query your data without having to set up and manage any servers or data warehouses. Amazon has made it as easy as point and click. It allows you to tap into all of your data without setting up complex processes to transform and load the data, so no ETL is required. With that said, let’s explore Athena.

Cost

Amazon Athena allows you to control your cost because you pay per query. You can save 30%-90% on your per-query cost and get better performance by compressing, partitioning, and converting your data into columnar formats. Athena queries the data directly in Amazon Simple Storage Service (S3), so there are no additional storage charges beyond Amazon S3.
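
To make the pay-per-query idea concrete, the sketch below estimates query cost from the amount of data scanned. The price of $5.00 per terabyte and the data sizes are assumptions for illustration only; check current Athena pricing before relying on any figure.

```python
TB = 1024 ** 4  # bytes per terabyte, as used in this sketch


def query_cost(bytes_scanned: float, price_per_tb_usd: float = 5.0) -> float:
    """Estimated cost of one query; the per-TB price is an assumed placeholder."""
    return bytes_scanned / TB * price_per_tb_usd


# Hypothetical scenario: the same query over raw text files versus
# compressed, partitioned, columnar data that scans far fewer bytes.
raw_cost = query_cost(2.0 * TB)
optimized_cost = query_cost(0.4 * TB)
saving = 1 - optimized_cost / raw_cost
print(f"raw=${raw_cost:.2f} optimized=${optimized_cost:.2f} saving={saving:.0%}")
```

Because cost in this model is proportional to bytes scanned, any format change that reduces the scanned volume reduces the per-query cost by the same fraction.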

Flexibility

Amazon Athena is flexible, powerful, and scalable. Athena uses Presto and works with a variety of data formats. It is ideal for quick ad hoc querying but can handle more complex queries as well. In addition, Athena uses Amazon S3 as its underlying data store, making your data highly available and scalable.

Query Time

With Amazon Athena, you don’t have to worry about having enough computing resources to get fast interactive query performance. It automatically executes queries in parallel, so most results come back in seconds. Depending on the type of query, it can be even faster if you store the data in a columnar format.
 
Now that you understand the benefits, here is how easy it is to use this service. There are only five basic steps when using Athena.

How to Use Amazon Athena

1.    Create an S3 bucket and object

2.    Create a metadata database

3.    Create a schema

4.    Run the Query

5.    Access the History
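
Steps 2 through 4 above can be sketched as three SQL statements submitted to Athena. The database, table, column, and bucket names below are hypothetical, and the `submit` helper assumes it is handed a boto3 Athena client (for example one created with `boto3.client("athena")`); this is an illustrative sketch, not a complete workflow.

```python
def build_statements(database: str, table: str, s3_location: str) -> list:
    """Return the SQL for steps 2-4: create a database, define a schema, query."""
    create_db = f"CREATE DATABASE IF NOT EXISTS {database}"
    create_table = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        "  request_time string,\n"   # hypothetical columns for a CSV log file
        "  status int\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"  # the S3 prefix created in step 1
    )
    query = (
        f"SELECT status, COUNT(*) AS hits FROM {database}.{table} GROUP BY status"
    )
    return [create_db, create_table, query]


def submit(client, sql: str, output_location: str) -> str:
    """Submit one statement through an Athena client and return its execution ID."""
    response = client.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


statements = build_statements("demo_db", "access_logs", "s3://example-bucket/logs/")
for sql in statements:
    print(sql, end="\n\n")
```

Each submitted statement then appears in the query history mentioned in step 5.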

As you can see, Amazon Athena is cost-effective, flexible, and easy to use. This service will save you time and money. The next step is to contact us! We can set up a complimentary consultation to review your Amazon Web Services needs.

Public Cloud Governance

With all the economies of scale afforded through cloud adoption, it is essential to understand that only through public cloud governance can you manage costs, secure data and infrastructure, and realize the competitive benefits of cloud providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP) and Azure.
For most organizations, cloud adoption spans business units, is siloed, involves varying skill levels and generally ends up “black-boxed” in conversations. Public cloud governance cannot be overlooked or dismissed without a significant impact on the business.
You moved to the cloud in part to reduce your capital expenses, but you could also have operational expenses accruing that are not aligned with the forecast. Cloud adoption does not have to be a zero-sum game; you can realize all of the benefits that the cloud has to offer without breaking the bank or losing track of your data.
Public cloud governance is a discipline that both the technical and the business savvy can use to gain control of the cloud footprint and keep a finger on its pulse at all times. Governance is not just for the enterprise; it is incumbent on any company leveraging the cloud to employ some level of governance, or it will suffer setbacks in areas that were not anticipated.

What is Public Cloud Governance?

At Cloud Rush, we view public cloud governance as having four pillars:

  • Resource management
    To govern the cloud, you have to know what is deployed at any point in time.
  • Proactive cost management
    It’s not enough to look at your bill. The cloud changes rapidly, and manually keeping up with the pricing matrices can be a tall order. As a result, public cloud governance will provide cost savings and aggregated recommendations.
  • Policy compliance
    Compliance can be summarized as merely a set of rules. These rules are codified in a way that provides uniform governance that is both proactive and reactive.
  • Access and data security
    Public cloud governance must monitor usage patterns for compliance and security purposes, and must also account for and categorize the data you have in the cloud. At the end of the day, compliance officers want on-demand compliance reporting.
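
The idea of compliance as a codified set of rules applied to a known inventory can be sketched in a few lines. The resource records, tag names, and rule names below are hypothetical; a real governance platform would pull the inventory from the cloud provider and support far richer policies.

```python
# Hypothetical inventory: what is deployed at this point in time.
resources = [
    {"id": "vm-1", "tags": {"owner": "data-team"}, "public": False, "monthly_cost_usd": 120.0},
    {"id": "bucket-7", "tags": {}, "public": True, "monthly_cost_usd": 15.0},
]

# Each rule is a named predicate that returns True when a resource complies.
RULES = {
    "must-have-owner-tag": lambda resource: "owner" in resource["tags"],
    "no-public-access": lambda resource: not resource["public"],
}


def evaluate(resources, rules):
    """Return (resource id, rule name) pairs for every violated rule."""
    violations = []
    for resource in resources:
        for name, complies in rules.items():
            if not complies(resource):
                violations.append((resource["id"], name))
    return violations


def total_monthly_cost(resources) -> float:
    """Aggregate cost across the inventory for a simple cost report."""
    return sum(resource["monthly_cost_usd"] for resource in resources)


violations = evaluate(resources, RULES)
print(violations)
print(total_monthly_cost(resources))
```

Running the same rules on a schedule gives reactive reporting, while running them before deployment gives the proactive side.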

How do we govern the cloud?

Fortunately, cloud governance is achievable for companies of any size. In order to govern your clouds, you must aggregate all of your machine data for analysis in real time, or near real time.
Splunk defines machine data as, “one of the most underused and undervalued assets of any organization. But some of the most important insights that you can gain—across IT and the business—are hidden in this data: where things went wrong, how to optimize the customer experience, the fingerprints of fraud. All of these insights can be found in the machine data that’s generated by the normal operations of your organization.”
Because of the wide array of SaaS solutions in the marketplace, companies are now able to define a monitoring stack that brings all of the machine data together to provide real insights, sophisticated compliance monitoring, and cost tracking. Note, however, that there is no single silver bullet today; your monitoring stack will generally comprise two to four vendors, depending on your organization’s needs. As you might guess, many of these platforms overlap, but each has unique features that fill various gaps.

What does a typical monitoring stack look like?

  • Resource Management
    When it comes to resource management and configuration management (CMDB), there are a few options:
    – Cloudaware (Cloud Rush recommended)
    – Scalr
    – CloudCheckr
    – CloudHealth
  • Cost Management
    Many platforms offer core cost management and have recommendation engines designed to maximize the value of your spend. Some of our favorites are:
    – CloudHealth (Cloud Rush recommended)
    – Cloudaware
    – Cloudability
  • Compliance
    Organizations have varying levels of compliance needs. Make sure you understand your organization’s compliance and reporting needs. This will help inform vendor selection.
    – Divvy Cloud (Cloud Rush recommended)
    – Cloudaware
  • Log Aggregation
    Everything deployed in the cloud emits data. As a result, these logs must be aggregated for analysis, alerting, reporting, and dashboarding. This data provides operational insights that illuminate your infrastructure as if it were sitting in your on-prem data center.
    – Splunk Cloud
    – Scalyr
    – Sumo Logic
    – ELK stack (“roll your own” platform)
Conclusion

In conclusion, we discussed how important public cloud governance is, where it fits into the organization and briefly introduced you to vendors in this space. In this five (5) part series, we’ll be taking a deep dive into the discipline, and along the way, you’ll broaden your knowledge around how we harness all that we do in the cloud.

About the Author

Chris Scragg is a principal cloud architect for Cloud Rush, with years of industry experience related to public cloud governance. Chris’ cloud journey began with a pivot to Amazon Web Services, out of legacy data center environments, back in 2011. A serial entrepreneur, Chris continues to maintain a deep focus in AWS, GCP and Azure, with an eye toward helping clients increase their competitiveness through digital transformations.