
Leveraging Unstructured Data with Cloud Dataproc on Google Cloud em Português Brasileiro

Google Cloud

Leveraging Unstructured Data with Cloud Dataproc on Google Cloud em Português Brasileiro is an intensive one-week course, taught in Brazilian Portuguese, that builds on the foundational knowledge from Data Engineering on Google Cloud Platform. Through video lectures, demonstrations, and hands-on labs, you learn to create and manage computing clusters that run Hadoop, Spark, Pig, and Hive jobs on Google Cloud Platform.

You learn to access various cloud storage options, create and manage Dataproc clusters from both the web console and the CLI, run Spark and Pig jobs on those clusters, create IPython notebooks integrated with BigQuery and storage, and bring Google's machine learning APIs into your data analysis. The course requires a basic understanding of Big Data and Machine Learning on Google Cloud Platform (or equivalent experience) and some knowledge of Python.

  • Master the fundamentals of Cloud Dataproc and its benefits over on-premises Hadoop options
  • Customize and manage Dataproc clusters using the web console and CLI
  • Explore working with various cloud storage options and integrating machine learning APIs into data analysis

Certificate Available ✔

Course Modules

The course is divided into four modules, covering Cloud Dataproc fundamentals, running Dataproc jobs, using GCP, and analyzing unstructured data with machine learning.

Module 1: Introduction to Cloud Dataproc

Explore the foundational concepts of Cloud Dataproc: what unstructured data is, how to extract value from it, and how Cloud Dataproc compares with other Hadoop options. Learn to create and customize a Dataproc cluster, with practical labs for hands-on experience.

Module 2: Running Dataproc Jobs

Discover the methods for submitting jobs, the separation of storage and compute, and the role of networking in data processing. Learn to submit Spark jobs and work with structured and semi-structured data, building proficiency through hands-on labs.
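As a rough illustration of submitting a job from the CLI (not taken from the course materials), the sketch below assembles a `gcloud dataproc jobs submit pyspark` command. Reading input from and writing output to `gs://` paths, rather than cluster-local HDFS, is one common way the separation of storage and compute shows up in practice. The bucket, script, and cluster names are hypothetical placeholders, and the function only builds the argument list; it does not call `gcloud`.

```python
import shlex


def pyspark_submit_command(script_uri, cluster, region, job_args=()):
    """Return the argv for `gcloud dataproc jobs submit pyspark`.

    Only builds the command; nothing is executed.
    """
    cmd = [
        "gcloud", "dataproc", "jobs", "submit", "pyspark",
        script_uri,  # the main PySpark file, e.g. a gs:// path
        f"--cluster={cluster}",
        f"--region={region}",
    ]
    if job_args:
        # Everything after "--" is passed through to the PySpark script itself.
        cmd.append("--")
        cmd.extend(job_args)
    return cmd


if __name__ == "__main__":
    # Hypothetical bucket, script, and cluster names, for illustration only.
    argv = pyspark_submit_command(
        "gs://example-bucket/jobs/wordcount.py",
        cluster="example-cluster",
        region="us-central1",
        job_args=["gs://example-bucket/input/", "gs://example-bucket/output/"],
    )
    print(shlex.join(argv))
```

Because the input and output live in Cloud Storage, the cluster that runs the job can be created just for this work and deleted afterwards without losing data.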

Module 3: Using GCP

Learn to make use of GCP, take advantage of BigQuery support, and customize clusters. Practice installing software on a Dataproc cluster and automating cluster tasks with CLI commands through interactive labs and demonstrations.
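One way to picture installing software and automating cluster tasks from the CLI is a script that creates a cluster with initialization actions and deletes it when the work is done. The minimal sketch below builds those two `gcloud dataproc clusters` commands; it is an assumption-laden illustration rather than the course's own lab code, and the cluster name, region, and `gs://` script path are hypothetical.

```python
import shlex


def create_cluster_command(name, region, num_workers=2, init_actions=()):
    """Return the argv for `gcloud dataproc clusters create`."""
    cmd = [
        "gcloud", "dataproc", "clusters", "create", name,
        f"--region={region}",
        f"--num-workers={num_workers}",
    ]
    if init_actions:
        # Initialization actions are scripts (usually gs:// paths) that Dataproc
        # runs on the cluster's nodes during setup, e.g. to install extra software.
        # The flag takes a comma-separated list.
        cmd.append("--initialization-actions=" + ",".join(init_actions))
    return cmd


def delete_cluster_command(name, region):
    """Return the argv for `gcloud dataproc clusters delete`."""
    # --quiet is gcloud's global flag that disables interactive prompts,
    # so the delete can run unattended in a script.
    return ["gcloud", "dataproc", "clusters", "delete", name,
            f"--region={region}", "--quiet"]


if __name__ == "__main__":
    # Hypothetical names and script location, for illustration only.
    create = create_cluster_command(
        "example-cluster", "us-central1",
        init_actions=["gs://example-bucket/scripts/install-deps.sh"],
    )
    delete = delete_cluster_command("example-cluster", "us-central1")
    for argv in (create, delete):
        print(shlex.join(argv))
```

To actually run either command, pass the list to `subprocess.run(argv, check=True)` on a machine where the Google Cloud SDK is installed and authenticated.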

Module 4: Analyzing Unstructured Data

Dig into machine learning, how it is applied, and natural language processing. Gain practical experience adding machine learning to your data analysis through hands-on labs.
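Machine learning APIs are typically called over REST from within an analysis. As one hedged example, which the course description itself does not name, the sketch below builds the JSON request body for the Cloud Natural Language API's `documents:analyzeEntities` method. Sending the request requires an authenticated HTTP client, which is deliberately left out, and the sample sentence is made up.

```python
import json

# Endpoint of the Cloud Natural Language API v1 entity-analysis method.
ANALYZE_ENTITIES_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def analyze_entities_body(text):
    """Build (but do not send) the JSON body for documents:analyzeEntities."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        # Tells the API which encoding to use when computing text offsets
        # in the response.
        "encodingType": "UTF8",
    }


if __name__ == "__main__":
    body = analyze_entities_body("Dataproc runs Spark and Hadoop on Google Cloud.")
    print("POST", ANALYZE_ENTITIES_URL)
    print(json.dumps(body, indent=2))
```

In a Spark job, a function like this would typically be applied per record or per batch, with the responses joined back to the source data for further analysis.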
