Welcome to the world of big data, cloud, and data science technologies! One of the major barriers for beginners or even experienced professionals when learning new technologies is setting up an environment to experiment and practice. With the advent of cloud-based development environments, this has become less of a problem. In this article, we will walk you through setting up a free environment on Gitpod and GitHub, where you can test and learn Kafka, Spark, and Hadoop.
Table of Contents
- Introduction
- Setting Up a Free Gitpod Account
- Setting Up a Free GitHub Account
- Launching Your Gitpod Development Environment
- Setting Up Your Environment
- Conclusion
Introduction
Before we dive in, let's first get acquainted with the tools we will be using.
- Gitpod: Gitpod is a cloud-based, open-source platform for automated and ready-to-code development environments. It provides a pre-built, fully equipped online coding environment that you can start using instantly.
- GitHub: GitHub is a web-based hosting service for version control and collaboration. It allows you to work on projects from anywhere in the world.
- Kafka: Kafka is a distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
- Spark: Apache Spark is an open-source, distributed computing system used for big data processing and analytics.
- Hadoop: Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation.
Setting Up a Free Gitpod Account
Gitpod is an open-source platform that provides ready-to-code development environments in the cloud. With Gitpod, you can automate your dev environment setup, start projects with a single click, and avoid the "works on my machine" problem. It integrates with GitHub, GitLab, and Bitbucket, allowing you to start coding from any Git context. Gitpod's environments are ephemeral, meaning they can be created, used, and discarded as needed. This makes it a good fit for both individual developers and teams, facilitating collaboration and accelerating development workflows.
Gitpod offers a free tier that gives you access to all workspace classes, prebuilds, and IDEs for up to 50 hours per month. Here's how you set it up:
- Visit the Gitpod website at https://www.gitpod.io/
- Click on the "Get Started for Free" button. This will take you to the Gitpod login page.
- Here, you have the option to log in with GitHub, GitLab, or Bitbucket. Since we are also setting up a free GitHub account, click on "Continue with GitHub".
- If you already have a GitHub account, you can log in with your credentials. If not, don't worry. We will guide you on how to set up a free GitHub account in the next section.

Setting Up a Free GitHub Account
GitHub is an essential tool for any developer. Follow these steps to set up your free GitHub account:
- Go to the GitHub homepage at https://github.com/
- Click the "Sign up" button on the top right of the page.
- Fill out the form with your desired username, email address, and password. Remember to verify your account by completing the captcha.
- Once you've completed the form, click the green "Create account" button.
- You'll be asked to set up a personal plan. Choose the free plan, which gives you unlimited public and private repositories.
- Complete the setup by filling out the optional survey, and you're done! You now have a free GitHub account.
Now that you have both Gitpod and GitHub set up, you can log into Gitpod using your GitHub account.

Launching Your Gitpod Development Environment
With Gitpod, you can seamlessly integrate your GitHub repositories and create a ready-to-code development environment. Here's how you can do it:
1. Identify your GitHub Repository: First, identify the GitHub repository you want to work with. For this example, we will use the following repository: https://github.com/fenago/kafka-advanced. Replace this URL with the URL of your own GitHub repository.

2. Point your Repository to Gitpod: To launch your Gitpod environment, you need to prepend the GitHub repository URL with https://gitpod.io/#. So, for our example repository, the URL becomes: https://gitpod.io/#https://github.com/fenago/kafka-advanced.

3. Launch your Development Environment: Now, all you need to do is to enter this URL in your browser's address bar and hit Enter. Gitpod will start creating your development environment.
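The URL rule from the steps above can be sketched in the shell. This is only an illustration: the variable names `repo_url` and `gitpod_url` are made up for this example, and the repository is the one used in this article.

```shell
# Repository used as the example in this article; swap in your own.
repo_url="https://github.com/fenago/kafka-advanced"

# Prefix the repository URL with the Gitpod launcher address.
gitpod_url="https://gitpod.io/#${repo_url}"

# Print the link to paste into the browser's address bar.
echo "$gitpod_url"
```

Pasting the printed link into the browser has the same effect as typing the combined URL by hand.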

As Gitpod creates your environment, it scans your repository to understand its structure and requirements. It then recommends the appropriate plugins and language support you'll need for your project. This smart feature ensures that your development environment is tailored to your project's needs, saving you the hassle of manually configuring these settings.

This seamless integration of Gitpod with GitHub brings everything together. It not only simplifies your setup process but also creates a ready-to-code environment where you can start working on your projects instantly. This is especially useful when dealing with complex technologies like Kafka, Spark, and Hadoop, as it allows you to focus on learning and experimenting, rather than getting bogged down by setup and configuration details.
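Beyond what Gitpod detects on its own, a repository can carry explicit workspace settings in a `.gitpod.yml` file at its root. As a rough sketch, and not something the example repository is known to contain, the snippet below writes such a file declaring the ports used later in this guide. The port numbers are the Kafka, ZooKeeper, and Spark standalone defaults; the scratch directory `target_dir` is an assumption made so the example does not overwrite anything.

```shell
# Write into a scratch directory for illustration; in a real project
# the file belongs at the repository root.
target_dir="$(mktemp -d)"

cat > "$target_dir/.gitpod.yml" <<'EOF'
# Ports used in this guide (default values of each tool).
ports:
  - port: 2181   # ZooKeeper client port
    onOpen: ignore
  - port: 9092   # Kafka broker
    onOpen: ignore
  - port: 7077   # Spark standalone master
    onOpen: ignore
  - port: 8080   # Spark master web UI
    onOpen: notify
EOF

echo "wrote $target_dir/.gitpod.yml"
```

With `onOpen: ignore`, Gitpod stays quiet when a service starts listening on that port, while `notify` shows a prompt, which is handy for a web UI you may want to open.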
Setting Up Your Environment
With your accounts in place, it's time to set up your environment to experiment with Kafka, Spark, and Hadoop.
1. First, open the Gitpod terminal.
2. Create the ~/Downloads directory (if it does not already exist) and navigate to it using the command:
mkdir -p ~/Downloads && cd ~/Downloads
3. Now we will download the necessary files. Enter the following commands to download Kafka and Spark:
wget https://downloads.apache.org/kafka/3.4.0/kafka_2.13-3.4.0.tgz
wget https://dlcdn.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3.tgz
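4. These commands will download the necessary files for Kafka and Spark into your ~/Downloads directory. If either link returns a 404 because the release has since been superseded, the same files remain available under https://archive.apache.org/dist/ (in kafka/3.4.0/ and spark/spark-3.4.0/).
5. Next, we need to unpack them. Enter the following commands to extract the downloaded files: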
tar -xvf kafka_2.13-3.4.0.tgz
tar -xvf spark-3.4.0-bin-hadoop3.tgz
6. This will create directories for Kafka and Spark in your ~/Downloads directory.
7. Now, let's start them up. Navigate to the Kafka directory and start the ZooKeeper server:
cd kafka_2.13-3.4.0
bin/zookeeper-server-start.sh config/zookeeper.properties
Make sure you allow the ports to be opened! ZooKeeper keeps running in the foreground, so to confirm that it is listening on port 2181, run the following in another terminal:
sudo apt install net-tools
netstat -na | grep 2181
Alternatively, click on PORTS in Gitpod to see which ports are open.

8. Now start the Kafka server in another terminal:

cd ~/Downloads/kafka_2.13-3.4.0
bin/kafka-server-start.sh config/server.properties

9. Similarly, in a new terminal, navigate to the Spark directory and start the Spark standalone master and worker:
cd ~/Downloads/spark-3.4.0-bin-hadoop3
sbin/start-all.sh
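Once ZooKeeper and the Kafka broker are running, a quick round trip confirms that the broker accepts and serves messages. The following is a minimal sketch: the topic name `gitpod-smoke-test` and the function name `kafka_smoke_test` are made up for this example, and the broker address `localhost:9092` is assumed to be the default from config/server.properties.

```shell
# Directory created by the extraction step; adjust if you unpacked elsewhere.
KAFKA_DIR="$HOME/Downloads/kafka_2.13-3.4.0"
# Hypothetical topic name, used only for this check.
TOPIC="gitpod-smoke-test"

kafka_smoke_test() {
  # Skip quietly when Kafka is not where this guide put it.
  if [ ! -x "$KAFKA_DIR/bin/kafka-topics.sh" ]; then
    echo "Kafka not found in $KAFKA_DIR; skipping smoke test"
    return 0
  fi
  # Create the topic; --if-not-exists makes the call safe to repeat.
  "$KAFKA_DIR/bin/kafka-topics.sh" --create --if-not-exists \
    --topic "$TOPIC" --bootstrap-server localhost:9092
  # Send a single message through the console producer.
  echo "hello from gitpod" | "$KAFKA_DIR/bin/kafka-console-producer.sh" \
    --topic "$TOPIC" --bootstrap-server localhost:9092
  # Read it back; stop after one message or 10 seconds without data.
  "$KAFKA_DIR/bin/kafka-console-consumer.sh" \
    --topic "$TOPIC" --bootstrap-server localhost:9092 \
    --from-beginning --max-messages 1 --timeout-ms 10000
}

kafka_smoke_test
```

If the consumer prints the message that the producer sent, the broker is working end to end.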
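Note: This guide does not install a separate Hadoop cluster; the Spark download used here is the build made for Hadoop 3 and bundles the Hadoop client libraries. Be sure to check the official documentation for Kafka, Spark, and Hadoop for more detailed information on installation and configuration options.
Click on Ports to view some of the UIs that these big data, cloud, and data science frameworks expose.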
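To check that the Spark installation itself works, you can run one of the examples that ship with the distribution. This is a sketch under a few assumptions: the function name `spark_smoke_test` is made up for this example, a Java runtime is assumed to be available in the workspace, and `run-example` without a master setting runs the job in local mode rather than on the standalone cluster started above.

```shell
# Directory created by the extraction step; adjust if you unpacked elsewhere.
SPARK_DIR="$HOME/Downloads/spark-3.4.0-bin-hadoop3"

spark_smoke_test() {
  # Skip quietly when Spark is not where this guide put it.
  if [ ! -x "$SPARK_DIR/bin/run-example" ]; then
    echo "Spark not found in $SPARK_DIR; skipping smoke test"
    return 0
  fi
  # SparkPi estimates pi; the argument is the number of partitions.
  # Spark logs to stderr, so discard it and keep only the result line.
  "$SPARK_DIR/bin/run-example" SparkPi 10 2>/dev/null | grep "Pi is roughly"
}

spark_smoke_test
```

A line starting with "Pi is roughly" means Spark could start a job and compute a result.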

Conclusion
And there you have it! You've set up a free Gitpod account, a free GitHub account, and a test environment for Kafka, Spark, and Hadoop. This provides you with a playground to experiment with these technologies and learn more about big data, cloud, and data science. Remember, the best way to learn is by doing, so don't be afraid to get your hands dirty and start exploring these technologies!
We hope this guide has been helpful. If you have any questions, feel free to leave them in the comments section below. Happy coding!
References
Here are some important links to vendor documentation that will help you dive deeper into each of these technologies:
- Gitpod Documentation
- GitHub Documentation
- Apache Kafka Documentation
- Apache Spark Documentation
- Apache Hadoop Documentation
Remember, documentation is an excellent resource when you're trying to understand new technologies, so make sure to make good use of it!