Tips and tricks from my personal experience—followed by a list of study materials.
Hi, you must be interested in passing the Google Cloud Professional Data Engineer Exam. Google recommends 3+ years of industry experience before attempting the exam. However, I think that if you have some experience with other cloud providers, databases and SQL, you can still do it, mainly because GCP is much more intuitive than its competitors (in my humble opinion).
Unlike other certifications, there isn’t a regimented coursebook or training manual. That is because Google expects you to be a practitioner and to know most things from experience. But, realistically speaking, it’s very hard to gain exposure to all of the products and services. So I decided to write this article and share some of the lessons that helped me pass the exam. If you think I missed something or got something wrong, please leave a comment and I will try to fix it.
I took the exam in December 2020 and passed it on my first try after about 30 hours of preparation.
Here's what awaits you:
- 2 hours
- 50 questions
- 4 answers per question
- A single correct answer per question, except for about 5–6 questions that require two answers
- You can flag questions to review later
- You can go back to any question at any time
- The certification is valid for 2 years
- When you click “Finish”, you get an immediate result: either a pass or a fail. There is no score or explanation.
Two of the answers you can discard right away
For example, look at the question below. Two of the answers specify BigQuery and two specify Cloud Storage; half of them mention Dataflow and the other half Dataproc.
In questions like this, it should be easy to discard the less viable option (Cloud Storage, because the requirement asks for SQL queries) and then focus on choosing between Dataflow and Dataproc.
Even if you don’t know the exact answer, this raises your odds from 1 in 4 to 50/50.
Read the questions VERY carefully.
Once you remove the improbable answers, you are often left with two options that seem equally plausible. In the example above, you need to choose between Dataflow and Dataproc. If you read closely, you will have noticed that the correct answer is Dataproc, because the question mentions custom Spark jobs (Dataproc is Google’s managed Spark and Hadoop service).
The correct answer often hinges on a single word or phrase. So read VERY carefully.
Google products over open source
You probably already know this, but correct answers in this exam are almost always the ones that imply deeper integration with GCP. Look at the question below. It is asking you to choose between Pub/Sub and Kafka. It should come as no surprise that the correct option is the former.
Practice questions are the key to passing
In hindsight, the most efficient method (for me) was to go over example questions and then double down on the ones I got wrong. If you are not a complete GCP beginner, this will save you lots of time and help you identify the areas you need to improve on.
There are a few paid courses out there, and you can find the list at the bottom. However, I found none of them worth the money or time. They felt too basic and aimed at beginners: people with next to no experience with cloud platforms, databases or ML models. I had to resort to listening at 2x speed or just skimming through the transcripts. Not to mention that some of them were created with the old (pre-April 2019) exam in mind.
I feel similarly about Qwiklabs (you get access to them if you enrol in any of the Coursera courses, or you can sign up for them directly). If you are new to GCP and cloud environments in general, they can be a great stepping stone. But in my case, I don’t think they taught me anything that was useful for the exam. Most of the labs felt like a fancy copy-paste exercise. And in production projects, we almost never use the Cloud Console; we use Terraform and Cloud Deployment Manager instead (no worries, those are not covered in the exam).
Preparing for the Google Cloud Professional Data Engineer Exam
In my opinion, this is the only course you should take. It is led by the guy who actually makes the GCP exams. The course will not teach you what the answers are, but it will give you a more practical idea of what the questions will be like. There is a fair number of example questions, followed by detailed explanations (something that was quite hard to come by).
I enrolled in the 7-day trial and finished the course in 2–3 afternoons, and then cancelled my subscription.
The Data Dossier
The above is a great cheat sheet, available for free. I came across it when I did the Linux Academy course (see ‘Other Paid Resources’ below). Highly recommended.
Data Engineering on GCP Cheatsheet
Another cheat sheet, available on GitHub. A bit outdated, but still quite useful.
Some conference talks on YouTube
The above is a link to a curated YouTube playlist (not by Google). It has 9 videos from the Google Cloud Next ’19 conference, adding up to 6.5 hours of content of varying complexity. They are by no means targeted at exam takers, but I found them quite useful and informative.
Other Free Resources:
- A video with a few example questions and correct answers, but no explanations. It comes from the same people who sell the set of 310 questions (see ‘Other Paid Resources’).
- ML and Unstructured Data Video [5 min]
- Official exam questions
- More exam questions. Another resource I found out about after I took my exam. From what I hear from colleagues, the answer marked as “correct” is not to be trusted, and you need to read the comments to find out which one actually is.
- How I passed the Google Professional Data Engineer Exam in 2020
- How I Passed the Google Cloud Professional Data Engineer Certification Exam
- Google Cloud Professional Data Engineer Certification — 2020 Mini-Guide
- Google Cloud — Data Engineer Exam Study Guide
Other Paid Resources:
- A set of 310 example questions and answers for 16 USD. They also offer a free sample of 10 questions, and you don’t need to register to get them. In hindsight, if I had known about it, I would have forked out the 16 dollars, as it seems like a good deal.
They also have a course for the same price. I haven’t tried it, but based on what I see on their YouTube channel, it’s just a bunch of PowerPoint slides read aloud by a speech generator.
- Linux Academy course: Google Cloud Certified Professional Data Engineer (LA). A bit too basic for my taste, but it had a pretty decent collection of questions. Unfortunately, there is no way to link directly to that course. You will need to register and then search for it by name.
- Pluralsight course: I have not tried it myself. All I know is that they have a 10-day free trial. Let me know in the comments if you have taken it, and how good it was.
- Official Google Cloud Certified Professional Data Engineer Study Guide: A book that I was not aware of until I started writing this article. If anyone has bought it, please let me know if it's worth the $36.