Part of the How I’m Learning Deep Learning Series:
Part I: A new beginning.
Part II: Learning Python on the fly.
Part III: Too much breadth, not enough depth. (You’re currently reading this)
Part IV: AI(ntuition) versus AI(ntelligence).
Extra: My Self-Created AI Master’s Degree
Before we get into specifics, here’s a 10 second summary of this post:
10 Second Summary
- 📈 Working through the Udacity Deep Learning Foundations Nanodegree
- 🤓 Using a Trello board to track my learnings
- 📚 Learning Python on the fly with Treehouse and the Learn Python the Hard Way textbook
- ➗Learning Math on the fly with Khan Academy
- 📝 Writing daily about my learnings in this Medium Series
- ⌚ Studying 3–4 hours daily using the Pomodoro Technique and tracking time with Toggl
- 📽 Releasing 1 VLOG per week summarising my daily learnings
- 👀 Watching all of Siraj’s videos on YouTube
Now, let’s get deep.
In February this year, I hadn’t ever written a single line of Python code. I had been around tech all my life but never on the creation side of things, only using it for consumption. I wanted to change that. So I decided to start learning to program, and my brief research in the field led me to Udacity. From there I watched countless trailers of their various Nanodegree programs, but the one that stood out to me the most was the Deep Learning Foundations Nanodegree (DLFND).
I’m now up to the final module of this program, despite sending an email to Udacity support a few days before starting to ask what the refund policy was (see Part 1). I’m glad I didn’t follow through with that email.
It’s only been two months since my last update and I’ve learned an incredible amount. I’m writing these updates mainly as a method of my own reflection. In the past, reflecting on what I’ve learned has been a weakness of mine. Learning something completely new is difficult and I find going back and reviewing what you’ve done is a great way to remind yourself how far you’ve actually come.
As with the other posts, this won’t be entirely about Deep Learning but the DLFND has been the foundation of my studies the past couple of months and from this, I’ve started to up-skill myself in other areas (Python, statistics, algebra).
In this post, I’ll go over a few things including my current study routine and a brief overview of what I’ve learned since Part 2.
What is my current study routine?
As mentioned in the previous posts, I’m using a Trello board as my main planning tool. I put everything required for the DLFND on the board as well as any supplementary study I’m doing outside of the Nanodegree.
You can check out the board if you like, I’ve made it publicly available.
I’ll be keeping the Trello board up to date even once I’ve finished the DLFND with the next courses I’m planning on taking (more on this in a future post).
Every morning, I get up and write a list of goals on my whiteboard.
You can think of this board as organised chaos. It’s littered with random motivational quotes, my “yes and no list”, daily get-to’s (left hand side in black), someday get-to’s (blue checkboxes) and of course my current happiness equation is across the top.
The “yes and no” list in the bottom right is a constant reminder for me of the things I want to say yes to and things I want to say no to. DL is Deep Learning, ML is Machine Learning, AnyGym is a project I’m working on with some friends (more on this later) and the rest should be self-explanatory.
After several years of study, I’ve found the Pomodoro Technique works best for me. Every day I aim to complete at least 6 Pomodoros (25-minute blocks) on one given topic. For example, if it was a Python themed day, I’d aim to complete 6 Pomodoros (150 minutes) of distraction-free Python study. Most of this block of time will be completed before lunch (I learn best in the mornings).
I usually do three 25-minute blocks, followed by a 30–45 minute break where I will either eat breakfast or go for a short walk. After the break, I’ll continue with the next three blocks before I finish studying for the day.
This type of routine came about after reading Deep Work by Cal Newport. I’ve found I can get far more done in 3–4 hours of concentrated distraction-free work than in 6–8 hours of continual interruptions.
You could say I’m following a Darwinian type routine.
After his morning walk and breakfast, Darwin was in his study by 8 and worked a steady hour and a half. At 9:30 he would read the morning mail and write letters. At 10:30, Darwin returned to more serious work, sometimes moving to his aviary, greenhouse, or one of several other buildings where he conducted his experiments. By noon, he would declare, “I’ve done a good day’s work,” and set out on a long walk on the Sandwalk, a path he had laid out not long after buying Down House. (Part of the Sandwalk ran through land leased to Darwin by the Lubbock family.)
When he returned after an hour or more, Darwin had lunch and answered more letters. At 3 he would retire for a nap; an hour later he would arise, take another walk around the Sandwalk, then return to his study until 5:30, when he would join his wife, Emma, and their family for dinner. On this schedule he wrote 19 books, including technical volumes on climbing plants, barnacles, and other subjects; the controversial Descent of Man; and The Origin of Species, probably the single most famous book in the history of science, and a book that still affects the way we think about nature and ourselves.
What have I been learning?
The last part of this series finished at Week 6 of the DLFND. I’m currently up to Week 16. What follows is a brief summary of what I’ve been learning.
It took me a while but I finally clued onto the fact that Deep Learning requires a large amount of computing power. I also learned that GPUs (Graphics Processing Units) are particularly good at doing the types of calculations that go into Deep Learning (large scale matrix multiplications and such). I’m using a 2016 13-inch MacBook Pro with Touch Bar and it doesn’t have a dedicated GPU. Because of this, training Deep Learning models on my local machine takes an incredibly long time, which is not ideal.
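To make the “large scale matrix multiplications” part a little more concrete, here’s a minimal sketch in plain Python. It isn’t code from the Nanodegree, and the numbers and the `matmul` helper are made up for illustration. The point is that one layer of a network boils down to multiplying an inputs matrix by a weights matrix, and every cell of the result can be worked out independently of the others.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p) using plain Python lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Each output cell is its own dot product and does not depend on
            # any other cell, so a GPU can compute many of them in parallel.
            result[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return result


# Hypothetical numbers: 2 input examples with 3 features each,
# passed through a layer with 3 inputs and 2 outputs.
inputs = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
weights = [[0.1, 0.2],
           [0.3, 0.4],
           [0.5, 0.6]]

layer_output = matmul(inputs, weights)
print(layer_output)
```

A real network does this with matrices containing thousands or millions of numbers, over and over for every training step, which is where a laptop CPU starts to struggle.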
It was about this time I discovered the power of cloud computing. I had heard about it in the past but had never fully experienced it. If you had asked me what AWS (Amazon Web Services) was a couple of months ago, I wouldn’t have had a clue.
I still don’t fully understand cloud computing but that’s okay, you don’t need to fully understand it to take advantage of it. In a nutshell, AWS is a giant computer you can access via the internet (at least, that’s what I tell myself).
Navigating the AWS console for the first time was daunting but after a few failed attempts I managed to launch my first instance. An instance is essentially activating a small amount of computing power on AWS that you can access. I was amazed. I now had the full power of a GPU accessible through the internet where I could train my Deep Learning models.
Experiencing this for the first time was like driving a car way faster than yours. Suddenly I was blazing through training epochs rather than waiting for the equivalent of five microwave minutes (a long time) on my local machine.
I also discovered FloydHub. To me, FloydHub is a more beautifully designed version of AWS (not to mention cheaper). I preferred using FloydHub because of how easy it was to set up and run. With a few lines of code in the command line, you can begin training your Deep Learning models.
Another reason I love FloydHub is their website. It’s so beautifully laid out. Within seconds of being on the page, I knew what it could do and how to use their service. AWS, on the other hand, had a much steeper learning curve.
Does anyone else geek out over web design or is it just me?
With this new found knowledge of cloud computing, I was able to work on my second project for the DLFND. My task was to classify images from the CIFAR-10 dataset using a Convolutional Neural Network.
With some excellent help from the forums and the dedicated Slack channel (thank you everyone!), I managed to submit a working CNN. When you submit a project on Udacity, a message appears saying it will be reviewed within 24 hours. I thought (and still think) having a full review of a project within 24 hours is amazing. In my five years at University, I never got feedback that fast.
To my surprise, the 24 hours was, in fact, an overstatement. I had a full project review within two hours of submission. I didn’t look at it though, I needed a break from my computer for the rest of the day.
My first submission was far from perfect and the project required improvements before I could receive a passing grade. I spent about 8 hours (and 3 hours on live chat with a Udacity support member) tweaking the hyperparameters, training the model and improving various functions before resubmitting. I passed on the second submission.
Could my model have been improved further? Of course. But I wasn’t going for perfection. I could’ve spent another week striving to make the model better but my goal is to learn the first principles of Deep Learning rather than perfect my project submissions, this can come later.
My full submission is available on my GitHub but forgive me if it’s not uploaded correctly, I’m trying to learn how to use Git and GitHub (more on this later).
About 9 weeks into the DLFND, I saw a new class of Andrew Ng’s Machine Learning Course was starting on Coursera. I figured it would be great to sign up to this to gain a deeper understanding of Machine Learning, alongside the speciality of Deep Learning. Once again, I started the course without reading the prerequisites. Will I ever learn?
Oh yeah, and in the spirit of signing up to things, I decided I’d challenge myself to commit to 100 Days of Code. I started a daily Medium Series and a weekly VLOG documenting my learning. I’ll update those more regularly and do one of these longer form posts every 4–6 weeks or so.
Part 3 of the DLFND course was on Recurrent Neural Networks (RNNs). I don’t fully understand how RNNs work yet but I’m slowly grasping the concept. Andrej Karpathy wrote a very in-depth post on the effectiveness of RNNs. I’m reading it myself and I’d highly recommend it.
The way I think of RNNs is that they take a sequence of inputs and are able to produce a single output or a sequence of outputs.
Where might a single output from a sequence of inputs come in handy?
Say, for example, you had a bunch of movie reviews (sequence of words as input) and you wanted to tell which ones were good and bad reviews (single output). An RNN could be used to perform sentiment analysis on the reviews and output whether each review is good or bad.
Where might a sequence of outputs from a sequence of inputs be used?
In the case of translation, if you had a sentence in English (sequence of words as input) and wanted to translate it to French (sequence of words as output), an RNN may be used to perform the translation.
Now, of course, there are other outputs RNNs can produce (single input to multiple outputs) but I’ll let the experts handle the explanations of those for now. My definitions are basic but that’s how I learn best. I start with an overall concept of how things work and then slowly build upon it.
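Here’s a minimal sketch of the “sequence in, single output or sequence out” idea in plain Python. It’s a toy with a single hidden unit and hand-picked weights (`w_x`, `w_h` and `b` are made-up values, and nothing here is trained), so treat it as an illustration of the loop and not as how the course projects were built. Real translation models are also more involved than reading off one output per step.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit RNN over a sequence and return the hidden state at every step."""
    h = 0.0  # the network's "memory" starts out empty
    states = []
    for x in inputs:
        # The new state mixes the current input with the memory of earlier inputs.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


sequence = [1.0, -0.5, 0.25, 2.0]  # stand-in for a sequence of encoded words
states = rnn_forward(sequence)

many_to_one = states[-1]  # single output, e.g. a score for a whole movie review
many_to_many = states     # one output per step, e.g. a sequence of outputs
print(many_to_one, many_to_many)
```

Because each step feeds the previous state back in, the order of the inputs matters: the same words in a different order give a different final state.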
A cool example of how an RNN can be used to generate music is demonstrated in this video from Siraj.
Using long short-term memory (LSTM) networks, Siraj was able to take a sequence of Musical Instrument Digital Interface (MIDI) data and train an RNN to generate completely new sounds.
What is LSTM?
I imagine LSTMs as being a sequence of valves. Imagine an entire RNN as a plumbing system, where the water is the information that flows through the network. LSTMs decide how much water should flow through the network. Combining a number of these will help to fine tune the outputs.
Again, this is how I think of them and there’s much more going on under the hood. I’d recommend this post by Shi Yan for a deeper understanding.
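The valve idea can be sketched in a few lines of plain Python. This is a simplified illustration and not a full LSTM: in a real one, the gate signals are computed from the current input and the previous hidden state using learned weights, and there is an output gate as well. Here `forget_signal` and `input_signal` are hypothetical numbers passed in by hand.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0..1, like a valve between closed and open."""
    return 1.0 / (1.0 + math.exp(-z))


def update_cell(cell_state, candidate, forget_signal, input_signal):
    """Blend old memory with new information using two valves (gates)."""
    forget_gate = sigmoid(forget_signal)  # how much old memory to keep
    input_gate = sigmoid(input_signal)    # how much new information to let in
    return forget_gate * cell_state + input_gate * candidate


# Forget valve open, input valve closed: the old memory (1.0) mostly survives.
print(update_cell(1.0, 5.0, forget_signal=10.0, input_signal=-10.0))

# Forget valve closed, input valve open: the new information (5.0) takes over.
print(update_cell(1.0, 5.0, forget_signal=-10.0, input_signal=10.0))
```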
What is MIDI?
MIDI is the equivalent of the alphabet for musical devices. Just like in a machine translation model where a sequence of words is the input (e.g., English) and the output (e.g., French), a sequence of MIDI inputs (e.g., old piano songs) can be used to generate a sequence of outputs (e.g., new piano songs).
Using the knowledge we had learned in the previous set of classes, the next project involved creating our own RNN with the goal of generating a TV Script.
The input to the network would be a dataset containing the scripts of 27 seasons of The Simpsons, specifically scenes at Moe’s Tavern. Using this sequence of inputs (strings of text), an RNN would be used to produce a completely new scene (sequence of outputs).
I won’t dive fully into the details of the project (I’ll upload my code to GitHub when I get the chance) but after prepping the data and a few hours of building and training the network, below are some of the outputs I got.
moe_szyslak: minimum wage and tips.(meaningfully) of course there are, but uh, two are.
homer_simpson: you know what’s the stirring?
moe_szyslak: well, why all i have if this was or the ladies way he guy is dead.
ned_flanders: hey. sorry.
homer_simpson:(big smile, pal) is now a homer, i’m doin’ a pig than i(up with a pretty warmly can one of guys in this nervous, then what so lonely. now, consider put a terrible.
moe_szyslak:(laughs) he’d be? keep me won! a(very homer)
homer_simpson:(to book) procedure.
homer_simpson:(laughs) now if you want on the game, and i was my new life like”.
homer_simpson: let’s just had that what would be marge.
teenage_bart:(talk-sings, moe) the part is so good, where’s that really picture you.
grampa_simpson: listen, homer, it’s the time i remember, i’ve been using it?
moe_szyslak: drinking will help us plan.
homer_simpson: this valentine’s crap has to be a bar.(gets off) new_health_inspector: bar an idiot.
homer_simpson:(to self) sorry, i need i’m behind your foot.
moe_szyslak: but i suppose i got a two hundred and people all can use the kids.
homer_simpson: to be the best thing?
barney_gumble: ’cause only one i thought you said.
carl_carlson:(to self) someone’s makes a little one, can i have a free? take this!(homer’s sound)
homer_simpson: the one, but i did not going to find it out.
moe_szyslak:(sings) i just wanna tell my life till i’m on their go!
moe_szyslak: the guys are make around in the gentleman’s of woman.
lenny_leonard: oh, you don’t let me do being here? no, moe.
barney_gumble: you know, it’s you, moe. the drinks are on you.
seymour_skinner:(sighs) isn’t it eyes no more.
homer_simpson:(chuckles) all right.
I found it incredible that these scripts were entirely generated by the network. They’re also more than likely completely unique, no one would have ever created a scene like this before.
The next topic in the DLFND was Transfer Learning. I think of Transfer Learning as taking the knowledge you have in one domain and applying it to another domain without explicitly altering it that much.
So in my case, a real life example would be me applying the knowledge I have of working out to studying. Over the past seven years, I’ve found the best way for me to workout is with a goal and a set period of time to achieve that goal. If I take that knowledge and apply it to studying, I consider it Transfer Learning.
In the case of Machine Learning/Deep Learning, you could take a model that has been trained on one dataset and apply it to another similar dataset without having to completely retrain the model. Being able to do this saves an incredible amount of time.
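As a rough sketch of what “not completely retraining” can look like, the toy below keeps a pretend pre-trained layer frozen and only trains a small new output layer on top of it. Everything in it is an assumption made for illustration: `PRETRAINED_WEIGHTS` is a made-up stand-in for weights learned on an earlier dataset, `new_task` is a tiny invented dataset, and the function names are hypothetical. Real projects would reuse a large trained network and a proper deep learning library.

```python
import math

# Made-up weights standing in for a layer already trained on a big dataset.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Pre-trained layer with a ReLU activation. Its weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, head, bias):
    """Probability of class 1, using frozen features plus the new output layer."""
    feats = frozen_features(x)
    return sigmoid(sum(w * f for w, f in zip(head, feats)) + bias)


def train_new_head(data, epochs=500, lr=0.1):
    """Train only the small output layer that sits on top of the frozen features."""
    head, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_features(x)
            error = predict(x, head, bias) - y  # log-loss gradient at the output
            head = [w - lr * error * f for w, f in zip(head, feats)]
            bias -= lr * error
    return head, bias


# Invented dataset for the "new" task: (input, label) pairs.
new_task = [([2.0, 0.0], 1), ([0.0, 2.0], 0), ([3.0, 1.0], 1), ([1.0, 3.0], 0)]
head, bias = train_new_head(new_task)

for x, y in new_task:
    print(x, y, round(predict(x, head, bias), 3))
```

Only three numbers get trained here (two weights and a bias), while the frozen layer does the heavy lifting. That’s the time saving in miniature.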
An example of where Transfer Learning could be used is training a robot in a virtual simulation and then applying what it has learned in the virtual world to real world scenarios, similar to what OpenAI has done with their block stacking robots.
Robots that Learn - by OpenAI
If you would like to learn more about Transfer Learning, I highly suggest checking out this blog post from Sebastian Ruder.
The fourth project in the DLFND involved using a neural network to translate from one language to another.
This project would utilise what we had learned about RNNs to build a network that would be able to translate a small string of English words into French.
If you had asked me a few months ago to even consider how this was done, I wouldn’t have been able to tell you. Now, I still can’t entirely explain the process but I have a fair idea of how apps such as Google Translate do the majority of their translations.
One trend I’ve found with the classes and projects is that I take about 50% more time (sometimes more) to complete them than the stated expected working time. For example, this project had a working time of 2 hours in its original description. However, I took just over 6 hours to fully complete it, more if you include model training times. I track all of my online studies through Toggl, which helps me see where my time is being allocated and adjust my study schedule.
It took me three submissions to earn a passing grade for this project, mostly because if I’m honest, my first submission was a bit rushed. The model worked but it could have been better.
The feedback from Udacity reviewers is always swift and full of insights and further learning opportunities.
By the end, my model still wasn’t perfect but it did an okay job at translating a small sentence of English words to French. I could’ve spent more time on the project to make it better but after 6+ hours of trying to improve the accuracy by minor percentage points, I figured it was best to move on and keep learning.
One important detail I picked up was how varied hyperparameters can be. I was looking on the Udacity forums and found some students using vastly different parameters and still getting great results. My intuition tells me that hyperparameters are just the final step. If your model isn’t built correctly in the first place, no amount of hyperparameter adjusting will help it improve (or only very minimally).
I had officially created a model that knew more French than I did.
Week 15 — Now (14 July 2017)
It’s hard to believe I made it to Week 15 without properly using GitHub. When I started the course, I used git clone to bring down all of the Udacity Deep Learning files and have used the same files ever since.
This caused an issue with Project 4. I ended up completing an older version of the project, only realising this when I was 80% of the way through. It was because of this, I decided it was time to start using GitHub properly. I had been ignoring it through the majority of the course because I simply didn’t know enough about it and my way of working was going fine, until the Project 4 incident.
If it could be better, it’s as good as broken. — Greg Plitt
This quote was completely relevant to me. My workflow could be better. It was broken.
This led me to start learning about Git and GitHub on Treehouse. Immediately, I started to become more confident with using it. I haven’t quite figured it out yet but it’s now on my list. And I promise by the time I write the next post, you’ll be able to see all of my files on GitHub.
Part 4 of the DLFND is on Generative Adversarial Networks (GANs). This is where I’m up to now.
I say that I’m excited about every new section of the course but this one has already blown me away.
The concept of GANs was thought of in 2014 by Ian Goodfellow after a conversation with friends at a bar (what an incredible founding story).
I think of GANs as being two networks competing against each other to produce a better output.
There’s one network called the generator (G) which takes in a random sample of noise. The goal of the generator is to produce new samples from the noise that are of the same probability distribution as the real sample inputs that go into another network, the discriminator (D). The role of D is to decipher which of the inputs is real.
As G gets better at producing fake samples, D becomes better at detecting fake samples. The two networks play off against each other and become better at their specific roles over time.
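The tug of war between the two networks shows up in how each one is scored. Below is a minimal sketch in plain Python using one common way of writing the two losses. The scores are D’s estimated probability that a sample is real, and the numbers passed in are hypothetical. No networks are trained here, it only shows how the same score pulls the two losses in opposite directions.

```python
import math


def discriminator_loss(score_real, score_fake):
    """Low when D scores real samples near 1 and fake samples near 0."""
    return -math.log(score_real) - math.log(1.0 - score_fake)


def generator_loss(score_fake):
    """Low when D is fooled, i.e. it scores G's fake samples near 1."""
    return -math.log(score_fake)


# Early on: D easily spots the fakes (fake score 0.1).
print(generator_loss(0.1), discriminator_loss(0.9, 0.1))

# Later: G has improved and D is fooled more often (fake score 0.9).
print(generator_loss(0.9), discriminator_loss(0.9, 0.9))
```

When the fake score rises, the generator’s loss falls and the discriminator’s loss climbs, so each network improving makes life harder for the other.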
If you’re looking for a more in-depth description, I’d suggest checking out this article by Arthur Juliani. Juliani uses Spongebob Squarepants as an analogy to describe GANs.
Due to GANs still being a relatively new breakthrough in Deep Learning, most of the use cases for them probably haven’t been invented. Some things GANs are currently used for include generating images of faces, converting sketches into full-scale pictures (edges2cats) and transforming a horse into a zebra.
Video: CycleGAN turning a horse into a zebra using style transfer at 60 fps.
The upcoming final project for the DLFND involves building a GAN to generate faces. I’m kind of scared and excited.
For now, I’m still getting my head around how exactly they work. I’ve got a couple of weeks left of the course before the final due date (Aug 3, 2017).
When I signed up to the DLFND I had barely any experience in Python or Machine Learning and hadn’t touched calculus since high school.
So I’ve been using a number of other resources to help me learn the skills required for Deep Learning.
For Python, I’ve been using a combination of Treehouse and the Learn Python the Hard Way textbook. As of writing, I’m nearly finished with both of these. I’ve practically been learning the required Python skills as I go along.
To get some foundation knowledge in Machine Learning, I’ve been using the Machine Learning course by Andrew Ng on Coursera. It’s been one of the best courses I’ve ever taken. I completed the course last week (without an official certificate).
I still haven’t mastered using Git or GitHub but Treehouse was a great way to quickly up-skill myself on these two tools. I promise, my GitHub will be somewhat presentable within the next few weeks.
I’ve also been using Anki every other morning to help cement my knowledge of Python syntax.
I don’t know if this fits in with this post but I figured I might as well put it here because it’s where I’m devoting a portion of my time every week.
At the end of May, I was finally up to date with both the DLFND and the Machine Learning course from Coursera I had started a few weeks prior. This gave me some time to start working on a side project I’d been planning for a while.
Using what I’d learned about AWS and cloud computing, I built a LEMP stack server on the free AWS micro-tier to host a website I’d been planning. It took me a fair bit of Googling around to find some solid guides but I managed to get it running with a WordPress front end. After a couple of days, I had a fully functional website, only paying for the domain name.
My background is in fitness and nutrition. I plan on combining the skills I’m learning through these various courses with what I’ve learned in the past to bring value to the world.
My team and I have a goal to help the world move more. So we built a platform to connect fitness facilities and users around the world. Right now, it’s functional in our home city, Brisbane but we plan on expanding sometime in the future. It may never take off and I’m aware of that but that’s not the point. The journey is everything and I’d rather try than to have not tried at all.
You can check out our progress at useanygym.com.
I don’t yet know how any of the stuff I’ve been learning can be tied to this project but I’m sure I’ll figure something out. Plus, it’s always fun working on your own projects.
I still wouldn’t be able to code a whole Deep Learning Model if you asked me to but I’m slowly understanding the fundamental concepts as well as getting better at building networks from scratch.
I’ve still got a few weeks left of the Deep Learning Nanodegree so that’s my priority learning for now.
I’m going to keep learning Python using various online resources and practice by building some projects in the future (I’ll do my best to write about these).
The Trello board will be updated every couple of days with what I’m learning so be sure to check that out if you’re interested.
I’ll also be documenting the rest of my 100 Days of Code in my Medium Series as well as making a weekly VLOG about what I’ve been up to.
I’m excited for what’s next on my learning adventures. You can expect another post with more in-depth next steps after I’ve completed the DLFND.