Month 28 — The teacher appears when the student is ready — My Linux teachers arrived

Siraj Samsudeen
7 min read · Mar 4, 2022

I finished my first pass of The Linux Command Line book in January, with a focus on understanding how to work with multiple versions of Node on the command line.

Normally, my approach to reading technical books is to go cover to cover, trying to understand every single page and extract useful nuggets. As I read about each idea or feature a tool provides, I think about how I could use it in real life. And if I encounter a big list of commands, I would normally create flashcards and memorize them. Not this time. My reading was quick, and I created very few flashcards.

Teacher 1 — Messing with Missing Data

Then, in one of my data analytics projects, I was reviewing a data-refresh step done in Pandas when I realized something strange was going on. I was seeing trends in the data that made no sense: for example, the price of tomatoes always peaked on the 12th of every month.

After a bit of analysis, I realized that some data might be missing, and this viz in Tableau made it clear that something was wrong with the data. Pay attention to dates 10 to 12: they are missing in 2021 but present in 2020 for all the months, even though I was not supposed to have data for all the months of 2020.

The challenge with this project was that the data sets were huge, big enough to make tools like Tableau, Tableau Prep, and Excel hang for a while. The data preparation process had a number of steps, so I needed a way to quickly find where the issue first popped up. Opening the data in one of the regular tools would mean a lot of waiting, and all I really needed was to check whether a certain date was present in each intermediate file.

Linux tools came to the rescue. Since I had learnt grep, I started with it to get a quick yes or no on whether the CSV file that came from the source DB had data for a certain date. Yes, it was there. Then I jumped to the last CSV file produced by Pandas and checked whether the data for that date was there. Alas, it was not. The process was not one step like I am describing it, and it took me a bit of time to recall the different options of grep and try them out, but it was still way faster than opening the file in Excel, Tableau, or even Pandas.
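To give a concrete flavour of the kind of check I mean (a minimal sketch; the file names and date format here are made up for illustration):

# Count rows mentioning a given date in the raw extract; 0 means the date is missing
grep -c "2021-01-12" raw_prices.csv

# The same check on the final file produced by Pandas
grep -c "2021-01-12" final_prices.csv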

Once I identified that there was some issue in the data preparation step, the next step was to see whether I could replicate the processing done in Pandas outside of it and check the results. Again, Linux commands came to my rescue. Here is what I ran:

Step 1 — Combine all the raw data files quickly
Step 2 — Count the number of records for each date in the file
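
In shell terms, the two steps looked roughly like this (a sketch, ignoring header rows for brevity; the file names and the assumption that the date sits in the first CSV column are mine):

# Step 1: combine all the raw CSV files into one
cat raw_*.csv > combined.csv

# Step 2: count the number of records for each date,
# assuming the date is the first comma-separated field
cut -d, -f1 combined.csv | sort | uniq -c > counts_by_date.txt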

Then I took this output file into Tableau, ran the same visualization, and voila: I got all the dates in 2021, with no missing data for dates 10–12. It was an incredible feeling to be able to zip through this process so fast. Each of the above steps took about 30 seconds. The entire investigation probably took me a few hours, whereas without these tools, even opening one of the data files would have taken minutes.

Teacher 2 — Deleting all old Python versions and syncing with the Python version on my cloud server account

The first situation was easy on me, at least in hindsight. It took just a few hours at best. But this second situation was very hard on me, as I stepped into territory I did not fully comprehend, messed up some system files, and had to recover from it. It took me around two weeks to get through the entire task, but as I look back now, it was a much better teacher.

The requirement was simple: my Quran SRS project was hosted on PythonAnywhere and was running Python 3.7.5. On my local machine, I had many different versions of Python: 2.7, 3.6, 3.8, 3.9, and even 3.10. I wanted to ensure that my dev environment matched my production environment. A simple requirement, but it turned out to be very hard.

The other requirement I had was to manage my Python packages the way it is done in Node with package.json. When you run npm install, package.json is updated automatically, and all I needed to do when moving to production was use the updated package.json file.
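For anyone who has not seen the Node side of this, the workflow looks like the following (a minimal sketch; the package name is just an example):

# Installing a package records it in package.json automatically
npm install axios

# On the production machine, one command restores everything listed there
npm install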

In the Python world, things are a bit more complicated: I had to run pip freeze > requirements.txt and then use this file in the production environment. The problem with hand-creating this file is that it is easy to forget. Being a newbie in the Python world, I would miss this manual step, move something to production, run the tests, and watch them fail. It was annoying to go back and add the package manually. I wanted the same npm-like experience. So I asked my Django teacher and decided to try Python Poetry.
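The contrast between the two workflows, as I understand it (a sketch; the package name is just an example):

# pip: installing and recording are two separate steps, and the second is easy to forget
pip install requests
pip freeze > requirements.txt

# Poetry: one command installs the package AND records it in pyproject.toml
poetry add requests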

The first anaconda I faced was the Anaconda distribution itself. This monster was supposed to make life easy by bundling all the packages needed for data science projects into one installer, and I really liked the tool. What I did not like was that it was a black box. It set up Python using the SYSTEM identifier and would not easily budge. It always interfered when I tried to use Poetry or any other virtualenv tool. I installed pyenv to manage different versions of Python, but at the last step, virtualenvwrapper would ignore the Python version I had chosen and take the one given by conda. So I needed to get rid of conda.
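For context, this is how pyenv is supposed to work when nothing interferes (a sketch; the version number is just an example):

# Install a specific Python version and make it the default
pyenv install 3.9.10
pyenv global 3.9.10

# With pyenv's shims on the PATH, this should now report 3.9.10
python --version

In my case, conda kept hijacking that last step, which is why it had to go.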

I wanted to delete all the different versions of Python on my system and start from scratch. I had tried this a year before and failed miserably, so I was apprehensive. But I needed to solve the problem anyway.

I started by resetting my Terminal itself: I backed up the config files and erased them. The fun started from there, as I needed to set up my Terminal properly again. This part was relatively easy.

Then came brew, which always talked in colorful pictures and funny language that I did not understand. I spent some time trying to understand how brew works and how installing Python through brew differs from installing it through pyenv. This is where I messed up my brew installation itself: I was following instructions from some article and overwrote the environment variables that brew uses, and brew poured out its anger :). My heart was beating fast as I tried to get brew to cool down and continue its work. Finally, I tamed the monster, and I now understand all the colorful vocabulary brew uses.
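Two standard brew subcommands turned out to be my lifeline while recovering (both are built-in commands; nothing here is invented):

# Show the environment and paths brew is actually using
brew config

# Diagnose common problems with the installation and suggest fixes
brew doctor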

I had to learn how to use SSH to connect remotely, and how to install the tools needed to compile the right version of Python, since I hit a C-compilation error caused by outdated Xcode Command Line Tools, plus a myriad of other problems along the way. But finally, I synced the dev and production environments, and they are both running Python 3.9.
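For anyone hitting the same C-compilation error, the gist of my fix was to refresh the Command Line Tools before asking pyenv to build Python again (a sketch of the sequence, not an exact transcript):

# Reinstall the Xcode Command Line Tools, which provide the C compiler
xcode-select --install

# Then build the target Python version from source via pyenv
pyenv install 3.9.10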

More situations are popping up

I faced even more issues where my little knowledge of Linux came to my rescue, and some of these situations pushed me toward my next learning topic: Regex.

I use Obsidian for my tech notes and Anki for SRS, and the plugin I use to connect the two suddenly created a lot of duplicate cards. Another mess to solve, but it was difficult without knowing Regex, as I had to do some advanced Regex search-and-replace to locate the files where the problem was created (I have done this) and to fix them (I don't know how to do this yet).
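The locating part boiled down to a recursive grep for card ID markers that occur more than once. The <!--ID: ...--> format below is an assumption based on my plugin; adjust the pattern to whatever yours writes into the notes:

# Collect every Anki ID marker in the vault, then print only the duplicated ones
grep -rhoE '<!--ID: [0-9]+-->' . | sort | uniq -d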

So, which learning approach is better?

Now, coming back to the learning approach I adopted: when I went through the Linux book, I did not even pay attention to a number of the tools I used in the above two situations. I just skimmed through and tried to register that such tools existed. It was only when I needed to solve these problems that I went back and learnt them. Even then, I did not go back to the book; this time, I wanted to use the cryptic man pages to figure things out myself. And boy, I did. After spending a lot of time plowing through the man pages to understand how sort works and solving the missing-data problem, I reviewed the sort pages in the book while relaxing. That is when I understood what a difference a good writer makes: had I read the book fully, I would have understood sort much better and would NOT have had to toil with the cryptic man pages.

So I am wondering what the right way is to learn a big topic like Linux. Should I have covered and understood the entire Linux book, so that I could have avoided the read-and-research phase needed to solve these problems? Or is this project-based learning approach better? I am not sure. But I am quite surprised that so many situations seem to open up just to help me apply what I learnt.

And as they say, the good things that lead us to growth always come in disguise — clothed as unexpected problems.


Siraj Samsudeen

An entrepreneur coming back to coding after a 16-year gap, out of love for the craft.