This update covers the last two months, June and July, as I could not find time to write one last month. My son and I have been pair-programming every day: typically 8 pomodoros if my work schedule allows it, and 2 or 4 pomodoros if it does not.
Data Quality checking with Pandas
In June, we started with some utility functions on top of Pandas to speed up the upfront data quality (DQ) checking I do as part of my consulting work. When I connect to tables from ERP systems, especially those from big vendors, there are a lot of empty columns and columns holding just a single value. I wanted a script that would take a table name, strip out all these unnecessary columns, and give me a frequency table for the remaining columns. When I go through the frequency table output with the client, we are able to identify many other DQ issues.
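A minimal sketch of what such utilities could look like, assuming the table has already been loaded into a pandas DataFrame and that "empty" means missing values (NaN/None); ERP blanks stored as empty strings or spaces would need converting to NaN first. The function names and the sample data are hypothetical, not the actual script.

```python
import pandas as pd


def drop_uninformative_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that are entirely empty or hold only a single value.

    Missing values count as a value here (dropna=False), so an all-NaN
    column has 1 distinct value and is dropped, while a column holding
    one value plus some blanks has 2 and is kept.
    """
    distinct_counts = df.nunique(dropna=False)
    informative = distinct_counts[distinct_counts > 1].index
    return df.loc[:, informative]


def frequency_table(df: pd.DataFrame) -> pd.DataFrame:
    """Long-format frequency table with one row per (column, value) pair."""
    frames = []
    for col in df.columns:
        counts = df[col].value_counts(dropna=False)  # keep blanks as their own bucket
        frames.append(
            pd.DataFrame(
                {
                    "column": col,
                    "value": counts.index.to_numpy(),
                    "count": counts.to_numpy(),
                }
            )
        )
    if not frames:  # every column was dropped
        return pd.DataFrame(columns=["column", "value", "count"])
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample standing in for an ERP table
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "status": ["OPEN", "OPEN", "CLOSED", None],
        "company_code": ["1000", "1000", "1000", "1000"],  # single value
        "legacy_field": [None, None, None, None],  # entirely empty
    }
)

clean = drop_uninformative_columns(orders)
print(list(clean.columns))  # ['order_id', 'status']
print(frequency_table(clean))
```

Counting blanks as a value is a design choice: a column that holds one value plus blanks is kept, because the pattern of blanks may itself be a DQ finding worth discussing with the client.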
Many little moments of joy and satisfaction from TDD
So, I set up pytest and started writing the function using TDD. Using TDD has been such a joy, especially the small atomic steps it lets us take and the way it lets us watch things unfold when the output is something complex. Seeing green in the terminal as I press Save is satisfying; many thanks to the creators of pytest and pytest-watch. What tremendous tools. I also felt free to refactor, since the tests were there to catch anything I might miss, and that is such a satisfying feeling.
Jupyter or VSCode?
Since I was new to TDD, I had to figure out a lot of things and make them up as I went along. I did not know whether my main place to code should be VSCode with pytest or Jupyter. Given how user-friendly Jupyter is, I started there, but getting TDD to work in Jupyter was cumbersome. After a few rounds, I really missed seeing the red-to-green change as I wrote each line of code. TDD connects each line of code I write with the test it is supposed to turn green, and I was missing that.
But when I moved to VSCode, I missed the quick experimentation and the pretty printing that Jupyter provides. Printing a pandas DataFrame to the console and trying to make sense of it was painful. Over time, I have settled on Jupyter for experimentation and VSCode for writing real code.
For example, in a number of cases I did not know clearly what I wanted, or I knew what I wanted but did not know enough Python or pandas to get it done. In those cases, I experimented in a Jupyter notebook first to figure out what I wanted. Then I would come back to VSCode, start from scratch, and go through the motions of TDD to write the code again.
This has led to a surprising change in me. Once I write some code, I usually feel attached to it: I don't want to throw it away, scrap it, or rewrite it. But somehow, with TDD, this attachment seems to be slowly going away. Many times, when I have working code in a Jupyter notebook that I could copy and paste into VSCode, I just delete the whole thing in Jupyter and start from scratch in VSCode. It is such a nice feeling to see my code evolve through the red-green-refactor process. I am grateful to Harry for his wonderful book, Test-Driven Development with Python, which took me by the hand into this beautiful world.
As part of this project, I was writing some config files in JSON. But I hate having to put double quotes around every piece of text in JSON, so I was quite pleased to see that YAML lets me skip them. When I dug into YAML, I realized how simple and beautiful it is. Like Markdown, the syntax is unobtrusive: a human can write it and a program can understand it. So, I started using YAML in my project as well.
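A small side-by-side sketch of the difference, assuming a hypothetical DQ config (the keys table, skip_columns and max_distinct_values are made up for illustration). JSON is parsed with the standard-library json module and YAML with the third-party PyYAML package (pip install pyyaml); both produce the same Python dictionary.

```python
import json

import yaml  # third-party PyYAML: pip install pyyaml

# The same hypothetical DQ config written both ways; keys are illustrative.
CONFIG_JSON = """
{
  "table": "sales_orders",
  "skip_columns": ["created_by", "last_modified"],
  "max_distinct_values": 50
}
"""

CONFIG_YAML = """
table: sales_orders
skip_columns:
  - created_by
  - last_modified
max_distinct_values: 50
"""

from_json = json.loads(CONFIG_JSON)
from_yaml = yaml.safe_load(CONFIG_YAML)  # builds only standard Python types, no arbitrary objects
print(from_yaml == from_json)  # True
```

One thing to watch for: PyYAML follows YAML 1.1, so unquoted values such as yes, no, on and off load as booleans; quote them when they are meant as text.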
Quran SRS Project v2
It had been almost a year since I last touched this personal project. I have been using the app every day, and so have my family and a few friends. I ran into a few bugs, and my family has seen some of them too. But I had decided not to fix bugs or introduce new features until I had tests for all of the functionality. This was such an enormous task that I kept putting it off. Finally, on a weekend around mid-June, we got into it.
When I started reviewing the code, my first impression was that it was very complex. Thank God I used to write a lot of comments and tried my best to write code so that my future self would not be totally lost. But lost I was :). There were two complex pieces: Django itself and the algorithm I had created for spaced repetition. My original thought was that I could just add tests and then start fixing bugs and adding features. But after reviewing the code, I felt it would be better to start from scratch and rebuild, especially since many of the pieces (Django, TDD, pytest, etc.) are quite new to me.
I went back to Harry’s book and started going through his example page by page, applying the ideas to my project. He used Selenium; I decided to try Cypress instead, since I did not really like Selenium and I loved Cypress when I saw it in a YouTube demo. So, I spent some time learning Cypress and writing some simple functional tests.
Then, we spun up the Django project, set up pytest, and started writing unit tests and getting them to pass. Working this way helps me understand how Django works behind the scenes, as I can see the impact of each line of code and each setting I create or modify. Thinking about the best way to write unit tests also forces me to understand the concepts at a deeper level.
For example, yesterday I read in Harry’s book that we should always redirect after a POST. I wondered why, so I read up on it and was very impressed with the explanation. Then, I came back to see what a redirect in Django actually does. I wanted to know whether the server does the redirect or the browser makes a second request, and I got the answer: the server only replies with a redirect response (status 302 plus a Location header), and the browser then makes a second request to that URL. That helped me see that there are two distinct steps in a redirect, and that I need distinct unit tests to check each of them.
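The two steps can be seen in a minimal, framework-free sketch. It uses only Python's standard library (http.server, http.client, urllib.request), not Django, and the handler, the /items path and the form field are hypothetical. The first check sends a POST that does not follow redirects and gets back only a 302 with a Location header; the second uses a client that follows redirects, as a browser does, and the server records a second, separate GET request.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

requests_seen = []  # (method, path) of every request the server receives


class PostRedirectGetHandler(BaseHTTPRequestHandler):
    """Hypothetical app: POST /items redirects to the home page at /."""

    def do_POST(self):
        requests_seen.append(("POST", self.path))
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # consume form body
        # Step 1 (server): reply with 302 and a Location header, no page.
        self.send_response(302)
        self.send_header("Location", "/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Step 2 lands here: the client's follow-up request for the new URL.
        requests_seen.append(("GET", self.path))
        body = b"home page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence the default request logging
        pass


server = HTTPServer(("127.0.0.1", 0), PostRedirectGetHandler)  # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Check 1: a raw POST that does not follow redirects sees only the 302.
conn = http.client.HTTPConnection("127.0.0.1", port)
conn.request(
    "POST",
    "/items",
    body=b"text=hello",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
response = conn.getresponse()
response.read()
status, location = response.status, response.getheader("Location")
conn.close()
print(status, location)  # 302 /

# Check 2: a client that follows redirects, as a browser does, sends a second GET.
requests_seen.clear()
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore env proxies
request = urllib.request.Request(f"http://127.0.0.1:{port}/items", data=b"text=hello")
with opener.open(request) as final:
    final_status, final_body = final.status, final.read().decode()
print(requests_seen)  # [('POST', '/items'), ('GET', '/')]
print(final_status, final_body)  # 200 home page

server.shutdown()
server.server_close()
```

Django's redirect() shortcut returns the same kind of 302 response by default, so the two unit tests map onto these two checks: one asserting on the status code and Location of the POST response, and one asserting on what the redirect target renders.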
describe-style tests in Python
As I wrote more tests in Python, I longed for the beautiful describe-it syntax from JS that I was using in Cypress. So, after a Google search and a few minutes of exploration, I landed on pytest-describe. Though it is not exactly like the JS flavour, it is very cool, as it lets me write test names that read like typical English sentences.
Here is a screenshot of the test I wrote for the home page and its output from pytest. I can read the output as “home page template is available”.
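A minimal sketch of a describe-style test for a home page (not the one in the screenshot), assuming pytest, pytest-describe and pytest-django are installed. pytest-describe collects functions whose names start with describe_ as groups and the plain functions nested inside them as tests, so the nesting spells out the sentence. The client argument is pytest-django's test-client fixture; the file name, the URL / and the template name home.html are hypothetical.

```python
# tests/test_home_page.py (hypothetical file name)
# Requires pytest, pytest-describe and pytest-django; the client fixture
# comes from pytest-django and provides a django.test.Client.


def describe_home_page():
    def describe_template():
        def is_available(client):
            response = client.get("/")
            assert response.status_code == 200
            # response.templates lists every template rendered for the request;
            # "home.html" is a hypothetical template name.
            assert "home.html" in [t.name for t in response.templates]
```

Running pytest on this file reports the test under the nested names home_page, template and is_available, which is what makes the output read like a sentence.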
So, these have been my golden finds of the last two months. I am enjoying them and I am grateful for them:
- pytest-watch
- pytest-describe
- Test Runner/Explorer in VSCode
- Django native debugging in VSCode
- Django pytest extensions
I have not yet reached the point where I can fully enjoy Django as a developer, as it still seems quite complex and unwieldy. But as a user, I have benefited from the stability of the framework and how fast I could get it up and running, especially as a newbie. So, I am very grateful to Django and to my Django teacher, Kevin, for introducing me to it.