Moving from MATLAB to Python as an oceanographer.
I only started programming during my MSc. It’s actually one of my biggest regrets that I didn’t start learning at least the basic concepts of programming, data manipulation and modelling a bit earlier. This is mostly because you just cannot do the sort of science I want to do without being at least a relatively competent programmer. The data I work with is huge. It physically doesn’t fit in Excel. There are some pieces of software for manipulating satellite data and creating simple ocean models of various types, but to advance the use of these techniques, learning some sort of programming language is unavoidable. Learning to code also makes your methods repeatable. This is particularly useful, as it allows calculations to be repeated over similar data sets from different regions, or figures to be remade and altered, even with smaller amounts of data.
Benefits and necessity aside though, learning to code has not been easy for me. I understand why students may avoid learning to code. Like maths, there seems to be something about programming that just turns some people away. Programming is another Marmite (FYI, I definitely am not a maths person, though I do love Marmite). People who love maths and coding – this post is probably not for you, but if you’re a newish oceanography student/programmer working with Earth system data, hopefully some of my experiences might be useful to you. Learning to code is a significant outlay of time, and an integral part of maximising the efficiency of many scientists’ workflows. So, whether you are a student or an established scientist, deciding to learn a language, which one to start with, or whether to learn a new one, is not a decision that can be made lightly.
A large number of oceanographers, mostly those who have come through specific degrees in oceanography, use MATLAB to help them work with data. MATLAB was my first experience with programming. Many don’t consider MATLAB to be a programming language as such and, unlike many other languages, it is commonly used through a development environment or Graphical User Interface (GUI). This provides a soft landing for someone new to programming, allowing you to click on things, physically see some of the data, manually manipulate plots etc., and, crucially, to write and run scripts through a single interface. MATLAB was a great tool for me during my master’s research, and I learnt many more uses for it over the course of my PhD. However, there are a number of limitations to using MATLAB that started becoming apparent to me during this time.
The first thing I realised was how dependent I was on the user interface. This happened while attending a course on Linux systems and Python programming early in my PhD. The course required using a command line interface. For anyone not familiar with computers beyond GUIs, this basically means typing instructions to the computer to navigate through the file system and execute any programmes you want to use. I was stuck at the first hurdle here. Although you can navigate through file systems in MATLAB in largely the same way as you do through a command line, I never had. I’d always clicked and navigated around my computer’s structure the same way I did before my programming days. I was also completely baffled by the concept of writing a script in a text editor and then calling it through the command line. Again, you can run MATLAB like this, but I had always done it through the GUI. By this point in the course I was so confused that I totally failed to engage with the lectures and practicals on Python programming.
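For anyone as baffled as I was, here is a minimal sketch of what "writing a script in a text editor and calling it through the command line" looks like in Python. The file name `mean_temp.py`, the function names and the temperature values are all made up for illustration; this is not from the course I attended.

```python
# Save this in a text editor as mean_temp.py (hypothetical file name).
import sys


def mean(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def main(argv):
    """Treat each command line argument as a temperature and print the mean."""
    # argv holds whatever was typed after the script name
    temperatures = [float(arg) for arg in argv]
    if not temperatures:
        print("usage: python mean_temp.py TEMP [TEMP ...]")
        return 1
    print(f"Mean temperature: {mean(temperatures):.2f}")
    return 0


# This part only runs when the file is called as a script,
# not when it is imported from another piece of Python code.
if __name__ == "__main__":
    main(sys.argv[1:])
```

From a terminal, you would move to the folder containing the file with `cd` and then type something like `python mean_temp.py 12.1 13.4 11.8`. There is no GUI involved at any point: the text editor and the terminal do everything.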
I eventually unlearned my fear of the command line (working with Unix/Linux systems definitely helped there), but I still stayed with MATLAB, because who wants to give up precious research time to learn a new programming language when there’s so much PhD to do? Towards the end of my PhD though, another problem became apparent. MATLAB is proprietary rather than open source, and you have to pay for a licence. This made teaching with it at universities that couldn’t afford a licence, or recommending it to students I was trying to help, difficult or even impossible. It also meant that I couldn’t use it on high-performance computing facilities that didn’t have licences either. Python, on the other hand, is open source and free to use. So, 6 months before the end of my PhD, I made the pretty tough decision to learn Python to complete the last part of my data processing. Still, I continued to use MATLAB until the end of my PhD, because a lot of my code for figures etc. was already written.
A fresh start at my new job at PML, combined with the fact that some project work had already begun, allowed me to completely transition to Python as my main language. I thought this would slow me down quite badly in my first few months, as I was not as fluent with Python as I was with MATLAB. I remembered the slow pace of learning to do even basic things in MATLAB. However, this has not been the case. I’m really pleased to have picked up Python so quickly, despite not considering myself a natural programmer. It could be because I’ve also had to contend with learning some Fortran in this time… which does make Python seem much easier by comparison!
So I would say, if you are worried about the time it would take to transition to a new language, it might not be as big a hurdle as you think. Choosing timing carefully obviously helps – a new position is a good opportunity for this. But if you can run and ideally work with some scripts in another language before you totally transition, I think it softens the blow.
There are, of course, other options to consider beyond MATLAB and Python. Many of my colleagues use R, which has the benefit of being open source like Python, and has some quite powerful statistics packages. It tends to be quite popular amongst biologists and statisticians. However, I found R less intuitive than Python, and Python more suited to the matrix manipulation etc. that I was used to with MATLAB. Similarly, there’s IDL, which seems to be quite commonly used by those who work in remote sensing, but again it’s not entirely free to use. I have to say that I found IDL easier to work with than R in terms of syntax, but that’s probably just based on my background.
In an ideal world, I’d want to teach new oceanographers in Python. I’ve not yet found any downsides from the perspective of my work. It seems to be a consistently in-demand skill in the current job market and has many applications beyond Earth sciences. However, I’m aware that we tend to inherit our programming skills from those who teach us, which is why I began working in MATLAB. Leaving MATLAB was then hard, because few people around me were working in Python. That is changing now, and the online community support for Python is excellent. There are very few problems I haven’t been able to solve with Google.
I generally don’t post lots about programming on my blog, since really, there are far better people out there doing this than me. There are great basic tutorials for Python online – including Python’s own beginner guides, and the learn Python tutorials. I found the basic tutorials can be a bit overwhelming though, and they can get you a bit caught up in some of the details of Python which, while useful and often incredibly powerful, you may not need to use extensively. What I found most helpful was getting some examples of things you would typically do in MATLAB, or would like to do with your data. Think file reading, selection of parts of matrices of data, basic maths and stats, plotting etc. With regard to oceanography specifically, sites you may want to check out include OceanPython, these links from RSMAS Miami, and many others – just google ‘Python Oceanography’.
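To give a flavour of the sort of example I mean, here is a small sketch of selecting parts of a matrix and doing some basic stats with NumPy, with rough MATLAB equivalents in the comments. The temperature values are invented and the variable names are my own choice for illustration. File reading and plotting are left out to keep it short.

```python
import numpy as np

# A made-up 3x4 grid of sea surface temperatures (degrees C), one value missing
sst = np.array([
    [12.0, 12.5, 13.0, 13.5],
    [14.0, 14.5, np.nan, 15.5],
    [16.0, 16.5, 17.0, 17.5],
])

# MATLAB: sst(1, :)      first row (Python counts from 0, not 1)
first_row = sst[0, :]

# MATLAB: sst(2:3, 1:2)  rows 2-3, columns 1-2 (Python slices exclude the end index)
subset = sst[1:3, 0:2]

# MATLAB: sst(sst > 15)  logical indexing; the missing value is never selected
warm = sst[sst > 15]

# MATLAB: mean(sst(:), 'omitnan')  mean of the whole grid, ignoring missing values
overall_mean = np.nanmean(sst)

# MATLAB: mean(sst, 2, 'omitnan')  one mean per row, ignoring missing values
row_means = np.nanmean(sst, axis=1)

print(subset)
print(warm)
print(overall_mean, row_means)
```

The two things that caught me out most coming from MATLAB are both visible here: indices start at 0, and the end of a slice is not included.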
I think the best and worst thing about programming, especially with open source languages and as data becomes bigger and more complex, is that your skills must continually evolve. This used to be frustrating to me, as I felt like I was never really improving, until I realised I was actually just pursuing harder and harder tasks. Now I’m enjoying the challenge more. With that said, I’d love to hear from anyone who has any recommendations for great resources on how to make the best of Python – particularly for plotting and satellite data processing.