The Python Packages You Need to Learn for Hedge Fund Jobs
As we’ve written here before, hedge funds love people who can code in Python. Having started life as a leader in web frameworks, Python has become the go-to language for data scientists. And hedge funds get a lot of their alpha from data.
Few people know more about data science in hedge funds than Jeff Reback, managing director of quantitative hedge fund Two Sigma. Reback, an MIT computer science graduate, has worked at Two Sigma since 2017 and is an expert in big data and electronic trading systems. But there’s one thing Reback knows better than anything else: Pandas, the open-source Python library used for data structures and numeric table manipulation. Reback is Mr Pandas: he has been managing the project since 2013.
In a webinar a few months ago, Reback presented the following chart, reflecting Pandas’ massive growth among Python libraries since it took over. Based on questions posted on Stack Overflow, it reflects Pandas’ pre-eminence among data packages across all industries, not just finance.
Source: Two Sigma
However, Pandas’ growth has plateaued since 2020. And that’s because Pandas is awesome, but not foolproof. It’s very easy to debug and it’s very easy to test, but it’s not great once you get to over 10 gigabytes of data. At 10 gigabytes and above, Pandas is less efficient and has memory constraints.
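To make the memory point concrete, here is a minimal sketch of how you might check a table's in-memory footprint in Pandas and, as a common workaround, aggregate a file in chunks rather than loading it all at once. The tiny CSV, the column names and the chunk size are made up for illustration; this is not how Two Sigma handles large data, just one way to see the constraint in practice.

```python
import io

import pandas as pd

# Made-up stand-in for a file that would be too large to load in one go.
csv_text = "ticker,volume\nAAA,100\nBBB,50\nAAA,200\nBBB,25\nAAA,300\n"

# Loading everything at once is fine for small data.
# memory_usage(deep=True) reports the bytes each column occupies in memory.
df = pd.read_csv(io.StringIO(csv_text))
bytes_in_memory = int(df.memory_usage(deep=True).sum())

# Streaming alternative: read a fixed number of rows at a time,
# aggregate each chunk, then combine the partial results.
partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    partials.append(chunk.groupby("ticker")["volume"].sum())

totals = pd.concat(partials).groupby(level=0).sum()

print(f"full table uses {bytes_in_memory} bytes in memory")
print(totals)
```

Chunking keeps only a slice of the data in memory at any time, but it only works for calculations that can be combined from partial results, which is part of why larger datasets tend to get pushed to other tools.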
So at this point, Reback says Two Sigma is seamlessly moving on to something else: Ibis, another open-source Python package designed for very large datasets. Ibis is not on the graph above. Like Pandas, Ibis was designed by Wes McKinney, a former quantitative researcher at hedge fund AQR. McKinney himself detailed all of Pandas’ flaws and his reasons for inventing Ibis in 2017.
So nowadays you don’t just need to know about Pandas. You must know Pandas and Ibis. Reback says Two Sigma has built a technology stack, “Bamboo,” that uses Pandas at its core for smaller datasets, and uses Ibis to translate its code to Apache Spark for larger datasets. “It’s super cool, write the code once, test it, get it working, then scale it with no problem,” says Reback.
For the moment, Pandas is by far the more widely used of the two libraries: it has 35,000 stars on GitHub against 2,000 for Ibis. But as data proliferates, Ibis is the future. Data scientists who want to work in hedge funds need to know both.