My Experience Contributing to Open Source - Emptor

Contributing to open source is a fun way to learn and improve your coding skills, and you get to do so by helping others. Open source communities are often welcoming to newcomers and treat contributions as a chance to learn, which makes contributing an enjoyable experience.

A good way to decide where to contribute is to choose open source software that you use daily. If you already know the software you are about to contribute to, you can more easily identify where help is needed and where new and interesting features would fit.

In this article, I will describe contributions that I made and give a rough picture of the workflow of contributing to open source. I will focus on the following projects:

scrapy: Python web crawling and scraping framework.
jrnl: Text journal application for the command line.
polybar: A fast and easy-to-use tool for creating a status bar.

Scrapy | Add a New Extension to Check Settings Names

Scrapy is a very customizable tool; one of the main ways to customize it is through settings. Scrapy settings allow you to customize the behaviour of all Scrapy components, including the core, extensions, pipelines, and spiders themselves. The infrastructure of the settings provides a global namespace of key-value mappings that the code can use to pull configuration values from.

So, for example, if DNSCACHE_ENABLED is set to True, our spider will enable the in-memory DNS cache. Scrapy has a LOT of configurable settings, and their names are easy to mistype. A misspelled setting name did not produce enough feedback for new users to notice this trivial error.

This contribution tries to solve that problem by creating a new extension. Initially, a linter was proposed for the issue, but after some discussion, the consensus was that a runtime approach would be better. The extension finds unused settings by reading an attribute (has_been_read) in the settings dictionary and, where possible, suggests a replacement. Another advantage of this runtime approach is that it also finds settings that are never used even when they are not misspelled.
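The idea can be sketched in plain Python. This is only an illustration and not the code from the PR: the TrackedSettings class, the read_keys set, the report_unused helper, and the short KNOWN_SETTINGS list are all hypothetical names, and the suggestion step uses difflib from the standard library as one possible way to find a close match.

```python
import difflib

# Hypothetical subset of valid setting names; a real check would use
# the framework's full list of default settings.
KNOWN_SETTINGS = ["DNSCACHE_ENABLED", "DOWNLOAD_DELAY", "USER_AGENT", "ROBOTSTXT_OBEY"]


class TrackedSettings(dict):
    """A dict that remembers which keys have been read (illustrative only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.read_keys = set()

    def __getitem__(self, key):
        # Record every lookup so unused keys can be reported later.
        self.read_keys.add(key)
        return super().__getitem__(key)


def report_unused(settings, known=KNOWN_SETTINGS):
    """Map each key that was never read to a suggested name, or None."""
    report = {}
    for key in settings:
        if key not in settings.read_keys:
            matches = difflib.get_close_matches(key, known, n=1)
            report[key] = matches[0] if matches else None
    return report


settings = TrackedSettings(DNSCACHE_ENABLE=True, DOWNLOAD_DELAY=2, MY_CUSTOM_FLAG=1)
_ = settings["DOWNLOAD_DELAY"]  # simulate the framework reading this setting
print(report_unused(settings))
```

In this sketch the misspelled DNSCACHE_ENABLE is reported with DNSCACHE_ENABLED as the suggestion, while MY_CUSTOM_FLAG is reported as unused without any suggestion, which mirrors the case of an unused setting that is not a typo.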

PR: Add a new extension to check settings names

Scrapy | Add Failed and Success Count Stats to Feedstorage Backends

Scrapy allows users to specify how to export extracted data (e.g., .xml, .json, .csv) and also where to save it (the local filesystem, S3, Google Cloud, an FTP server, standard output). However, those storage backends did not record any information about how the export itself went at the end of the run. A good place to save this information is the statistics that Scrapy generates during the run. These stats can be used to measure how the run performed, as they contain values such as memory usage, the start timestamp, and the finish timestamp.

The idea of this PR is to add new stats that tell users whether a storage backend encountered problems while saving. For example, if a spider saves to both S3 and the local filesystem, but the S3 credentials are wrong, these stats will be presented to the user:

{
  "elapsed_time_seconds": 11.61577,
  "feedexport/failed_count/S3FeedStorage": 2,
  "feedexport/success_count/FileFeedStorage": 2,
  "finish_reason": "finished"
}
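A minimal sketch of how such counters could be collected follows. It is not the PR's implementation: the two classes are stand-ins that only borrow the names shown in the stats above, export_all is a hypothetical helper, and a collections.Counter plays the role of the stats collector.

```python
from collections import Counter


class FileFeedStorage:
    """Stand-in backend that always succeeds (hypothetical)."""

    def store(self, feed):
        return None


class S3FeedStorage:
    """Stand-in backend that fails, as if the credentials were wrong (hypothetical)."""

    def store(self, feed):
        raise PermissionError("invalid credentials")


def export_all(backends, feeds, stats):
    """Store every feed in every backend and count outcomes per backend class."""
    for backend in backends:
        name = type(backend).__name__
        for feed in feeds:
            try:
                backend.store(feed)
            except Exception:
                stats[f"feedexport/failed_count/{name}"] += 1
            else:
                stats[f"feedexport/success_count/{name}"] += 1
    return stats


stats = export_all(
    [S3FeedStorage(), FileFeedStorage()],
    ["items.json", "items.csv"],  # hypothetical feed names
    Counter(),
)
print(dict(stats))
```

Keying the counter by backend class name keeps failures from one destination visible without hiding the successful writes to another.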

PR: Add failed and success count stats to feedstorage backends

jrnl | Add Default Display Format Option to Config File

jrnl supports a wide variety of formats (Markdown, JSON, YAML, etc.); however, the usability of this feature was not the best. For example, to print the last eight entries in Markdown format, you have to run this command:

jrnl -8 --export md

That could be tedious for users who want Markdown output every time, because they must add the --export md option to every command. To avoid that annoyance, this contribution adds a new option to the configuration file: display_format, which can be set to any of the exporters that jrnl has.
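The precedence this implies can be sketched as follows. This is an assumption about the behaviour rather than jrnl's actual code: resolve_display_format and the "text" fallback are hypothetical, and the config is modelled as a plain dict holding the display_format key.

```python
def resolve_display_format(cli_export, config, fallback="text"):
    """Pick the output format: command-line option first, then config, then fallback."""
    if cli_export:
        # An explicit option on the command line always wins.
        return cli_export
    # Otherwise use the configured default, if any.
    return config.get("display_format") or fallback


config = {"display_format": "md"}  # what the config file might contain
print(resolve_display_format(None, config))    # no option given on the command line
print(resolve_display_format("json", config))  # explicit option overrides the config
print(resolve_display_format(None, {}))        # nothing configured at all
```

With display_format set in the config, a plain jrnl -8 would then behave as if the export option had been passed, while an explicit option still takes priority.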

PR: Add default display format option to config file

Polybar | Remove Upper Bound to get_volume

Polybar has a feature to show the system’s current volume. The problem is that it does not “recognize” when the volume goes beyond 100%. Sound systems such as PulseAudio make this possible, allowing you to raise the volume beyond 100%, up to 150%.

This small change only requires that the volume is no longer clamped to the range [0, 100], that is, removing the upper bound. Maybe the “hardest” part was compiling the code and testing the changes.
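Polybar itself is written in C++, so the following Python snippet is only a sketch of the logic, not the patched function. The volume_percent helper and its clamp_upper flag are hypothetical, and the constant assumes PulseAudio's normal (100%) volume level of 65536.

```python
PA_VOLUME_NORM = 65536  # assumed raw PulseAudio value that corresponds to 100%


def volume_percent(raw, clamp_upper=False):
    """Convert a raw volume to a percentage, optionally capping it at 100."""
    percent = round(raw / PA_VOLUME_NORM * 100)
    percent = max(percent, 0)  # the lower bound stays in place
    if clamp_upper:
        percent = min(percent, 100)  # old behaviour: anything above 100% is hidden
    return percent


boosted = int(PA_VOLUME_NORM * 1.5)  # volume raised to 150%
print(volume_percent(boosted, clamp_upper=True))  # capped value
print(volume_percent(boosted))                    # real value once the cap is removed
```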

PR: Remove upper bound to get_volume

Conclusion

Knowledge obtained from discussions with maintainers, getting more involved with projects that were interesting to me, and the welcoming communities are some of the major reasons I will seek to contribute to open source in the future.

What are your thoughts on open source?