Gitlab Runner Python

Pytest is used to run unit tests in the Analytics project. The tests are executed from the root directory of the project with the python_pytest CI pipeline job. The job produces a JUnit report of test results, which is then processed by GitLab and displayed on merge requests.

A native package called gitlab-ci-multi-runner is available in Debian Stretch. By default, when installing gitlab-runner, that package from the official repositories will have a higher priority. If you want to use our package, you should manually set the source of the package.

Python Guide

It is our collective responsibility to enforce this Style Guide since our chosen linter does not catch everything.

Values

Campsite rule: as these guidelines are themselves a work in progress, if you work with any code that does not currently adhere to the style guide, update it when you see it.

Linting

We use Black as our linter. We use the default configuration.

There is a manual CI job in the review stage that will lint the entire repo and return a non-zero exit code if files need to be formatted. It is up to both the MR author and the reviewer to make sure that this job passes before the MR is merged. To lint the entire repo, just execute black . from the top of the repo.

Spacing

Following PEP8 we recommend you put blank lines around logical sections of code. When starting a for loop or if/else block, add a new line above the section to give the code some breathing room. Newlines are cheap - brain time is expensive.
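A minimal sketch of the spacing described above. The function and variable names are hypothetical and serve only to illustrate where the blank lines go.

```python
from typing import Dict, List


def count_even_and_odd(numbers_list: List[int]) -> Dict[str, int]:
    """Count the even and odd integers in a list."""
    counts_dict = {"even": 0, "odd": 0}

    # A blank line above the loop separates setup from iteration.
    for number_int in numbers_list:
        remainder_int = number_int % 2

        # A blank line above the if/else block gives it breathing room.
        if remainder_int == 0:
            counts_dict["even"] += 1
        else:
            counts_dict["odd"] += 1

    return counts_dict
```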

Type Hints

All function signatures should contain type hints, including for the return type, even if it is None. This is good documentation and can also be used with mypy for type checking and error checking.

Examples:
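A short sketch of fully annotated signatures, including an explicit None return type. The function names are hypothetical and not taken from the project.

```python
from typing import Dict, List, Optional


def mean_of_values(values_list: List[float]) -> Optional[float]:
    """Return the arithmetic mean, or None for an empty list."""
    if not values_list:
        return None

    return sum(values_list) / len(values_list)


def log_row_counts(row_counts_dict: Dict[str, int]) -> None:
    """Print one line per table; the None return type is still annotated."""
    for table_name_str, row_count_int in row_counts_dict.items():
        print(f"{table_name_str}: {row_count_int} rows")
```

Running mypy over a file like this would flag a caller that passes, say, a string where a list of floats is expected.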

Import Order

Imports should follow the PEP8 rules and furthermore should be ordered with any import ... statements coming before from .... import ...

Example:
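One possible layout, assuming only standard-library modules so that the sketch runs anywhere. The helper function is hypothetical and exists only so the imports are used.

```python
# Plain "import ..." statements come first.
import json
import os

# "from ... import ..." statements follow.
from datetime import date
from typing import Dict


def build_manifest(base_dir_str: str, run_date: date) -> str:
    """Return a JSON manifest naming the dated output file."""
    file_path_str = os.path.join(base_dir_str, f"{run_date.isoformat()}.csv")
    manifest_dict: Dict[str, str] = {"path": file_path_str}

    return json.dumps(manifest_dict)
```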

Docstrings

Docstrings should be used in every single function. Since we are using type hints in the function signature, there is no requirement to describe each parameter. Docstrings should use triple double-quotes and use complete sentences with punctuation.

Examples:
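A sketch of a one-line and a multi-line docstring in this style. Both functions are hypothetical.

```python
def celsius_to_fahrenheit(celsius_float: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius_float * 9 / 5 + 32


def normalize_column_name(column_name_str: str) -> str:
    """
    Return a lower-case, underscore-separated version of a column name.

    Leading and trailing whitespace is stripped before conversion.
    """
    return column_name_str.strip().lower().replace(" ", "_")
```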

How to integrate Environment Variables

To make functions as reusable as possible, using environment variables directly inside functions is highly discouraged unless there is a very good reason. Instead, the best practice is to either pass in the specific variable you want to use or pass all of the environment variables in as a dictionary. This allows you to pass in any dictionary and have it be compatible, without requiring the variables to be defined at the environment level.

Examples:
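A minimal sketch contrasting the two approaches. The API_HOST variable name and the URL shape are assumptions made for illustration.

```python
import os
from typing import Dict


def get_connection_url_discouraged() -> str:
    """Build a URL by reading os.environ directly, which hides the dependency."""
    return f"https://{os.environ['API_HOST']}/v1"


def get_connection_url(env_dict: Dict[str, str]) -> str:
    """Build a URL from any dictionary that contains the API_HOST key."""
    return f"https://{env_dict['API_HOST']}/v1"


# In production the caller can pass a copy of the real environment:
#     url = get_connection_url(os.environ.copy())

# A test or notebook can pass a plain dictionary instead.
print(get_connection_url({"API_HOST": "example.com"}))
```

The second function can be exercised without setting anything at the environment level, which is the point of the guideline.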

Package Aliases

We use a few standard aliases for common third-party packages. They are as follows:

  • import pandas as pd
  • import numpy as np

Variable Naming Conventions

Adding the type to the name is good self-documenting code. When possible, always use descriptive names for variables, especially with regard to data type. Here are some examples:

  • data_df is a dataframe
  • params_dict is a dictionary
  • retries_int is an int
  • bash_command_str is a string

If passing a constant through to a function, name each variable that is being passed so that it is clear what each thing is.

Lastly, try and avoid redundant variable naming.

Examples:
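A sketch of naming constants at the call site, using a hypothetical function. The final comment shows one reading of "redundant naming", which is an assumption rather than the guide's own example.

```python
from typing import List


def fetch_rows(table_name_str: str, limit_int: int, include_deleted: bool) -> List[str]:
    """Return placeholder row labels; a stand-in for a real query."""
    suffix_str = " (incl. deleted)" if include_deleted else ""

    return [f"{table_name_str} row {i}{suffix_str}" for i in range(limit_int)]


# Unclear: the reader has to look up what each positional constant means.
rows_list = fetch_rows("users", 2, False)

# Clearer: each constant is named where it is passed.
rows_list = fetch_rows(table_name_str="users", limit_int=2, include_deleted=False)

# Redundant (assumed example): data_df_dataframe repeats what data_df already says.
```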

Making your script executable

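If your script cannot be run even though you've just created it, it most likely needs to be made executable. Run chmod 755 on the file, for example chmod 755 my_script.py, where my_script.py is a placeholder for your script's name.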

For an explanation of chmod 755 read this askubuntu page.

Mutable default function arguments

Using mutable data structures as default arguments in functions can introduce bugs into your code. This is because the default value is created only once, when the function is defined, and that same object is then reused in each successive call.

Example:

Output:
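A minimal sketch of the gotcha, modelled on the classic example from the reference below, together with the usual None-default fix. The printed results appear as comments.

```python
from typing import List, Optional


def append_to(element: int, to: List[int] = []) -> List[int]:
    """Append to a default list that is shared between calls (the bug)."""
    to.append(element)

    return to


def append_to_fixed(element: int, to: Optional[List[int]] = None) -> List[int]:
    """Create a fresh list on each call when none is supplied."""
    if to is None:
        to = []

    to.append(element)

    return to


print(append_to(12))        # [12]
print(append_to(42))        # [12, 42], the default list kept the 12
print(append_to_fixed(12))  # [12]
print(append_to_fixed(42))  # [42]
```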

Reference: https://docs.python-guide.org/writing/gotchas/

Folder structure for new extracts

  • All client-specific logic should be stored in /extract; any API clients which may be reused should be stored in /orchestration.
  • Pipeline specific operations should be stored in /extract.
  • The folder structure in extract should include a file called extract_{source}_{dataset_name} like extract_qualtrics_mailingsends or extract_qualtrics if the script extracts multiple datasets. This script can be considered the main function of the extract, and is the file which gets run as the starting point of the extract DAG.

When not to use Python

Since this style guide is for the entire data team, it is important to remember that there is a time and place for using Python, and it is usually outside of the data modeling phase. Stick to SQL for data manipulation tasks where possible.

Unit Testing

Pytest is used to run unit tests in the Analytics project. The tests are executed from the root directory of the project with the python_pytest CI pipeline job. The job produces a JUnit report of test results which is then processed by GitLab and displayed on merge requests.

Writing New Tests

New testing file names should follow the pattern test_*.py so they are found by pytest and easily recognizable in the repository. New testing files should be placed in a directory named test. The test directory should share the same parent directory as the file that is being tested.

A testing file consists of one or more tests. An individual test is created by defining a function that has one or many plain Python assert statements. If the asserts are all true, the test passes. If any asserts are false, then the test will fail.

When writing imports, it is important to remember that tests are executed from the root directory, so import paths must resolve from there. In the future, additional directories may be added to the PYTHONPATH for ease of testing as need allows.
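The conventions above can be sketched as a single test file. The file location, module path and function under test are all hypothetical.

```python
# Hypothetical location: extract/my_source/test/test_utils.py
# In a real repository the function under test would be imported with a path
# that resolves from the repository root, for example:
#     from extract.my_source.utils import snake_case
# It is defined inline here only so the sketch runs on its own.


def snake_case(name_str: str) -> str:
    """Convert a space-separated name to snake_case."""
    return "_".join(name_str.lower().split())


def test_snake_case() -> None:
    """The test passes only if every assert below is true."""
    assert snake_case("Account Name") == "account_name"
    assert snake_case("  Created   At ") == "created_at"
    assert snake_case("id") == "id"
```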

Exception handling

When writing a Python class to extract data from an API, it is the responsibility of that class to highlight any errors in the API process. Data modelling, source freshness and formatting issues should be highlighted using dbt tests.

Avoid the use of general try/except blocks, i.e.:
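A sketch of the pattern to avoid, next to a narrower alternative that surfaces the error. The function names are hypothetical, and reading "general" as a bare except clause is an assumption.

```python
import json
from typing import Dict


def parse_response_discouraged(body_str: str) -> Dict:
    """Swallow every error, so a broken API response goes unnoticed."""
    try:
        return json.loads(body_str)
    except:  # noqa: E722 - bare except shown only as the pattern to avoid
        return {}


def parse_response(body_str: str) -> Dict:
    """Parse a JSON body and surface a descriptive error if it is invalid."""
    try:
        return json.loads(body_str)
    except json.JSONDecodeError as error:
        raise ValueError(f"API returned invalid JSON: {error}") from error
```

Catching only json.JSONDecodeError lets unrelated failures propagate unchanged instead of being silently hidden.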

Mar 6, 2019 by Thibault Debatty

Whatever programming language you are using for your project, GitLab's continuous integration system (gitlab-ci) is a fantastic tool that allows you to automatically run tests when code is pushed to your repository. Java, PHP, Go, Python or even LaTeX, no limit here! In this blog post we review a few examples for the Python programming language.

In GitLab, the different tests are called jobs. These jobs are executed by a gitlab-runner, which can be installed on the same server as your main GitLab instance, or on one or multiple separate servers.

Different options exist to run the jobs: using VirtualBox virtual machines, using Docker containers or using no virtualization at all. Most administrators configure gitlab-runner to run jobs using Docker images. This is also what we will assume for the remainder of this post.

PyLint

Pylint is a powerful tool that performs static code analysis and checks that your code is written following Python coding standards (called PEP8).

To automatically check your code when you push to your repository, add a file called .gitlab-ci.yml at the root of your project with the following content:

test:pylint is simply the name of the job. You can choose whatever you want. The rest of the code indicates that gitlab-runner should use the Docker image python:3.6 and run the listed commands.

When you use some external classes (such as sockets), PyLint may complain that a method does not exist even though it does. You can ignore these errors by appending --ignored-classes=... to the pylint command line.

You can also specify a directory (instead of *.py), but it must be a Python module and include __init__.py.

Good to know: Pylint is shipped with Pyreverse, which creates UML diagrams from Python code.

Unit testing with PyTest

PyTest is a framework designed to help you test your Python code. Here is an example of a test:
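A minimal sketch of such a test file, saved for instance as test_inc.py. The file name and the inc function are hypothetical.

```python
def inc(x: int) -> int:
    """Return the given integer plus one."""
    return x + 1


def test_inc_adds_one() -> None:
    """pytest collects this function because its name starts with test_."""
    assert inc(3) == 4


def test_inc_handles_negative_numbers() -> None:
    """Each test function is reported as a separate pass or failure."""
    assert inc(-1) == 0
```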

To run these tests automatically when you push code to your repository, add the following job to your .gitlab-ci.yml.

pytest will automatically discover all test files in your project (all files named test_*.py or *_test.py) and execute them.

Testing multiple versions of Python

One of the main benefits of automatic testing with gitlab-ci is that you can test multiple versions of Python at the same time. There is, however, one caveat: if you plan to test your code with both Python 2 and Python 3, you will at least need to disable the superfluous-parens check in Python 2 (for the print statement, which became a function in Python 3). Here is a full example of .gitlab-ci.yml: