Project creation
First, create the project in the appropriate GitLab group. Then edit the wiki to add your project to the project list.
Project folders
Code
The functions must be put into subfolders inside the src folder.
The launch script or notebook must be put at the root of the project.
Data
No data should be placed inside the git folder.
Data is shared between users and must be placed in the /home/data/ folder (on the host) or /data/ (inside a container).
The data path must mirror the code path: /home/data/<group-gitlab>/<project-gitlab>
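The path rule can be expressed as a small helper. This is only a minimal sketch: the function name `project_data_dir` and the `in_container` flag are hypothetical, and it assumes the container path mirrors the host layout under /data/ (only the two root folders come from this page).

```python
from pathlib import PurePosixPath

# Data roots named on this page: /home/data/ on the host, /data/ in a container.
HOST_DATA_ROOT = PurePosixPath("/home/data")
CONTAINER_DATA_ROOT = PurePosixPath("/data")


def project_data_dir(group: str, project: str, in_container: bool = False) -> PurePosixPath:
    """Return the shared data folder of a project (hypothetical helper).

    Assumption: inside a container the <group>/<project> layout is the same
    as on the host, only the root folder changes.
    """
    root = CONTAINER_DATA_ROOT if in_container else HOST_DATA_ROOT
    return root / group / project


if __name__ == "__main__":
    # "my-group" and "my-project" are placeholder names.
    print(project_data_dir("my-group", "my-project"))
    print(project_data_dir("my-group", "my-project", in_container=True))
```

Keeping the root in one place means the same code can run on the host and in a container by changing a single flag.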
Most of the time, the processing follows these steps: raw data from the client ==> processing that generates intermediate files ==> final files used by the analysis ==> statistics files / PowerPoint presentations
The following data folder structure reflects these steps:
.
├── inputs          # All inputs used
│   ├── interim     # intermediate files: can be deleted
│   ├── processed   # final files: used by analysis
│   └── raw         # original files sent by the client or downloaded
└── outputs         # All outputs sent to the client
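The layout above can be created with a short script. This is a sketch rather than an official tool: the helper name `create_data_tree` is hypothetical, and the demo writes to a temporary folder instead of /home/data/.

```python
import tempfile
from pathlib import Path
from typing import List

# Sub-folders of the layout above, relative to the project data folder.
DATA_SUBFOLDERS = (
    "inputs/raw",
    "inputs/interim",
    "inputs/processed",
    "outputs",
)


def create_data_tree(project_data_dir: Path) -> List[Path]:
    """Create the standard sub-folders and return their paths (hypothetical helper)."""
    created = []
    for sub in DATA_SUBFOLDERS:
        folder = project_data_dir / sub
        folder.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(folder)
    return created


if __name__ == "__main__":
    # Demonstrated in a temporary folder so nothing is written to the shared data folder.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_data_tree(Path(tmp) / "my-group" / "my-project"):
            print(folder.relative_to(tmp))
```

Because `mkdir` is called with `exist_ok=True`, the script can be re-run on an existing project without touching files already in place.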
Add basic files
.gitignore
A file that tells git which files to ignore (system files, caches, build artifacts). A template for Python is available here.
Makefile
A file that centralizes all the commands used to work on the project.
# Targets below are commands, not files to build
.PHONY: clean flake8

# Remove build artifacts, caches, and test/coverage output
clean:
	rm -rf dist/
	rm -rf build/
	rm -rf *.egg-info
	find . -name '*.pyc' -exec rm -f {} +
	find . -name '*.pyo' -exec rm -f {} +
	find . -name '*~' -exec rm -f {} +
	find . -name '__pycache__' -exec rm -fr {} +
	find . -name 'spark-warehouse' -exec rm -fr {} +
	rm -fr .tox/
	rm -f .coverage
	rm -fr htmlcov/
	rm -rf .pytest_cache/

# Lint the code, after cleaning generated files
flake8: clean
	flake8 --max-line-length=100
.gitlab-ci.yml
A file that defines the tasks to be launched by GitLab's integrated CI/CD.
Here is an example that runs a Python linter at each commit and deploys the code to the server from the main branch:
stages:
  - test
  - deploy

default:
  image: alpine
  before_script:
    - apk add make
    # Register the production host key so that ssh does not prompt
    - mkdir -p ~/.ssh && chmod 700 ~/.ssh
    - echo "$SSH_PMP1_PRODUCTION_KNOWN_HOST" >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
    # Install openssh if needed, then start the ssh agent
    - "which ssh-agent || ( apk add --update openssh )"
    - eval $(ssh-agent -s)

flake8:
  image: cwaysdockerhub/vsc
  stage: test
  # Overrides the default before_script: linting needs no SSH setup
  before_script:
    - make clean
  script:
    - make flake8

deploy_main_pmp1:
  stage: deploy
  script:
    # Load the private key into the agent, then update the code on the server
    - echo "$SSH_PMP1_PRODUCTION_PRIVATE_KEY" | tr -d '\r' | ssh-add - > /dev/null
    - ssh pmp-production@pmp1.pmplab.io "cd /home/pmp-production/<gitlab-group>/<gitlab-project>/
      && git fetch
      && git checkout main
      && git pull
      && exit"
  only:
    - main
README.md
The idea is to keep the README as small as possible, to minimize the work needed to keep it up to date. Here is a suggested set of sections to fill in:
# <Project name>
Short description: one sentence.
## Components
1. List of components
## Usage
Description of usage for each of the components (for the final user).
What are the commands to launch?
## Installation
If needed, steps to install and configure the application for the end user.
## Development
Information for developers who want to improve or update the app.
### Code folder architecture
Output of the command tree -d -L <max_level>, with a short description of each folder.
### Description of the steps to create a new thing, if needed