
Good coding and dataset storage practices

You know that feeling when you open the folder of a student after they’ve spent a few months in the lab… and all you see are files named test1.mat, ramantest_powerlaser3.wdf, nirtest.raw, untitled2.m, new_script2_final.py, and copy_of_new_script_FINALdefinitely.txt?

Honestly, we can all agree this should be illegal (maybe even punished with jail time).

Jokes aside 😄, clean code and organized datasets also show professionalism and respect for others. They prove that you care about quality and understand that teamwork (and science!) works better when we can actually read each other’s functions and data.

Lovelace’s Square wants to build a repository of code and data that is easy to use, well-organized, and powerful. Since we are building a shared repository, and because this repository may later support Ada or other machines, it’s even more important to keep everything clear, clean, and consistent. Well-structured code and datasets help Ada give better answers and ensure that both people and machines can navigate the content without confusion.

By following some guidelines, you’ll not only make your fellow coders smile, but also help keep our shared space neat and useful. Importantly, although these guidelines are not mandatory, we strongly encourage you to follow them.


General Principles

Before going into details, it is important to understand some basic principles that will help us keep both code and datasets clear, useful, and easy to share.

  • Be consistent: Use a consistent structure and naming style across your files. This makes it easier to navigate, reduces confusion, and improves collaboration.

  • Make it clear and readable: Choose names that describe the content. Use clear function and variable names in code. In datasets, label columns properly and include units if needed. Avoid vague names like x, temp, or data1.

  • Stay organized: Keep related items together. Use folders for different types of files (e.g. raw/, processed/, scripts/, results/). Name your files so others can understand what they are without opening them.

  • Document your work: Always include a short explanation. For code, use comments and headers to describe what the function does and how to use it. For datasets, include a README.txt or metadata file to explain where the data comes from and what each column means.

  • Be efficient, but not at the cost of clarity: Write clean, simple code and avoid unnecessary complexity. For data, avoid large, unused files. Always balance performance with readability and usability.

  • Use open and common formats: Save data in formats like .txt, .csv, or .xlsx. Write code using standard tools (e.g. MATLAB, Python, R) and avoid formats that are hard to open or that require specialized, proprietary software.

  • Follow FAIR principles: Add clear descriptions, use standard formats, and ensure others can use your work without needing to ask you directly.

FAIR stands for:

  • Findable: Others can find your work easily thanks to good naming, documentation, and metadata.
  • Accessible: Your code and data are available to others (open source, no hidden folders or password-protected ZIPs).
  • Interoperable: Your work should be usable with other tools, platforms, and programming languages. This means using open, standard, or widely supported formats (like .csv, .py, .xml, or .m) and writing code that follows common practices. Whether you’re working in MATLAB, Python, R, or elsewhere, aim to make your code and data easy to integrate with other systems, tools, or workflows.
  • Reusable: Your code should be understandable, well-documented, and clearly licensed, so others can actually use it, not just today, but in future projects or by different teams. Don’t be afraid to add comments, examples, or explanations. Clear code is kind code.

FAIR principles help turn your one-time analysis into something bigger: a resource others can build on, verify, or even teach with.


So with all that in mind, we’ve collected some key tips to help you write code that not only works well, but also still makes sense a few months from now. The examples in this documentation are written in MATLAB, but the same ideas extend to other languages.

Tips for code

Documentation

Good documentation is one of the most valuable parts of a project. It helps others understand your code and makes it easier to reuse, share, and improve.


Function headers

We strongly recommend using this template for your function headers. It gives users everything they need to understand how the function works.

function [output1, output2] = FunctionName(input1, input2)
% FUNCTIONNAME Brief description of the function
%
% Authors: Your Name
% Date Created: YYYY-MM-DD
% License: Specify your license here
% Version: Specify the version.
% Reviewed by Lovelace's Square team: Yes/No
%
% Detailed function description:
% Here's where you can really shine! Explain what your function does,
% any algorithms it uses, and any quirks or special features.
%
% Args:
% input1 (type): Description of the first input parameter.
% input2 (type): Description of the second input parameter.
%
% Returns:
% output1 (type): Description of the first output parameter.
% output2 (type): Description of the second output parameter.
%
% Example:
% [result1, result2] = FunctionName(arg1, arg2)
%
% See also: RELATEDFUNCTION1, RELATEDFUNCTION2

    % Your brilliant code goes here
end
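
As a quick illustration, here is the template applied to a small, hypothetical normalization function (the function name, authorship details, and contents are invented for this example):

function [normData, stats] = NormalizeSpectrum(spectrum)
% NORMALIZESPECTRUM Normalize a spectrum to zero mean and unit variance
%
% Authors: Ada R. Lovelace
% Date Created: 2025-03-10
% License: MIT
% Version: 1.0
% Reviewed by Lovelace's Square team: No
%
% Detailed function description:
% Applies a z-score normalization so that spectra recorded under
% different conditions can be compared on a common scale.
%
% Args:
% spectrum (double vector): Raw intensity values.
%
% Returns:
% normData (double vector): Normalized intensity values.
% stats (struct): Mean and standard deviation used for the scaling.
%
% Example:
% [normSpec, stats] = NormalizeSpectrum(rawSpectrum)

    stats.mean = mean(spectrum);
    stats.std  = std(spectrum);
    normData   = (spectrum - stats.mean) / stats.std;
end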

README files

Every project or main folder should include a README.txt file. It explains what the code does, how to use it, and how to install or run it.

  • myProject/
    • README.txt
    • mainScript.m
    • functions/
      • calculateSomething.m
      • plotResults.m

Version control

Version control is your time machine for code. It lets you track changes, collaborate smoothly, and bring back older versions when things go wrong.

  • Commit regularly: Make small, frequent commits with clear and descriptive messages. This helps you and your collaborators understand the history of changes.

  • Use branches: Create separate branches for features, fixes, or experiments. This keeps the main branch clean and reduces the risk of breaking important code.

  • Test before pushing: Make sure your code runs correctly before pushing to shared repositories. Avoid adding broken or incomplete code to the main branch.
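
As a minimal sketch of such a cycle (assuming git is installed and on your path; the branch, file, and folder names below are invented for illustration), you could run it straight from the MATLAB command window using shell-escape commands:

% Work on a separate branch so the main branch stays clean
!git checkout -b feature/baseline-correction

% Make sure the code still runs before sharing it
results = runtests('tests');   % assumes your tests live in a tests/ folder

% Commit a small, focused change with a descriptive message
!git add BaselineCorrect.m
!git commit -m "Add baseline correction for FTIR spectra"
!git push origin feature/baseline-correction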


Code style

Indentation

Use consistent indentation: Indentation makes your code easier to follow. Use 4 spaces per level. This helps show where blocks of code start and end.

if isValid
    result = processData(data);
else
    result = [];
end

Line length

Keep lines under 75 characters. Break long lines using ... to improve readability.

summaryText = ['The analysis shows a strong correlation ' ...
               'between variables.'];

Spaces around operators

Add spaces around operators to make your code easier to read.

result = (a + b) / 2;
count = count + 1;

Use blank lines

Separate logical sections with blank lines to improve clarity.

data = load('datafile.mat');

% Normalize the data
normData = (data - mean(data)) / std(data);

% Plot the result
plot(normData);

Align similar lines to make code easier to scan.

maxIterations = 100;    % maximum number of iterations
maxDepth      = 20;     % maximum depth
minError      = 0.01;   % minimum error threshold

Use parentheses to clarify logic

Use parentheses in complex expressions to show order of operations clearly.

result = (a * b) + c - (sqrt(d) * e);

Commenting

Use comments to explain what your code is doing, especially for complex logic.

% Normalize the input values
normData = (data - mean(data)) / std(data);
% Check for outliers
isOutlier = normData > 3;

Descriptive names

Avoid using single letters or vague terms. Use names that explain what the variable or function does.

velocity = 12.5;
result = calculateMean(data);

Functions and classes: UpperCamelCase

Start each word with a capital letter. Do not use underscores.

MyAwesomeFunction()
ChemometricAnalyzer()

Variables and properties: lowerCamelCase

Start with a lowercase letter. Capitalize the first letter of each new word.

myVariable = 5;
dataMatrix = rand(10);

Constants: UPPERCASE

Write constants in all caps to show they are fixed values.

MAX_ITERATIONS = 100;
DEFAULT_TIMEOUT = 30;

Functions and scripts

Functions and scripts are the basic parts of your code. Here is how to use them well:

One function per file. Each function should be in its own file. This makes it easier to find, understand, and update.

  • NormalizeData.m
  • CalculateMean.m
  • LoadAndPlot.m

Use functions instead of scripts. Functions are more reliable because they keep their own variables. This avoids conflicts and makes the code cleaner and easier to test.

function result = CalculateMean(data)
    result = sum(data) / numel(data);
end

Use sections in long scripts. If your script is long, break it into sections using %%. This helps you run or debug parts of the code without running everything at once. It also makes your script easier to read and follow.

%% Load Data
data = load('datafile.mat');

%% Process Data
normData = (data - mean(data)) / std(data);

%% Plot Results
plot(normData);

Group related functions in packages or toolboxes. When functions are related, put them in the same folder or package. This keeps your project organized and makes it easier to reuse or share.

  • +preprocessing/
    • normalizeData.m
    • scaleData.m
  • +analysis/
    • calculateMean.m
    • calculatePCA.m
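
For reference, functions stored in a + package folder are called with the package name as a prefix. Here is a minimal sketch using the folders above (rawSpectra is an invented variable for illustration):

% Call package functions with the package name as a prefix
normSpectra = preprocessing.normalizeData(rawSpectra);
avgSpectrum = analysis.calculateMean(normSpectra);

% Alternatively, import a package once and drop the prefix
import preprocessing.*
normSpectra = normalizeData(rawSpectra);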

Tips for datasets

Datasets are as important as code. Here is how to prepare and store data so it is clear, reusable, and reliable.


Organization

Keep your data files in a clear folder structure. Group related files together and name folders meaningfully.

  • data/
    • raw/
      • sample1.csv
      • sample2.csv
    • processed/
      • sample1_clean.csv
      • sample2_clean.csv
    • docs/
      • metadata.json
      • README.txt

Documentation & Metadata

Every dataset should include clear documentation and metadata so users know what the data is, where it came from, and how to use it. At minimum, each data folder needs:

  1. README file (e.g. README.txt or README.md) – An overview of the dataset:

    • Title and Brief description of the data
    • Source: Where the data originated (e.g. instrument, study, or publicly available repository)
    • Directory structure: What files or subfolders exist and what they contain
    • Usage: How to load or process the data, including any required software or dependencies
    • Contact: Who to reach if there are questions
  2. Metadata file (e.g. metadata.json or metadata.txt) – A machine-readable record:

    • File names and column descriptions (with units)
    • Date collected and author/institution
    • Format and version of the dataset
    • License or usage terms

Example README.txt:

# Infrared Spectra Dataset
## Description
This dataset contains raw and processed infrared spectra of chemical samples, collected using Fourier-transform infrared (FTIR) spectroscopy. It was created as part of a study on solvent concentration and spectral variation in analytical chemistry.
## Source
- Instrument: FTIR Model X
- Location: Lovelace Lab, Lovelace University
- Date Collected: March 2025
- Experiment: Solvent concentration and spectral profile study
## Authors
- Dr. Ada R. Lovelace (ada.lovelace@lovelacesquare.org)
- Clara Babbage
- Team Lovelace's Square – Spectroscopy & Chemometrics Unit
## Publication
If you use this dataset, please cite:
A. R. Lovelace, C. Babbage. "Spectral Response to Solvent Concentration in FTIR." Journal of Chemometric Data, 2025.
DOI: 10.1234/jcd.2025.0456
## Structure
data/
├── raw/
│   ├── sample1.csv            # unprocessed FTIR spectra
│   └── sample2.csv
├── processed/
│   ├── sample1_clean.csv      # baseline corrected, smoothed
│   └── sample2_clean.csv
└── docs/
    ├── README.txt
    └── metadata.json
## Format
- Files are in .csv format with two columns:
  - wavenumber (cm⁻¹)
  - intensity (absorbance)
## Usage
raw = readtable('data/raw/sample1.csv');
clean = readtable('data/processed/sample1_clean.csv');
plot(raw.wavenumber, raw.intensity);
## License
This dataset is licensed under CC BY 4.0 – you are free to use, modify, and share it, as long as you give proper credit.
## Contact
Questions? Email lab@lovelacesquare.org

Example metadata.json:

{
  "sample1.csv": {
    "date_collected": "2025-03-10",
    "instrument": "FTIR Model X",
    "location": "Lovelace Lab, Lovelace University",
    "experiment": "Solvent concentration and spectral profile study",
    "columns": {
      "wavenumber": "cm^-1",
      "intensity": "absorbance"
    },
    "notes": "Raw spectral data collected without preprocessing."
  },
  "sample1_clean.csv": {
    "date_processed": "2025-03-12",
    "processing_steps": [
      "baseline correction",
      "smoothing",
      "normalization"
    ],
    "software": "MATLAB R2025a",
    "columns": {
      "wavenumber": "cm^-1",
      "intensity": "absorbance (processed)"
    },
    "notes": "Processed version of sample1.csv ready for analysis."
  }
}
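
If you want to use this metadata programmatically, MATLAB can read it directly. A minimal sketch (the path assumes the folder structure shown above):

% Read the machine-readable metadata into a MATLAB struct
metadata = jsondecode(fileread('data/docs/metadata.json'));

% jsondecode converts JSON keys into valid MATLAB identifiers,
% so "sample1.csv" becomes a field name such as sample1_csv
entries = fieldnames(metadata);
disp(entries)

% Inspect the column units recorded for the first entry
disp(metadata.(entries{1}).columns)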

FAIR for Data

Apply FAIR principles to your datasets:

  • Findable: Add clear names, README, and metadata.
  • Accessible: Use open formats and avoid paywalled locations.
  • Interoperable: Follow common standards (e.g. CSV, JSON, .py, .m).
  • Reusable: Include license information and usage examples.

Let’s keep Lovelace’s Square neat and useful!

Writing clean code and keeping data organized isn’t just about following rules. It shows respect to your team, helps you find your work easily later, and encourages others to build on what you’ve done. When we all do our part, Lovelace’s Square becomes a space where great science happens easily.