Help me with my first OO project

Hi everyone,

I started working on my first "real" object-oriented project. It is supposed to be a tool that allows me to easily perform various experiments in my daily work as an applied maths PhD student.

I would like to get some feedback, and also ask for guidance on how to increase its utility (I want to turn it into something that could easily be picked up on by other students as well).

Before I post the code, a high-level summary of what it is about.
The core of the project is a collection of sampling methods that draw samples from (continuous) probability densities.

For the less mathematical/physical people here, it is easier to think in terms of a particle moving across a landscape with hills and valleys.
I start at one position on this landscape, with a given velocity. Due to the slopes of the many hills in the landscape, the particle experiences forces and accelerates or decelerates, changes directions, and so on. The force that the particle experiences is of course dependent on its position. How the particle changes its movement based on the current force is governed by the sampling schemes themselves.
The samples that I take are the sequence of positions and velocities that result. Or, rather, I typically compute and store other quantities based on those positions and velocities.

If I develop new sampling schemes, I want to compare their performance on various problem classes, i.e. various different landscapes in the language from above. I want to have a framework where it is as simple as possible to define a new problem/landscape, and then run my various sampling schemes on it.

The schemes themselves should be "blind" to the landscape at hand, as well as to the kind of measurements that need to be obtained along the way (i.e. what kind of quantities I want to compute from my positions and velocities).

The user should only define the landscape, as well as the specifics of the measurement. They then should simply call one of the various sampling routines.

I think, in the language of software development, what I try to do is write a sampling scheme library and a corresponding API to use it? Not sure...
I now post the code. Afterwards I name a few design issues I see.
Note that I will be using mpi to draw different trajectories on different processors, but this is not implemented yet.
Note also that the code uses the word "parameter" instead of "position".

An example main-file. This is what every user needs to set up himself.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#include "samplers.h"
#include "setup_classes.h"

int main(int argc, char *argv[]){

    // we want to sample the double well potential using the OBABO sampler


    double T = 1;           // sampler hyperparameters
    double gamma = 1;
    double h = 0.01;

    int N = 50000;          // no. of iteration
    bool tavg = false;      // perform time-average?
    int t_meas = 5;         // take measurement every t_meas iterations (passed to sampler as well as print functions).

    OBABO testsampler(T, gamma, h);     // construct OBABO object defined in header "samplers.h"


    PROBLEM double_well;    /* construct object of the problem class defined by the user in header "setup_classes.h". */
    
    measurement results = testsampler.run_mpi_simulation(N, tavg, double_well, t_meas);  /* run sampler. "measurement" needs to be defined 
                                                                                            by user in header "setup_classes.h". It holds
                                                                                            information of what quantities need to be obtained
                                                                                            by the sampler and how to compute and print them.  */

    results.print_to_csv(t_meas);     // print to file, as specified by the user in the "measurement" class.

    return 0;


}


The library, the samplers themselves, are given in the header "samplers.h" Right now, there is just one, the OBABO sampler.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#ifndef SAMPLERS_H
#define SAMPLERS_H


#include <cmath>
#include <iostream>
#include <vector>
#include <random>
#include <string>
#include <sstream>
#include <chrono>
#include <math.h> 
#include <iomanip>
#include <algorithm>
#include <limits>
#include <numeric>
#include <iterator>
#include "setup_classes.h"
// #include <mpi.h>


/* The OBABO sampler. It requires the "PROBLEM" class and the "measurement" class in header "setup_classes.h".
   They need to be written by the user and define the sampling problem and what measurements to take. 
   Note that the member functions below require those classes to be structured in a certain way. */

class OBABO{

    private:
        const double T;
        const double gamma;
        const double h;

        measurement collect_samples(const int N, const bool tavg, PROBLEM POTCLASS, const int randomseed, const int t_meas);  // draws a single sampling trajectory

    public:
        // constructors
        OBABO(double T, double gamma, double h): T{T}, gamma{gamma}, h{h} {
        } 

        measurement run_mpi_simulation(const int N, const bool tavg, PROBLEM POTCLASS, const int t_meas);  /* sets up mpi environment and calls "collect_samples" 
                                                                                                              on each process within. Also performs averaging. */

        void print_sampler_params();    // print sampler hyperparameters.

};

#endif // SAMPLERS_H 


The implementation of the samplers:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
#include "samplers.h"



void OBABO::print_sampler_params(){
    
    std:: cout << "Sampler parameters:\n";
    std:: cout << "Temperature = " << T << ",\nFriction = " << gamma << ",\nStepsize = " << h << ".\n" << std:: endl; 

};


measurement OBABO::run_mpi_simulation(const int N, const bool tavg, PROBLEM POTCLASS, const int t_meas){
    
    int seed = 1;
    print_sampler_params();

    measurement RESULTS = OBABO::collect_samples(N, tavg, POTCLASS, seed, t_meas);

    return RESULTS;

};



measurement OBABO::collect_samples(const int N, const bool tavg, PROBLEM problem, const int randomseed, const int t_meas){

    std:: cout << "Starting OBABO simulation...\n" << std:: endl;

    // set integrator constants
    const double a = exp(-1*gamma*h);    
    const double sqrt_a = sqrt(a);
    const double sqrt_aT = sqrt((1-a)*T);
    const double h_half = 0.5*h;   

    size_t No_params = problem.parameters.size();  // number of parameters

    // COMPUTE INITIAL FORCES
    problem.compute_force();

    // COMPUTE INITIAL MEASUREMENTS
    measurement RESULTS;
    RESULTS.take_measurement(problem.parameters, problem.velocities);

    // PREPARE RNG
    std:: mt19937 twister;
    
    std:: seed_seq seq{1,20,3200,403,5*randomseed+1,12000,73667,9474+randomseed,19151-randomseed};
    std:: vector < std::uint32_t > seeds(1);
    seq.generate(seeds.begin(), seeds.end());
    twister.seed(seeds.at(0)); 

	std:: normal_distribution<> normal{0,1};
    double Rn;


	auto t1 = std:: chrono::high_resolution_clock::now();

    // MAIN LOOP -  from here, we start our journey across the "landscape"
    for ( size_t i = 1;  i <= N;  ++i ) {

        // O + B steps
        for ( size_t j = 0;  j < No_params;  ++j ) {                  
            
            Rn = normal(twister);
            problem.velocities[j] = sqrt_a * problem.velocities[j]  +  sqrt_aT * Rn  +  h_half * problem.forces[j]; 
        
        }

        // A step
        for ( size_t j = 0; j < No_params;  ++j ) {

            problem.parameters[j] += h * problem.velocities[j];
        
        }	
  
        // COMPUTE NEW FORCES
        problem.compute_force();

        // B STEP
        for ( size_t j = 0; j < No_params;  ++j ) {

            problem.velocities[j] += h_half * problem.forces[j];
        
        }   
	
        // O STEP
        for ( size_t j = 0; j < No_params;  ++j ) {
            
		    Rn = normal(twister);
            problem.velocities[j] = sqrt_a * problem.velocities[j]  +  sqrt_aT * Rn;
        
        }   								 

        // TAKE MEASUREMENT
		if( i % t_meas == 0 ) {                                                 // take measurement any t_meas steps.        
            RESULTS.take_measurement(problem.parameters, problem.velocities); 
		}
		
        if( i % int(1e5) == 0 ) std:: cout << "Iteration " << i << " done!\n";
	
	}  // END MAIN LOOP

    // FINALIZE
    auto t2 = std:: chrono:: high_resolution_clock:: now();
	auto ms_int = std:: chrono:: duration_cast < std:: chrono:: seconds > (t2 - t1);
	std:: cout << "Execution took " << ms_int.count() << " seconds!\n ";
        
    return RESULTS;
};


Note that the samplers require classes "PROBLEM" (which defines the landscape) and "measurement" which defines what measurements to collect.
Both of those need to be given by the user in the header "setup_classes", to be seen below.

The file "setup_classes.h"
Currently, the "landscape" is a simple double well in two dimensions.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
#ifndef SETUP_CLASSES_H
#define SETUP_CLASSES_H

#include <vector>
#include <cmath>
#include <math.h>
#include <fstream>

constexpr double PI = 3.141592653589793;    // some useful constants
constexpr double eps = 1e-12;

class PROBLEM { 
    
    // THIS PROBLEM IS THE DOUBLE WELL POTENTIAL IN 2 DIMENSIONS
    
    /* In case one wants to change the problem to something else, eg. harmonic oscillator:
       The samplers require a "compute_force" function as well as the vectors "parameters", "velocities", and "forces"
       storing the corresponding quantities. */

    public:

        // parameters, velocities, forces 
        std:: vector <double> parameters {1,0};       // CARE! THE SAMPLERS REQUIRE THIS VECTOR
        std:: vector <double> velocities {0,0};       // CARE! THE SAMPLERS REQUIRE THIS VECTOR
        std:: vector <double> forces {0,0};           // CARE! THE SAMPLERS REQUIRE THIS VECTOR

        // fill force vector
        void compute_force(){                           /* CARE! THE SAMPLERS REQUIRE THIS FUNCTION. 
                                                           It needs to be written by the user. */

                 
            double x = parameters[0];       // current parameters (positions)
            double y = parameters[1];

            double e1, e2, rho;             // some help constants
            double x_mu_diff1 = x-mu1[0]; 
            double x_mu_diff2 = x-mu2[0]; 
            double y_mu_diff1 = y-mu1[1]; 
            double y_mu_diff2 = y-mu2[1];
            
            // the exponentials 
            e1 = exp( -inv_two_det_SIG1 * ( SIG1[2]*x_mu_diff1*x_mu_diff1 - 2*SIG1[1]*x_mu_diff1*y_mu_diff1 + SIG1[0]*y_mu_diff1*y_mu_diff1 ) );
            e2 = exp( -inv_two_det_SIG2 * ( SIG2[2]*x_mu_diff2*x_mu_diff2 - 2*SIG2[1]*x_mu_diff2*y_mu_diff2 + SIG2[0]*y_mu_diff2*y_mu_diff2 ) );

            rho = pref1 * det1 * e1  +  pref2 * det2 * e2;  // the density

            //force part without noise, i.e. double well
            forces[0] = -1/rho * ( pref1 * e1 * (SIG1[2]*x_mu_diff1 - SIG1[1]*y_mu_diff1)   +   pref2 * e2 * (SIG2[2]*x_mu_diff2 - SIG2[1]*y_mu_diff2) );
            forces[1] = -1/rho * ( pref1 * e1 * (-SIG1[1]*x_mu_diff1 + SIG1[0]*y_mu_diff1)  +   pref2 * e2 * (-SIG2[1]*x_mu_diff2 + SIG2[0]*y_mu_diff2) );

            // if(sig!=0){
            //     normal_distribution<> normal{0,sig};	// add noise
            //     F.fx += normal(twister);
            //     F.fy += normal(twister);
            // }

            return;

        };


    private:

        // constants that define the potential
        const std:: vector <double> mu1 {1, 0};		                // (x,y) of the first well 
        const std:: vector <double> mu2 {-1, 0};		            // (x,y) of the second well
        const std:: vector <double> SIG1 {0.6, 0.085, 0.02};	    // elements of the two covariance matrices sig11, sig12, sig22 (note: sig12=sig21)
        const std:: vector <double> SIG2 {0.6, 0.085, 0.02};
        const double phi1 = 0.5;                                    // mixing factor (phi2 = 1-phi1)	

        // constants used by the force computation 
        const double det1 = SIG1[0]*SIG1[2] - SIG1[1]*SIG1[1];      // determinants of cov. matrices
        const double det2 = SIG2[0]*SIG2[2] - SIG2[1]*SIG2[1];      
        const double inv_two_det_SIG1 = 1 / (2*det1) ;              // used in the exponents in the force
        const double inv_two_det_SIG2 = 1 / (2*det2) ;
        const double pref1 = phi1 / ( 2*PI*pow( det1, 1.5) );  	    // prefactors in the force
        const double pref2 = (1-phi1) / ( 2*PI*pow( det2, 1.5) );


};



class measurement{

    /* The measurement class that defines what quantities are collected by the samplers, how they are computed, and
       printed to a file. This class needs to be modified by the user.
       In this example, we only save the first parameter (corresponding to the x-coordinate
       when used in the double well problem above), as well as the kinetic energy. */

    public:

        void take_measurement(std:: vector <double> parameters, std:: vector <double> velocities){  /* CARE!!! The samplers need this function.
                                                                                                       It needs to be written by the user. */
            
            x.push_back(parameters[0]);
            kin_energy.push_back( 0.5* (velocities[0]*velocities[0] + velocities[1]*velocities[1]) );
        
        }

        void print_to_csv(const int t_meas){    /* print results to file. this routine would be called in the main-file. 
                                                   It needs to be written by the user. "t_meas" gives the number of sampler iterations 
                                                   between two measurements (it is passed here to get the correct iteration count in the
                                                   output file.)*/

            std:: ofstream file{"TEST.csv"};
            std:: cout << "Writing to file...\n";

            for ( size_t i = 0; i<x.size(); ++i )
            {
                file << i*t_meas << " " << x.at(i) << " " << kin_energy.at(i) <<  "\n";  
            }
            file.close();

        }

        std:: vector <double> x; 
        std:: vector <double> kin_energy;



};


#endif // SETUP_CLASSES_H 


I implemented the methods of the "PROBLEM" and "measurement" classes inline because I don't want to burden the users with changing two files (the header and the implementation). I imagine I write a simple github tutorial on how to edit this file if one wants to change the problem/landscape or the measurement.

What is problematic:
Assume now that, in the "setup_classes.h" file, I want to change my problem from a double well to something else (say, a triple-well, ha) i.e. I need to change the PROBLEM class. As mentioned in the comments, no matter the PROBLEM class, all of them will need certain members (parameters, velocities, forces, albeit with different sizes), as well as a method to compute the position-dependent force (as it will be called by the samplers). Therefore, the user needs to be made aware not to mess this up (eg. by forgetting to include the force computation function). I do this with big "CARE!!!" comments now. The rest needs to come from the manual I guess...

However, the sampling library expects the class's name to still be "PROBLEM". That means I would need to comment out the current "PROBLEM" class and replace it with my new version.
What I would rather have is two different classes, one called "double_well", the current one, and the new one called "triple_well", and the samplers should be fine with receiving either of them.

Am I right in assuming that these things can be accomplished with class templates? Or how would one handle this? Can the concepts of polymorphism or inheritance somehow help me? I would need to learn them first...
Are there any other things I could do to increase the utility of the whole thing? Note that this is just a side-project, I do not have time to create a GUI or something like that (not that I knew how to do that anyway lol).

Thanks for reading, would appreciate any input!
Last edited on
Hi PhysicsIsFun,
If no one has replied to your problem, it is because the problem statement is not clear. If you present the problem in a more concise form, i think you will get the help that you need. I am not from a physics background and as you are aware, most people are here are not. Maybe, you present the full problem statement with
- What you intend to achieve
- What you receive as input
- What you get as output
- Any constraints for the inputs?
- Any constraints for the output?
- Any specific logic or formulae to consider?

It is generally difficult to interpret code without domain knowledge and proper problem description.

If you have already figured out your solution, then fine. Otherwise, try to provide more details.

Regards
I do have a slight physics background, but nothing like your level .. the first 2 or 3 college classes (calc based) and 15 years working on drones ( planes & boats ). Nothing useful at the atomic level :)

All the above, lets go back to basics and ignore your specific problem.
-names. Jargon names have a place, eg dx or alpha, in some contexts, but the more readable you make your names, the better your code will be.
class OBABO{ //what in the world is this? no one but you will ever know.

or, for another example:
int N = 50000; // no. of iteration
why not
int max_iterations{50000}; //does not need a comment anymore.
why? In large projects, the user does not have to memorize what N means or keep looking it up to read the comment, the comment is IN THE NAME. Its a little more typing as you crank it out, but its very worth it 5 years from now when you publish a paper and other people have to see the code.

look at it like this. You are on page 157 of c++ code in the middle of trying to understand the simulation, and you see this:
for(int i = 0; i < N; i++)
{
bunch of random things
}
vs this:
for(int i = 0; i < max_iterations; i++)
{
bunch of random things
}
which one is easier to follow?

single letter variable names, cryptic variable names, try to avoid them. I understand, and argue FOR, having short names SPECIFICALLY for math variables in well documented math heavy code. Eg x is x, if you are doing some f(x) thing. Fine. t is time and dt is a time slice in a simulation, fine. But if you are going to do that, you need to make sure that each and every one is crystal clear what it means and refers to. If there is any chance of confusion, use more letters. Outside the math code that closely follows known formulae, use longer names, please (and you are doing this mostly, but N and the class name immediately stand out as hard to make sense of).

use <cmath> not <math.h>. math.h is from C and is missing a few things that C++ offers.

pi is already defined in c++, use that one for consistency across platforms and to meet reader expectations.

None of that is more than housekeeping, though.
As far as feedback, it looks like you are trying to make a library.
Things like print sampler should be virtual, so the user can over-ride your console output and put it up in a gui or send it to a printer or whatever.
all input and output should be configurable this way, letting the user craft their own replacements.

when providing objects for future use, its a pain, to be honest. You have to step outside your vision of 'this is how that will be used' and imagine that someone is trying to use it in another way. Will your MPI work be stable if they have a vector of OBGYN objects? What other weird things can they do that might break things?
If you're doing something that requires others to change, then:
1. Which piece(s) do you want to let them change, and which not?
2. How are they going to change them? If they're not programmers, a UI is usually the only possible way, and if they are programmers, try to make your design suitable for some additional code. The best way is to provide APIs in your code and others only have to call them, as if your code is a complete software.
3. Does 'change' mean that the users have to rewrite what you have written? If so, try to turn that part of your code into an example, not a direct implementation. If not, then make sure they know what do they have to give - datas, programs or user-friendly inputs are all OK.
Hi guys,

thank you for your responses! I realize that I posted too much code and did not give structured enough explanations of what I want to do.

I might give it another try as soon as I go back to this work and need further help. I think I already have an idea of how to proceed.
A suggestion when dealing with a lot of code (a lot is a relative term) and wanting others to code review and offer revision help.

Create a git repository. Github is one repository service that is used a lot.

Having a git repository makes it easier for other people to check out your code. And if you allow it people who have cloned your repo can change the code and submit it back to the repo.

The repository owner has built-in versioning available for tracking changes.

If you are unfamiliar with git there is a free book available online in several different eBook formats.

https://git-scm.com/book/en/v2

I don't know what your OS is, and honestly don't care. :) If Windows there is a very nice (IMO) git client available. console mode Bash interface and a GUI.

https://git-scm.com/

I use Github for my repositories.

https://github.com/GeorgePimpleton
Topic archived. No new replies allowed.