A problem with std::set

Hi, everyone,

I am really a beginner in C++. I am working on a small project that stores a set of discrete and continuous variables in a std::set. The original data file looks like this:
Variable names:
total_posts total_length_of_posts total_topics_read times_logged_in Age Sex Status OnlineClass Class_w_Instructor Choice1 Choice2 Choice3 Choice4 OnlineDiscussion interest_avg value_avg choice_avg related_avg competence_avg LikeCourse LikeDiscussion LikeInstructor
Values:
row_1 high 10 15065 53 309 3 0 2 7 0 7 6 7 4 2 6.75 7 7 4.71 6.67 7 7 7
row_2 high 5 6399 23 109 1 1 2 6 0 2 7 7 2 1 4.5 5 4.5 4.14 5.17 6 5 7
row_3 low 3 652 15 46 1 1 1 1 2 6 6 6 6 4 5 5.57 3.75 3.57 5 6 5 6
.
.
.
row_22 ........................................

I want to know how to store the data in a set container and how to use the set::size method to check how many unique values there are in one column.


Thanks so much...
This is an interesting problem, but before I make any major suggestions, I'd like to ask a few questions:

1. Do you want to keep each record (each line) together, or do you want to keep each column together as its own record?

2. Does it matter whether the numbers are stored as strings or numbers?

3. If this is homework, have you learned about templated functions and functors?
Thanks for your reply.
1. I am using SMILE (Structural Modeling, Inference, and Learning Engine) for this work. In SMILE, I iterate over the values with DSL_dataset::At and store the data row by row. Once the iteration is over, I want to use set::size() to count the distinct values in one column (i.e., the values belonging to the same variable).
2. Since I will discretize the data first, I only care about integers and about comparing them.
3. This is not homework. I tried to find an example of how to store a dataset in a set container and use set::size to return the number of unique values in a column, but I could not find a good one. BTW, I am teaching myself the things you mentioned.

Thanks again.
OK, I'll have to read up a little on SMILE before I jump into any ideas... Give me a day or two...
Well, I know nothing about SMILE or programming on a Mac, but I can babble semi-intelligently about the STL... :-,

A set stores only unique elements and keeps them sorted by value, so it would discard duplicate values and lose the original row order. It does not look like the right structure to use to store your data. I would opt for a vector. Since everything is a number except for the row label and the 'high'/'low' column (I don't know what 'high' and 'low' mean...), you might as well use doubles.
#include <vector>
using std::vector;

// If that 'high' and 'low' stuff means anything...
const double high_value = 10.0;
const double low_value = 0.0;

typedef vector <double> post_t;
typedef vector <post_t> all_posts_t;

You could make a more advanced class if you like (which inherits from vector or some SMILE type, so long as it provides iterators to the elements).
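
To get the rows from the data file into these types, something like the following might work. It is only a sketch: parse_row is a name I made up, and it assumes each line starts with a row label (thrown away) followed by 'high' or 'low', with every remaining token being a number. It has nothing to do with SMILE's own reading functions.

```cpp
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<double> post_t;

// Same placeholder values as above; what 'high' and 'low' should map to
// is an assumption.
const double high_value = 10.0;
const double low_value = 0.0;

// Hypothetical helper: parses one line such as
//   "row_3 low 3 652 15 46 ..."
// Assumes the first token is a row label (discarded) and the second is
// "high" or "low"; every token after that is read as a double.
post_t parse_row(const std::string& line)
{
    std::istringstream in(line);
    std::string label, level;
    post_t post;
    if (!(in >> label >> level))
        return post;  // empty or malformed line: return an empty record
    post.push_back(level == "high" ? high_value : low_value);
    double value;
    while (in >> value)
        post.push_back(value);
    return post;
}
```

Calling parse_row on each line read with std::getline and pushing the result into an all_posts_t would give you one post_t per record.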

To count the unique values in a column, you need to be able to collect them out of that column, and then you need to condense them. The STL's <algorithm> header provides a number of useful functions for just this kind of thing.
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>
...
using namespace std;

// This is my transformation functor.
// Given a column number (at construction) and a post_t (when used),
// it returns the value in the given column.
//
struct f_extract_column
  {
  int column_number;
  f_extract_column( int column_number ): column_number( column_number ) { }
  double operator () ( const post_t& post ) const
    {
    return post.at( column_number );
    }
  };

// This function counts the number of unique values in a column by
// first copying the column values out into a std::set and then returning
// the size of the set.
//
int unique_values_in_column( const all_posts_t& posts, int column_number )
  {
  set <double> column;
  transform(
    posts.begin(),
    posts.end(),
    insert_iterator <set <double> > ( column, column.begin() ),
    f_extract_column( column_number )
    );
  return static_cast <int> ( column.size() );
  }


Hmm... I'm sure there is a more elegant way to do this, but just off the top of my head that's it for now...
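
One possibly simpler variant, if you don't mind a plain loop instead of transform and a functor, is sketched below. The function name count_unique_in_column is made up for illustration, and the typedefs are repeated so the snippet stands on its own. It relies only on std::set silently ignoring duplicate inserts, so size() ends up being the number of distinct values.

```cpp
#include <cstddef>
#include <set>
#include <vector>

typedef std::vector<double> post_t;
typedef std::vector<post_t> all_posts_t;

// Hypothetical alternative: counts the distinct values in one column.
// at() throws std::out_of_range if a row is shorter than expected.
std::size_t count_unique_in_column(const all_posts_t& posts,
                                   std::size_t column_number)
{
    std::set<double> column;
    for (all_posts_t::const_iterator it = posts.begin();
         it != posts.end(); ++it)
    {
        column.insert(it->at(column_number));  // duplicates are ignored
    }
    return column.size();
}
```

For example, the Age values in the three rows you posted are 3, 1 and 1, so a column holding just those would give a count of 2. Keep in mind that comparing doubles for equality only behaves as expected here because the values are read from the same text; once you discretize, a set of int would be the safer choice.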

Sorry to respond so late.