Anyhow, from what I have read, they say that polling or getting the min element (or max, depending on the type of heap) should take O(1) - constant time.
But this doesn't seem to be the case, at least not for my heap. Yes, it does take O(1) to, say, get a number from a vector, for example: int a = vec.at(0);
but this just isn't the case with a heap. First we have to get the number, which is indeed O(1), but then we have to call heapifyDown(), which brings the new top of the heap down until it finds a suitable placement, and that requires recursion or iteration, possibly multiple times depending on how big the heap is.
So why is it said that removing from a heap is O(1)?
*note: I think they say insertion is also O(1), but we have to call heapifyUp(), which again entails iteration.
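For reference, here is a rough sketch of the kind of removal I mean, assuming a min-heap stored in a std::vector (not my actual code):

#include <vector>
#include <utility>

// Sketch of extract-min on a vector-backed min-heap (children of index i at
// 2*i+1 and 2*i+2). Assumes the heap is non-empty.
int extract_min( std::vector<int>& heap )
{
    int top = heap.front();          // reading the minimum is O(1)
    heap.front() = heap.back();      // move the last element to the root
    heap.pop_back();

    // heapifyDown: sink the new root until the heap property holds again,
    // at most one step per level, i.e. O(log n) steps.
    std::size_t i = 0;
    while (true)
    {
        std::size_t smallest = i;
        std::size_t left  = 2 * i + 1;
        std::size_t right = 2 * i + 2;
        if (left  < heap.size() && heap[left]  < heap[smallest]) smallest = left;
        if (right < heap.size() && heap[right] < heap[smallest]) smallest = right;
        if (smallest == i) break;
        std::swap(heap[i], heap[smallest]);
        i = smallest;
    }
    return top;
}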
"Getting" an extremum of a collection doesn't imply that you'll remove it from the collection. It's perfectly reasonable to just want to know the value of the extremum without wanting to remove it. That operation takes constant time for a heap.
Removing an element is a separate operation, and that does take longer.
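For example, std::priority_queue exposes exactly that split: top() just reads the extremum in constant time, while pop() removes it and restores the heap, which costs a logarithmic number of comparisons.

#include <iostream>
#include <queue>

int main()
{
    std::priority_queue<int> pq;   // max-heap by default
    for (int x : { 5, 1, 9, 3 })
        pq.push(x);

    std::cout << pq.top() << '\n'; // peek the maximum: O(1), heap untouched
    pq.pop();                      // remove the maximum: O(log n) sift-down
    std::cout << pq.top() << '\n'; // now prints the next largest
}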
"Getting" an extremum of a collection doesn't imply that you'll remove it from the collection. It's perfectly reasonable to just want to know the value of the extremum without wanting to remove it. That operation takes constant time for a heap.
Removing an element is a separate operation, and that does take longer.
very true! it would actually make sense to have a function that just gets the top without removing it.
In the case of insertion, though, this isn't O(1), right? As we have to call heapifyUp().
The structure behind the heap matters. The Wikipedia article is making reference to your traditional node-pointer binary heap.
Also, “insertion” does not include finding the insertion spot.
Inserting a node into a binary tree can be done by simply constructing a new node with the correct children.
struct node
{
    type value;
    node* left;
    node* right;
};
The constructor can easily insert a node into the tree in O(1) time:
C
node* create_node( type value, node* left, node* right )
{
    node* result = (node*)malloc( sizeof( node ) );
    result->value = value;
    result->left = left;
    result->right = right;
    return result;
}
C++
node::node( type value, node* left, node* right )
    : value{value}, left{left}, right{right}
{ }
So the O(1) update operation for (a (b c) d) → (a (b c) (d e)) would simply be:
C
a_node->right = create_node( empty_value,                          // replace the "d" leaf with a branching node
                             a_node->right,                        // the old "d" leaf
                             create_node( e_value, NULL, NULL ) ); // the new "e" leaf
C++
a_node->right = new node( empty_value,
                          a_node->right,
                          new node( e_value, nullptr, nullptr ) );
Again, finding the parent node is not O(1). (It is O(log n) for a balanced binary tree, or O(n) for a degenerate linked list.)
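For what it's worth, here is one sketch of how that parent lookup can be done in O(log n) when the tree is kept complete (as a heap's is), by walking the bits of the new node count - just an illustration, not taken from the code above:

// Locate the parent of the next insertion slot in a complete binary tree of
// n nodes (n >= 1) by following the bits of n+1 below its leading 1-bit.
// One pointer step per bit, i.e. O(log n).
node* find_insert_parent( node* root, unsigned n )
{
    unsigned target = n + 1;            // 1-based index of the new node
    unsigned bit = 1;
    while ( (bit << 1) <= target )      // find the highest set bit of target
        bit <<= 1;
    node* cur = root;
    for ( bit >>= 1; bit > 1; bit >>= 1 )           // stop one level above target
        cur = (target & bit) ? cur->right : cur->left;
    return cur;
}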
Splitting hairs, it is.
[edit]
You, of course, are using the standard “heap” construct, in which inserts also require a sift-up (your heapifyUp()), so it cannot be O(1).
The wikipedia article says insert's average case is O(1), but is that right?
The number of operations required [for insert] depends only on the number of levels the new element must rise to satisfy the heap property, thus the insertion operation has a worst-case time complexity of O(log n) but an average-case complexity of O(1).
It seems to me the average case would be half the height of the tree, so still O(log n).
Hmm... In the average case the tree would be balanced or almost balanced, and therefore about half of the elements in the heap would only need to rise one level, a quarter would need to rise two levels, an eighth would need to rise 3 levels, etc. 1/2 + 2/4 + 3/8 + 4/16 + ... = 2, therefore the average time is bounded, and thus constant.
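Just to sanity-check that sum numerically (a throwaway snippet):

#include <iostream>

int main()
{
    // Partial sums of sum k / 2^k, which the argument above says converges to 2.
    double sum  = 0.0;
    double term = 0.5;              // 1/2^k for k = 1
    for (int k = 1; k <= 50; ++k)
    {
        sum += k * term;
        term /= 2.0;
    }
    std::cout << sum << '\n';       // prints 2 (to double precision)
}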
That makes sense. Is that a kind of "amortized time"?
I was just thinking of adding one element, where it would rise on average half the current height. But we are supposed to be analyzing the rate of increase as more and more elements are added. Obviously the levels increase more and more slowly, which is what bounds the average rise.
I don't think so. The average case tells you what happens when you're dealing with uniformly distributed random data. It's useful because it gives you a measure of how well the algorithm will perform without knowing anything about the input. The underlying assumption is that most applications will most of the time give inputs that are close to the average of the entire input space, but this assumption may be invalid in specific applications.
On the other hand, an amortized analysis should give you an answer that's independent of the input. It gives you a measure of, realistically, how you can expect the algorithm to perform even if your input is always the worst case.
This program suggests that the actual average is around 1.25 to 1.3.
The methodology is incorrect. In the loop on line 35 you need to reset the counter, insert, then save or print counter/size. Theoretically, the curve should flatten out.
EDIT 2: Sorry, that's still wrong.
// Needs <iostream>, <random>, <vector>; Heap is your class with the counter.
// Builds `loops` random heaps and prints, for each insertion position, the
// average value of the heap's operation counter for that insert.
void rnd_heap (int size, int hi = 999999, int lo = 0){
    std::default_random_engine rnd {std::random_device{}()};
    std::uniform_int_distribution<> dist(lo, hi);
    const int loops = 1000;
    std::vector<double> stats(size);
    for (int i = loops; i--;){
        Heap h;
        for (int n = size; n--; ){
            auto index = h.size();
            h.reset_counter();
            h.insert (dist(rnd));
            stats[index] += static_cast<double>(h.get_counter()) / loops;  // avoid integer division
        }
    }
    for (auto x : stats)
        std::cout << x << std::endl;
}