Header Ads

Garbage Collector Simplified



Garbage Collector in Java:-

In my last post in Heap Concept I have told how JVM allocates memory for program. Now say JVM loads a program into main memory & program ended with success. For us it’s over. But for Java there are tasks to do, whenever a program is over, we need to remove it’s source code for next program to execute. All the structures & unused referenced need to be deleted from memory for your next code to run successfully. These unused or used objects ,references are called garbage and they need to be removed. The removal process is called Garbage collection .

Disadvantages of Manual Garbage collection....
In some of the programming language it is a programmer’s responsibility to free up the memory. The complexity of that task leads to many common errors that can cause unexpected or erroneous program behavior and crashes. As a result, a large proportion of developer time is often spent debugging and trying to correct such errors.
while developer tries to free up memory space there might be situation occrures when they  deallocate the space used by an object to which some other object still has a reference. If the object with that (dangling) reference tries to access the original object, but the space has been reallocated to a new object. Result is unpredictable situation. It is called dangling references

There might be another situation when developer might have made a mistake to do a proper coding which might call the space leaks. These leaks occur when memory is allocated and no longer referenced but is not released. If enough leaks occur, they can keep consuming memory until all available memory is exhausted.

Automatic Memory Management :
So there was a improvement area over a manual garbage collection and an automatic memory collection. Java being an advanced language , has came up with the automatic memory management.

What primary goal for automatic memory management--

Automatic Garbage collection avoids the dangling reference problem, because an object that is still referenced somewhere will never be garbage collected and so will not be considered free. Garbage collection also solves the space leak problem described above since it automatically frees all memory no longer referenced.

This task is done by garbage collector in Java.Java implemented this using a demon thread. This thread when called runs invoke  method to do this activity.However we don’t have any control over this. Only JVM,if realizes ,it  requites more space in memory to run current program, activates this method to clean up memory.

We can only send request JVM to clean up memory by

• System.gc()
• Runtime.gc()

But it purely depends on Heap Size & requirement of space so that JVM activates the process. But there is no free () or delete () function in Java like c++; JVM performs garbage collection in a low priority thread.

How Garbage collector works:-

Java’s garbage collector checks for any active reference of any object/thread/static reference. If it found the specific object is not reachable from any live threads or any static reference it frees up the memory
Causes for Garbage Collection:-

  • All object references that are assigned to null.
  • Object created inside a block & control goes out of the block.
  • Say a object holds references of some other object & parent object is set to null.Then both the objects becomes dead & eligible for garbage collection.
  • If a object has live reference from hashmap.
  • All cyclic references (an object “A” points to another object “B” & object “B” again refer to object “A”) if dead is also eligible for garbage collection.
Garbage Collection Method:-
The popular algorithms for garbage collection are:

  • Throughput Garbage Collector-
  • Incremental low pause collector-


  • Referencing counting collector-This is  bit old strategy. In this process when an object is instantiated the reference counter becomes one(1) and more the object is used/assigned reference by some other root or object JVM keeps incrementing its value.But when the object is set a new value or object becomes out of scope referencing counter is decremented by 1. By this method when referencing pointer becomes zero. This object becomes garbage collectable. This method is best when code size is small.but takes a long time when code size is very big and fails if there is any circular reference of object happens
  • Tracing collector: This is a two phase graph based algorithm. First-Mark and second is Sweep.By this algorithm every time JVM gets the root and try to navigate the branches of the objects. If it finds a suitable reference from root to any other object, the object is considered as alive and Garbage collector marks it , otherwise the object is not marked , treated as dead and in the sweep phase the object is removed.  This algorithm is good but can not handle the fragmentation issue.
  • Compacting Collectors:It is created on top of Tracing collector to minimize the problem of Heap fragmentation. In this process Mark is activated and post to that Sweep is activated. In the third phase Collector basically fragment the space i.e. move all the live objects aside and rectify the heap pointer.For this multiple assignment this algorithm is again good when there are less number of objects in the Heap memory but takes a large amount of time when objects are more.So in a word it is having performance issue.

  1. Takes a very long time
  2. Time spent is dependent on the size of the heap
  3. That is why the “Permanent Space” was invented
  4. And cause un/reloading of classes wasn’t very common either
  • Copying Collector: In this approach the entire heap is divided into two halves...Old and new heap zone.Initially when a program starts ,JVM starts to assign area from new zone. after sometime when Garbage collector gets activated ,it stops the programme running,  the live objects are placed to the other half . The new area is marked as New and the used area is marked as old and garbage collected. Post this operation the normal programme execution resumes. This approach solves all the memory related previous problems but introduce new kind of problem.How?? Every time JVM starts to execute the allocated memory for Heap is divided into two halves. So effective usage area is reduced.   One survivor space is always empty.Serves as destination for minor collections .Objects get copied to the tenured space when the 2nd survivor space fills up.
    Major collections occur when the tenured space fills up.Major collections free up Eden and both survivor space
  • Stop active programming
  • Incremental in nature
  • Time GC with new object creation
  • If GC runs, suspend new allocation Concurrent/Parallel
  • Allocation happens at the same time as GC
  • Very complex locking regimes
  • Generations/Spaces make it easier
  • Concurrent low pause or short pause collector- By this technique garbage collection and program run parallel. This underneath uses the graph based technique and rectify the heap size of copying collector. So reclaiming and fragmentation of heap memory happens at a time.Since most of the work of tracing the object graph is waiting for the computer's memory system, the high throughput collector spins up multiple threads to try to hide that latency. The interesting part of the parallel collector is the way it uses multiple threads to copy objects, but still ends up with one large, unfragmanted block of free space when it is done.
This section is taken from Wiki: 
Moving vs. non-movingOnce the unreachable set has been determined, the garbage collector may simply release the unreachable objects and leave everything else as it is, or it may copy some or all of the reachable objects into a new area of memory, updating all references to those objects as needed. These are called "non-moving" and "moving" (or, alternatively, "non-compacting" and "compacting") garbage collectors, respectively.
At first, a moving GC strategy may seem inefficient and costly compared to the non-moving approach, since much more work would appear to be required on each cycle. In fact, however, the moving GC strategy leads to several performance advantages, both during the garbage collection cycle itself and during actual program execution:
  • No additional work is required to reclaim the space freed by dead objects; the entire region of memory from which reachable objects were moved can be considered free space. In contrast, a non-moving GC must visit each unreachable object and somehow record that the memory it alone occupied is available.
  • Similarly, new objects can be allocated very quickly. Since large contiguous regions of memory are usually made available by the moving GC strategy, new objects can be allocated by simply incrementing a 'free memory' pointer. A non-moving strategy may, after some time, lead to a heavily fragmented heap, requiring expensive consultation of "free lists" of small available blocks of memory in order to allocate new objects.
  • If an appropriate traversal order is used (such as cdr-first for list conses), objects that refer to each other frequently can be moved very close to each other in memory, increasing the likelihood that they will be located in the same cache line or virtual memory page. This can significantly speed up access to these objects through these references.
One disadvantage of a moving garbage collector is that it only allows access through references that are managed by the garbage collected environment, and does not allow pointer arithmetic. This is because any native pointers to objects will be invalidated when the garbage collector moves the object (they become dangling pointers). For interoperability with native code, the garbage collector must copy the object contents to a location outside of the garbage collected region of memory. An alternative approach is to pin the object in memory, preventing the garbage collector from moving it and allowing the memory to be directly shared with native pointers (and possibly allowing pointer arithmetic).
Copying vs. mark-and-sweep vs. mark-and-don't-sweepTo further refine the distinction, tracing collectors can also be divided by considering how the three sets of objects (white, grey, and black) are maintained during a collection cycle.
The most straightforward approach is the semi-space collector, which dates to 1969. In this moving GC scheme, memory is partitioned into a "from space" and "to space". Initially, objects are allocated into "to space" until they become full and a collection is triggered. At the start of a collection, the "to space" becomes the "from space", and vice versa. The objects reachable from the root set are copied from the "from space" to the "to space". These objects are scanned in turn, and all objects that they point to are copied into "to space", until all reachable objects have been copied into "to space". Once the program continues execution, new objects are once again allocated in the "to space" until it is once again full and the process is repeated. This approach has the advantage of conceptual simplicity (the three object color sets are implicitly constructed during the copying process), but the disadvantage that a (possibly) very large contiguous region of free memory is necessarily required on every collection cycle. This technique is also known as stop-and-copyCheney's algorithm is an improvement on the semi-space collector.
mark and sweep garbage collector maintains a bit (or two) with each object to record whether it is white or black; the grey set is either maintained as a separate list (such as the process stack) or using another bit. As the reference tree is traversed during a collection cycle (the "mark" phase), these bits are manipulated by the collector to reflect the current state. A final "sweep" of the memory areas then frees white objects. The mark and sweep strategy has the advantage that, once the unreachable set is determined, either a moving or non-moving collection strategy can be pursued; this choice of strategy can even be made at runtime, as available memory permits. It has the disadvantage of "bloating" objects by a small amount.
mark and don't sweep garbage collector, like the mark-and-sweep, maintains a bit with each object to record whether it is white or black; the gray set is either maintained as a separate list (such as the process stack) or using another bit. There are two key differences here. First, black and white mean different things than they do in the mark and sweep collector. In a "mark and don't sweep" system, all reachable objects are always black. An object is marked black at the time it is allocated, and it will stay black even if it becomes unreachable. A white object is unused memory and may be allocated. Second, the interpretation of the black/white bit can change. Initially, the black/white bit may have the sense of (0=white, 1=black). If an allocation operation ever fails to find any available (white) memory, that means all objects are marked used (black). The sense of the black/white bit is then inverted (for example, 0=black, 1=white). Everything becomes white. This momentarily breaks the invariant that reachable objects are black, but a full marking phase follows immediately, to mark them black again. Once this is done, all unreachable memory is white. No "sweep" phase is necessary.
 
Generational GC (ephemeral GC)It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or thegenerational hypothesis). A generational GC (also known as ephemeral GC) divides objects into generations and, on most cycles, will place only the objects of a subset of generations into the initial white (condemned) set. Furthermore, the runtime system maintains knowledge of when references cross generations by observing the creation and overwriting of references. When the garbage collector runs, it may be able to use this knowledge to prove that some objects in the initial white set are unreachable without having to traverse the entire reference tree. If the generational hypothesis holds, this results in much faster collection cycles while still reclaiming most unreachable objects.
In order to implement this concept, many generational garbage collectors use separate memory regions for different ages of objects. When a region becomes full, those few objects that are referenced from older memory regions are promoted (copied) up to the next highest region, and the entire region can then be overwritten with fresh objects. This technique permits very fast incremental garbage collection, since the garbage collection of only one region at a time is all that is typically required.
Generational garbage collection is a heuristic approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space. In fact, runtime systems for modern programming languages (such as Java and the .NET Framework) usually use some hybrid of the various strategies that have been described thus far; for example, most collection cycles might look only at a few generations, while occasionally a mark-and-sweep is performed, and even more rarely a full copying is performed to combat fragmentation. The terms "minor cycle" and "major cycle" are sometimes used to describe these different levels of collector aggression.
Stop-the-world vs. incremental vs. concurrentSimple stop-the-world garbage collectors completely halt execution of the program to run a collection cycle, thus guaranteeing that new objects are not allocated and objects do not suddenly become unreachable while the collector is running.
This has the obvious disadvantage that the program can perform no useful work while a collection cycle is running (sometimes called the "embarrassing pause"). Stop-the-world garbage collection is therefore mainly suitable for non-interactive programs. Its advantage is that it is both simpler to implement and faster than incremental garbage collection.
Incremental and concurrent garbage collectors are designed to reduce this disruption by interleaving their work with activity from the main program. Incremental garbage collectors perform the garbage collection cycle in discrete phases, with program execution permitted between each phase (and sometimes during some phases). Concurrent garbage collectors do not stop program execution at all, except perhaps briefly when the program's execution stack is scanned. However, the sum of the incremental phases takes longer to complete than one batch garbage collection pass, so these garbage collectors may yield lower total throughput.
Careful design is necessary with these techniques to ensure that the main program does not interfere with the garbage collector and vice versa; for example, when the program needs to allocate a new object, the runtime system may either need to suspend it until the collection cycle is complete, or somehow notify the garbage collector that there exists a new, reachable object.


Precise vs. conservative and internal pointersSome collectors can correctly identify all pointers (references) in an object; these are called precise (also exact or accurate) collectors, the opposite being a conservative or partly conservative collector. Conservative collectors assume that any bit pattern in memory could be a pointer if, interpreted as a pointer, it would point into an allocated object. Conservative collectors may produce false positives, where unused memory is not released because of improper pointer identification. This is not always a problem in practice unless the program handles a lot of data that could easily be misidentified as a pointer. False positives are generally less problematic on 64-bit systems than on 32-bit systems because the range of valid memory addresses tends to be a tiny fraction of the range of 64-bit values. Thus, an arbitrary 64-bit pattern is unlikely to mimic a valid pointer. Whether a precise collector is practical usually depends on the type safety properties of the programming language in question. An example for which a conservative garbage collector would be needed is the C language, which allows typed (non-void) pointers to be type cast into untyped (void) pointers, and vice versa.
A related issue concerns internal pointers, or pointers to fields within an object. If the semantics of a language allow internal pointers, then there may be many different addresses that can refer to parts of the same object, which complicates determining whether an object is garbage or not. An example for this is the C++ language, in which multiple inheritance can cause pointers to base objects to have different addresses. Even in languages like Java, internal pointers can exist during the computation of, say, an array element address. In a tightly-optimized program, the corresponding pointer to the object itself may have been overwritten in its register, so such internal pointers need to be scanned.

Garbage collector can lead to two different conditions if not executed correctly...

  • premature deallocation(Corrupted Pointer)...It happens for C and C++
  • Memory Leak or Incomplete deallocation....It happens for Java

Why memory leak happens?
A memory leak is where an unreferenced object that will never be used again still hangs around in memory and doesnt get garbage collected.

  • Poor Design of Software
  • Unnecessary object reference
  • Long live object reference which is no longer needed
  • Faulty algorithm design
Abnormal memory allocation in non-heap stack area may resultant of
I. Leaked loader
II. Massive unused string


Garbage collector in java is not like destructor in c++; but it has finalize () method-
Public class A{

Public void finalize () {
System.out.println (“Garbage”);
}
}
The JVM has full control on JVM for not to collect memory which is calling garbage collection’s finalize method. Such objects for which the finalized () method is used is called “Phantoms”

Inside memory Java collaborate Java’s stack & heap objects via reference. A reference is what is returned from the new statement. It allows an object refer another. An object can have many references pointing to it. When every reference is dropped, garbage collects the object.
Let us go deep into reference type-

I. Strong Reference

II. Soft Reference
III. Weak Reference
IV. Phantom Reference
Any object which can be accessed from our code is called Reachable. It indicates there is a direct or indirect reference to the object.
A strong reference signifies it’s alive & the space can’t be collected .
A soft reference signifies that the object which is referred can be collected but we didn’t opt for it.
A weak reference signifies that the object which is referred can be collected & reference is kept for further use.
A Phantom reference signifies the object which is referred can’t be collected & finalized also the memory space should not be reduced until we release it.
By default we work with a strong reference.Best programming practice-
Even if by default we use strong object reference but Java best practice would be to use either a soft or weak reference. By this you can prevent the memory errors.

A heap in real time become the collection of free heap area+ occupied live area+occupied dead area.
pic--

Now in such condition the memory pointer points to the last used memory allocation even if there are free and dead object present in the Heap.In this situation JVM throws “OutofMemory” Exception. To prevent this there is a way to increase the Heap size. Will it solve your problem??
Well...this might save the exception to be occured but heavy Heap size will surely slow down the process. Oracle(Sun) java has come up with another solution called fragmentation. Very similar to window’s disk fragmentation.Thus the second objective of garbage collector to do fragmentation of the heap. It occurs during the allocation process of a new object. By this process java not only frees up the memory but also correctly measures the free spaces available in Heap.

Advantages of Garbage collection:
  • Programmers do not need to chase java for memory and need to take care of memory allocation so more concentration on business logic building. Hence more productive usage of time.
  • As programmers do not need to chase for memory or memory clean up , the abnormal memory free or incorrect memory clean up does not occurs.
Coutsey: http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf


Powered by Blogger.