Go Pointers

Immutability Vs Efficiency

Pointers are a fundamental part of Go. They let us refer to a value by its memory address, so we can share and modify data directly instead of working on copies.

This article covers Go pointers, how they interact with the stack and the heap, the trade-off between immutability and efficiency, and the two uses of * (in pointer types and as the dereference operator).

Everybody uses pointers occasionally. But how well-versed are we in them? What is taking place behind the scenes? In this post, we'll talk about pointers and how they can improve program performance at the expense of the safety that immutability provides. By the end, you should be able to explain how pointers work, what happens when a function is called, and the distinction between heap and stack allocations.

Let's start with what a variable is. The idea applies to Go and to most programming languages in general: a variable is a container for storing a value. We can think of it as a box that has three things:

  • a name
  • a type
  • a value

variable_explain.gif

The variable is stored somewhere in memory.

somewhere_in_the_memory.gif

It resembles putting a box in a warehouse in many ways. The value is in the box. We gave that box a name and a type as well. We added an address to that box as well. The location of the box inside the warehouse will be indicated by this address. Therefore, if we know the address, we can quickly locate and retrieve the box if we need it.

Let's see this in Golang

package main

import "fmt"

func main() {
    var foo, bar int = 23, 42

    fmt.Println(foo, bar)   // prints the values: 23 42
    fmt.Println(&foo, &bar) // prints the addresses
}

Easy, right? The first line of main defines two variables, foo and bar, of type int with the values 23 and 42. The two Println calls then print their values and their addresses to the console.

Quick note: & can be read as "address of". Every variable is given an address, and with that address we can locate it in memory. If we assign the address to a pointer, as below, the address becomes the value of that pointer.

package main

import "fmt"

func main() {
    var foo, bar int = 23, 42
    p := &foo
    q := &bar

    fmt.Println(p, q) // prints the addresses of foo and bar
    // e.g. 0xc00001c0a8 0xc00001c0b0 (the exact values vary between runs)
}

p and q hold the addresses of foo and bar, respectively. Here we're using Go's short variable declaration (:=).

foo2p-address.gif

In the above picture, we can see how p is holding the address of foo.

package main

import "fmt"

func main() {
    var foo int = 23
    p := &foo

    fmt.Println(*p)
    // any guess?
}

*p evaluates to the value stored at that address, so this prints 23, the value of foo defined above.

Here * can be a little confusing at first as it can be used in two ways.

  • Before a type (*int)
  • Before a variable (*p)

    Before a type

    In *int, the whole expression is a type: a pointer type whose base type is int.

    Before a variable

    In *p, the * is an operator that returns the value p points to. That's why printing *p prints 23: it is the value of the variable p points to. This is called dereferencing.

So we can say that the value of p is the address of foo, and *p is the value at that address, which is the value of foo. What happens if we assign a new value to *p? Any guess...

package main

import "fmt"

func main() {
    var foo int = 23
    p := &foo
    fmt.Println(*p) // 23
    *p = 42
    fmt.Println(*p)
    // any guess?
}

Yes, it'll be 42.

package main

import "fmt"

func main() {
    var foo, bar int = 23, 3600
    p := &foo
    fmt.Println(*p) // 23
    p = &bar
    *p = *p / 36
    fmt.Println(bar)
    // any guess?
}

Any guess what will happen to the value of the bar variable?

Quick note: we can assign &bar to p because p has the type *int and bar is an int. If bar had a different type, the program would fail to compile; Go rejects the mismatch at compile time rather than at run time.

Because the division is performed on *p, the value of bar is modified, so 100 is printed.

So why do we need pointers anyway? Good question, right? If we just want to modify bar's value, we can modify bar directly, right? Well, it is often useful to store a value in one place and access it from multiple places. Suppose we have four different functions that all want to access and modify bar. With pointers, every function works on the same bar. Without them, each function would receive its own local copy, which costs a copy per call (noticeable for large values) and leaves the original unchanged. To understand the situation more clearly, we need to understand memory allocations first.
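Before moving on, here is a minimal sketch of the four-function scenario. The helper names (double, addTen, halve and reset) are hypothetical and only serve as an illustration; each one receives the address of the same variable and updates it in place.

```go
package main

import "fmt"

// Hypothetical helpers: each one receives the address of an int
// and modifies the value stored there, so all of them share one variable.
func double(p *int) { *p *= 2 }
func addTen(p *int) { *p += 10 }
func halve(p *int)  { *p /= 2 }
func reset(p *int)  { *p = 0 }

func main() {
    bar := 3600

    double(&bar)     // bar is now 7200
    addTen(&bar)     // bar is now 7210
    halve(&bar)      // bar is now 3605
    fmt.Println(bar) // 3605

    reset(&bar)
    fmt.Println(bar) // 0
}
```

No function returns anything, yet every change is visible in main, because all four functions write to the same memory location.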

Memory Allocations

When a Go program starts, the runtime creates a goroutine to run main, and each goroutine gets its own stack of memory.

stack_of_memory.gif

You may ask what a goroutine is...

What is a goroutine?

A goroutine is an independent path of execution. We can also think of it as a very lightweight thread that is managed by the Go runtime.
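To make the idea concrete, here is a small sketch that starts two goroutines with the go keyword and waits for them with sync.WaitGroup from the standard library. The sumTo helper is a hypothetical function invented for this illustration; each goroutine writes to its own variable, so the two do not interfere.

```go
package main

import (
    "fmt"
    "sync"
)

// sumTo is a hypothetical helper: it adds the numbers 1..n and
// stores the total through the pointer it was given.
func sumTo(n int, out *int) {
    total := 0
    for i := 1; i <= n; i++ {
        total += i
    }
    *out = total
}

func main() {
    var wg sync.WaitGroup
    var a, b int

    wg.Add(2)
    // Each go statement starts a new goroutine with its own stack.
    go func() { defer wg.Done(); sumTo(10, &a) }()
    go func() { defer wg.Done(); sumTo(100, &b) }()
    wg.Wait() // block until both goroutines have finished

    fmt.Println(a, b) // 55 5050
}
```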

Let's go back to the topic. Whenever a goroutine makes a function call, a part of the stack is allocated for that call; we call that part a frame. Let's see this in an example for a better understanding:

package main

import "fmt"

func main() {
    a := 6
    AddN(a)
}

// AddN adds n to r and prints r's address and value.
func AddN(n int) {
    r := 0
    r += n
    fmt.Println(&r, r)
}

Here we define two functions, main and AddN. When we run the main function, it gets a frame on the stack. The frame of the currently running function is called the active frame.

active_frame_stack.gif

As execution proceeds through main, we call AddN. As soon as we do, another frame is allocated on the stack, and the goroutine now operates only within that new frame; it cannot directly reach into other frames. This is advantageous: because each frame is isolated, we get a form of immutability, meaning a function cannot accidentally change variables that belong to its caller.

So a common question arises: how can we access the variable a from inside the active frame? The straightforward answer is that we can't. Instead, the value of a is copied into the new active frame, where it is called n. We can modify n, add to it, print it, and do whatever we want with it, but because we're making the changes inside the active frame, nothing outside this frame changes. The mutation happens only inside this isolated frame.

Can anyone guess the catch here? We need to copy the arguments each time we make a function call, which is not efficient when the values are large. Also, when the AddN call ends and main becomes the active frame again, a will still be 6. But what if we want to change a itself from inside the function, getting our hands on a and not just a copy of it? This is where pointers come in.
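The copy behaviour can be checked with a short sketch. The tryToChange function is a hypothetical name used only for this illustration; it overwrites its parameter, and the caller's variable stays untouched.

```go
package main

import "fmt"

// tryToChange is a hypothetical function. It receives a copy of the
// caller's value, so the assignment to n is invisible to the caller.
func tryToChange(n int) int {
    n = n * 10
    return n
}

func main() {
    a := 6
    got := tryToChange(a)
    fmt.Println(a, got) // 6 60
}
```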

Let's write a new function that uses a pointer, so that it can modify the variable a in main by changing the value at that specific address.

package main

import "fmt"

func main() {
    a := 6
    squareAdd(&a)
}

// squareAdd squares the value p points to, in place,
// and prints the address and the new value.
func squareAdd(p *int) {
    *p *= *p
    fmt.Println(p, *p)
}

The input is now an address instead of a plain integer. We call the parameter p, and its type is *int ("star int"). The star here is not the dereferencing operator we discussed above; *int as a whole is a type. We want to square the value stored at that address, so we put a star in front of p to mean "the value at p", which in this example is a. Then we print p, which is an address, and *p, the value p points to.

So when we call this function, we need to pass in an address, not a value. That is why the call passes &a: the ampersand (&) means we're passing the address of a.

Let's see what happens on the stack when we call squareAdd. Instead of copying a, we copy the address of a into the new frame as the pointer p. That pointer points across the frame boundary, and this is how the active frame can modify a, which lives in main's frame, by using *p.

After the call finishes, main becomes the active frame again, and everything in the old frame becomes invalid. If we make another function call, that space will be overwritten, and Go sets the variables of the new frame to their zero values so that we never accidentally read leftover garbage values.
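The zero-value guarantee can be observed directly. In the sketch below, zeroValues is a hypothetical helper that declares a few variables without initializers and returns whatever they hold.

```go
package main

import "fmt"

// zeroValues is a hypothetical helper. Its variables are declared
// without initializers, so each one starts at the zero value of its type.
func zeroValues() (int, string, bool, *int) {
    var i int
    var s string
    var b bool
    var p *int
    return i, s, b, p
}

func main() {
    i, s, b, p := zeroValues()
    fmt.Printf("%d %q %t %v\n", i, s, b, p) // 0 "" false <nil>
}
```

Note that the zero value of a pointer is nil, which points to nothing; dereferencing it would cause a run-time panic.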

We'll explain the garbage collector in detail in a future post. Stay tuned for that. Now let's continue...

When we use value semantics, as in the AddN example above, there is no way for a to be mutated. When we use pointer semantics, we need to be careful, because the variable can be mutated in ways we didn't intend.
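The difference between the two semantics shows up clearly with a struct. This is only a sketch; the counter type and the bumpCopy and bumpShared functions are hypothetical names introduced for the comparison.

```go
package main

import "fmt"

// counter is a hypothetical type used only for this comparison.
type counter struct {
    hits int
}

// bumpCopy uses value semantics: it increments its own copy.
func bumpCopy(c counter) { c.hits++ }

// bumpShared uses pointer semantics: it increments the caller's value.
func bumpShared(c *counter) { c.hits++ }

func main() {
    c := counter{}

    bumpCopy(c)
    fmt.Println(c.hits) // 0, the caller's value is untouched

    bumpShared(&c)
    fmt.Println(c.hits) // 1, the caller's value was mutated
}
```

Anyone holding the pointer can change c, which is exactly the kind of mutation that value semantics rules out.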

With pointer semantics, we give up the safety of immutability in exchange for efficiency. Now that we understand how pointers work in functions and how they can affect variables on the stack, the final topic we need to understand is the heap. Let's talk about it.

Heaps

The heap discussed here is not the heap data structure we know from CS 101; it is a separate region of memory. The heap has to be cleaned up by the garbage collector, whereas the stack is self-cleaning. To understand the difference between heap and stack, we will compare returning a value with returning a pointer. Let's define an example to see this more clearly...

Return value

package main

import "fmt"

type person struct {
    name string
    age  uint
}

func NewPerson() person {
    p := person{
        name: "dummy person",
        age:  60,
    }
    fmt.Println("new person --> ", p)
    return p
}

func main() {
    fmt.Println("main --> ", NewPerson())
}

Return Pointer

package main

import "fmt"

type person struct {
    name string
    age  uint
}

func NewPerson() *person {
    p := person{
        name: "dummy person",
        age:  60,
    }
    fmt.Println("new person --> ", &p)
    return &p
}

func main() {
    fmt.Println("main --> ", NewPerson())
}

The two code blocks above are almost identical, with one exception: the first returns a value from the NewPerson function, and the second returns a pointer.

The NewPerson initializes the person struct with dummy values and then returns it. After that, we call the NewPerson function from the main function and print the result.

Behind the scenes, the Go runtime makes main's frame the active frame on the stack. When we call NewPerson, a new frame is created on the stack, p is allocated in it, and its fields are set. Because the NewPerson frame is isolated, we cannot hand p itself to main; instead, a copy of it is passed to main's frame. That is what happens when we return a value.

Now, instead of returning a value, let's return the address of p, as in the second example. The function works the same way as before, but instead of copying the value, it copies the address of p into main's frame. Here we notice something important and, at the same time, something weird: once NewPerson finishes executing, its frame becomes invalid, so the address we copied into main's frame would be useless, because we would not know what it points to in memory. That would be a huge problem, and this is where the heap helps us.

Note: the heap is not the same as the heap we study in a CS 101 data structures course. They share a name but are completely different things.

The compiler analyzes the function (this is called escape analysis) and concludes that there is going to be a problem, so p is placed on the heap instead of in the frame. NewPerson then returns the address of p on the heap, and that address is copied into main's frame. Now we can still access p through that address after NewPerson has returned.

In the example above, we print &p in NewPerson and the returned pointer in main to check that they refer to the same person. Note that fmt.Println formats a pointer to a struct as &{dummy person 60}; to compare the raw addresses, print them with the %p verb of fmt.Printf. So our problem is solved, but at the cost of a heap allocation, which adds work for the garbage collector and can cost us performance.
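As a sketch of that address check, the variant below prints the pointer with the %p verb in both functions. The lower-case newPerson name is an assumption made for this sketch so it does not clash with the examples above; the printed addresses differ between runs, but the two lines should show the same value.

```go
package main

import "fmt"

type person struct {
    name string
    age  uint
}

// newPerson (a hypothetical variant of NewPerson) returns the address of
// a local variable. The value outlives the call, so it must not live in
// the function's stack frame.
func newPerson() *person {
    p := person{name: "dummy person", age: 60}
    fmt.Printf("new person --> %p\n", &p)
    return &p
}

func main() {
    p := newPerson()
    fmt.Printf("main       --> %p\n", p) // same address as printed above
    fmt.Println(p.name, p.age)           // dummy person 60
}
```

To see the compiler's decision, building with go build -gcflags=-m prints its escape analysis notes, which should include a line reporting that p was moved to the heap.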

Conclusion

Go pointers can be a great way to make a codebase more efficient. But in doing so we have to think about the garbage collector, because heap allocations give it more work to clean up. If we instead favor immutability and value semantics, values can stay on the stack, which cleans itself up: when a function finishes, its frame is simply discarded along with everything inside it, and the space is reused by the next function call. So we need to understand the stack and the heap, because the more we put on the heap, the more the garbage collector has to free once those values are no longer in use, and that can affect performance.