Aug 12, 2022
Pass by Value vs Pass by Pointer
When I was in college, our computer science instructor introduced us to the struct and then gave
us the following rule: when you pass a struct into or out of a function, you should use a pointer.
The reasoning was that a struct is full of other values, and you don’t want to pass all of that on
the stack.
I followed that rule for a long time, and I see a lot of other programmers (particularly C
programmers) who still follow it. Whether out of habit or preference, I don’t know.
At some point I decided that all of that struct pointer stuff made for some ugly, hard-to-read
code, especially with something like vector math, and so I switched to passing nearly all structs
by value unless the function needed to modify the struct itself.
But I wondered: did it really matter from a performance perspective? What about large structs? And
are there any non-performance benefits?
Small Struct
A good example of a reasonably small struct is something like a vec4, which is four floats
totaling 16 bytes.
```c
typedef struct {
    float x;
    float y;
    float z;
    float w;
} vec4;
```
The old school C-style way of doing an add operation with two of them would be the following:
```c
void
vec4_add(vec4* out, vec4* a, vec4* b)
{
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
    out->w = a->w + b->w;
}

vec4 a = {1.0f, 3.0f, 5.0f, 7.0f};
vec4 b = {2.0f, 4.0f, 6.0f, 8.0f};
vec4 c;
vec4_add(&c, &a, &b);
```
The reason I think this makes for ugly code is that you need to declare your output on a separate
line from the function and its inputs, and I don’t like passing the addresses of things to functions
when not necessary.
My preferred way looks like this:
```c
vec4
vec4_add(vec4 a, vec4 b)
{
    vec4 result;
    result.x = a.x + b.x;
    result.y = a.y + b.y;
    result.z = a.z + b.z;
    result.w = a.w + b.w;
    return result;
}

vec4 a = {1.0f, 3.0f, 5.0f, 7.0f};
vec4 b = {2.0f, 4.0f, 6.0f, 8.0f};
vec4 c = vec4_add(a, b);
```
In this version the entire thing reads like the math operation it’s performing: c = a + b.
Great, so it looks better (in my opinion), but what about performance? Let’s consult the almighty
Godbolt.
The compiler that I’ll be using in the tests is x86-64 clang 14.0.0, and I’ve removed the
boilerplate instructions that occur at the start (setting up the stack frame) and end (returning to
the caller) of every function, as they are the same for all examples.
Pass by Pointer (-O0)
First, the pointer version without optimizations.
```asm
# Put the three pointers onto the stack
mov qword ptr [rbp - 8], rdi          # out
mov qword ptr [rbp - 16], rsi         # a
mov qword ptr [rbp - 24], rdx         # b
# out->x = a->x + b->x;
mov rax, qword ptr [rbp - 16]         # get address of a
movss xmm0, dword ptr [rax]           # place a->x in xmm0
mov rax, qword ptr [rbp - 24]         # get address of b
addss xmm0, dword ptr [rax]           # add b->x to a->x in xmm0
mov rax, qword ptr [rbp - 8]          # get address of out
movss dword ptr [rax], xmm0           # put sum in out->x
# out->y = a->y + b->y;
mov rax, qword ptr [rbp - 16]         # get address of a
movss xmm0, dword ptr [rax + 4]       # place a->y in xmm0
mov rax, qword ptr [rbp - 24]         # get address of b
addss xmm0, dword ptr [rax + 4]       # add b->y to a->y in xmm0
mov rax, qword ptr [rbp - 8]          # get address of out
movss dword ptr [rax + 4], xmm0       # put sum in out->y
# out->z = a->z + b->z;
mov rax, qword ptr [rbp - 16]         # get address of a
movss xmm0, dword ptr [rax + 8]       # place a->z in xmm0
mov rax, qword ptr [rbp - 24]         # get address of b
addss xmm0, dword ptr [rax + 8]       # add b->z to a->z in xmm0
mov rax, qword ptr [rbp - 8]          # get address of out
movss dword ptr [rax + 8], xmm0       # put sum in out->z
# out->w = a->w + b->w;
mov rax, qword ptr [rbp - 16]         # get address of a
movss xmm0, dword ptr [rax + 12]      # place a->w in xmm0
mov rax, qword ptr [rbp - 24]         # get address of b
addss xmm0, dword ptr [rax + 12]      # add b->w to a->w in xmm0
mov rax, qword ptr [rbp - 8]          # get address of out
movss dword ptr [rax + 12], xmm0      # put sum in out->w
```
Total Instructions: 27
I’ve annotated what’s happening to make it more clear. The key thing to notice is that each add
requires six instructions, and three of them are fetches from memory.
Pass by Value (-O0)
Let’s compare that to the alternative:
```asm
# Put a and b onto stack
movlpd qword ptr [rbp - 32], xmm0     # a.x, a.y
movlpd qword ptr [rbp - 24], xmm1     # a.z, a.w
movlpd qword ptr [rbp - 48], xmm2     # b.x, b.y
movlpd qword ptr [rbp - 40], xmm3     # b.z, b.w
# result.x = a.x + b.x;
movss xmm0, dword ptr [rbp - 32]      # put a.x in xmm0
movss xmm1, dword ptr [rbp - 48]      # put b.x in xmm1
addss xmm0, xmm1                      # add b.x to a.x in xmm0
movss dword ptr [rbp - 16], xmm0      # put sum into result.x
# result.y = a.y + b.y;
movss xmm0, dword ptr [rbp - 28]      # put a.y in xmm0
movss xmm1, dword ptr [rbp - 44]      # put b.y in xmm1
addss xmm0, xmm1                      # add b.y to a.y in xmm0
movss dword ptr [rbp - 12], xmm0      # put sum into result.y
# result.z = a.z + b.z;
movss xmm0, dword ptr [rbp - 24]      # put a.z in xmm0
movss xmm1, dword ptr [rbp - 40]      # put b.z in xmm1
addss xmm0, xmm1                      # add b.z to a.z in xmm0
movss dword ptr [rbp - 8], xmm0       # put sum into result.z
# result.w = a.w + b.w;
movss xmm0, dword ptr [rbp - 20]      # put a.w in xmm0
movss xmm1, dword ptr [rbp - 36]      # put b.w in xmm1
addss xmm0, xmm1                      # add b.w to a.w in xmm0
movss dword ptr [rbp - 4], xmm0       # put sum into result.w
# Put result into xmm0 and xmm1
movsd xmm0, qword ptr [rbp - 16]      # result.x, result.y
movsd xmm1, qword ptr [rbp - 8]       # result.z, result.w
```
Total Instructions: 22
To my eyes, even the assembly is more readable in this form. We see a clear pattern: load, load,
add, store. Each add is four instructions this time instead of six, and the loads come from the
function’s own stack frame rather than through pointers to somewhere else in memory.
There could be performance benefits here in two forms: fewer instructions (22 vs 27) and no
pointer-chasing reads. Because all of the operations happen on values already in the stack frame,
which is almost certainly sitting in cache, we don’t have to worry as much about the latency of
reading from memory.
I’m not sure why the compiler used four xmm registers when two would have sufficed (each can hold
four floats); most likely it’s the calling convention, which splits each 16-byte struct across two
xmm registers, two floats apiece.
What if we turn optimizations on?
Pass by Pointer (-O2)
```asm
movups xmm0, xmmword ptr [rsi]        # put a in xmm0
movups xmm1, xmmword ptr [rdx]        # put b in xmm1
addps xmm1, xmm0                      # add a and b
movups xmmword ptr [rdi], xmm1        # put sum into out
```
Total Instructions: 4
The compiler realizes that a and b are four floats each which happens to be the size of the
xmm registers, so it saves a lot of effort by simply placing the four floats of a into
xmm0 and the four floats of b into xmm1 and doing a single add instruction that takes
care of the four individual adds.
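For the curious, you can write that four-wide add by hand with SSE intrinsics. This is just a sketch of what the optimizer produced, not code from the article’s examples; the function name is made up, and it assumes an x86 target where `<xmmintrin.h>` is available.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

typedef struct { float x, y, z, w; } vec4;  /* same vec4 as above, repeated so the snippet stands alone */

/* Hand-written version of what the compiler generated: one packed
 * load per operand, a single addps, and one store for the result. */
vec4 vec4_add_sse(vec4 a, vec4 b)
{
    __m128 va  = _mm_loadu_ps(&a.x);  /* a.x..a.w into one xmm register */
    __m128 vb  = _mm_loadu_ps(&b.x);  /* b.x..b.w into another */
    __m128 sum = _mm_add_ps(va, vb);  /* one instruction, four adds */
    vec4 out;
    _mm_storeu_ps(&out.x, sum);
    return out;
}
```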
Surely passing by value can’t beat that?
Pass by Value (-O2)
```asm
addps xmm0, xmm2
addps xmm1, xmm3
```
Total Instructions: 2
It doesn’t need to load anything from memory, so it’s able to do everything within the registers
themselves, for a total of two instructions. The reason it needs two add instructions while the
pointer version needed only one is likely the calling convention again: each vec4 arrives split
across two xmm registers (two floats apiece), and the result leaves the same way, so the compiler
adds the two halves separately.
Large Struct
The previous example had the benefit that the struct only contained four floats totaling sixteen
bytes, and it also had the advantage that an entire vec4 could fit into a single xmm
register. Let’s try another common game math structure: the 4x4 matrix.
```c
typedef struct {
    float e00; float e01; float e02; float e03;
    float e10; float e11; float e12; float e13;
    float e20; float e21; float e22; float e23;
    float e30; float e31; float e32; float e33;
} mat4;
```
For our example function, let’s do something a bit contrived and use matrix addition. I’ve never
actually needed it, but it keeps the assembly shorter than something like matrix multiplication.
The pass-by-pointer form would look like this:
```c
void
mat4_add(mat4* out, mat4* a, mat4* b)
{
    out->e00 = a->e00 + b->e00;
    out->e01 = a->e01 + b->e01;
    out->e02 = a->e02 + b->e02;
    out->e03 = a->e03 + b->e03;
    out->e10 = a->e10 + b->e10;
    out->e11 = a->e11 + b->e11;
    out->e12 = a->e12 + b->e12;
    out->e13 = a->e13 + b->e13;
    out->e20 = a->e20 + b->e20;
    out->e21 = a->e21 + b->e21;
    out->e22 = a->e22 + b->e22;
    out->e23 = a->e23 + b->e23;
    out->e30 = a->e30 + b->e30;
    out->e31 = a->e31 + b->e31;
    out->e32 = a->e32 + b->e32;
    out->e33 = a->e33 + b->e33;
}
```
And the pass-by-value form:
```c
mat4
mat4_add(mat4 a, mat4 b)
{
    mat4 result;
    result.e00 = a.e00 + b.e00;
    result.e01 = a.e01 + b.e01;
    result.e02 = a.e02 + b.e02;
    result.e03 = a.e03 + b.e03;
    result.e10 = a.e10 + b.e10;
    result.e11 = a.e11 + b.e11;
    result.e12 = a.e12 + b.e12;
    result.e13 = a.e13 + b.e13;
    result.e20 = a.e20 + b.e20;
    result.e21 = a.e21 + b.e21;
    result.e22 = a.e22 + b.e22;
    result.e23 = a.e23 + b.e23;
    result.e30 = a.e30 + b.e30;
    result.e31 = a.e31 + b.e31;
    result.e32 = a.e32 + b.e32;
    result.e33 = a.e33 + b.e33;
    return result;
}
```
Pass by Pointer (-O0)
```asm
# Put the three pointers onto the stack
mov qword ptr [rbp - 8], rdi          # out
mov qword ptr [rbp - 16], rsi         # a
mov qword ptr [rbp - 24], rdx         # b
# out->e00 = a->e00 + b->e00;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax], xmm0
# out->e01 = a->e01 + b->e01;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 4]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 4]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 4], xmm0
# out->e02 = a->e02 + b->e02;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 8]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 8]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 8], xmm0
# out->e03 = a->e03 + b->e03;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 12]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 12]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 12], xmm0
# out->e10 = a->e10 + b->e10;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 16]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 16]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 16], xmm0
# out->e11 = a->e11 + b->e11;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 20]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 20]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 20], xmm0
# out->e12 = a->e12 + b->e12;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 24]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 24]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 24], xmm0
# out->e13 = a->e13 + b->e13;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 28]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 28]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 28], xmm0
# out->e20 = a->e20 + b->e20;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 32]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 32]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 32], xmm0
# out->e21 = a->e21 + b->e21;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 36]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 36]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 36], xmm0
# out->e22 = a->e22 + b->e22;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 40]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 40]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 40], xmm0
# out->e23 = a->e23 + b->e23;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 44]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 44]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 44], xmm0
# out->e30 = a->e30 + b->e30;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 48]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 48]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 48], xmm0
# out->e31 = a->e31 + b->e31;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 52]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 52]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 52], xmm0
# out->e32 = a->e32 + b->e32;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 56]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 56]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 56], xmm0
# out->e33 = a->e33 + b->e33;
mov rax, qword ptr [rbp - 16]
movss xmm0, dword ptr [rax + 60]
mov rax, qword ptr [rbp - 24]
addss xmm0, dword ptr [rax + 60]
mov rax, qword ptr [rbp - 8]
movss dword ptr [rax + 60], xmm0
```
Total Instructions: 99
It’s about the same as the vec4 version except that there are four times as many add sections
because there are four times as many floats involved. There really isn’t any difference otherwise.
It still places the pointers onto the stack frame and then loads the values from memory into
registers, adds, and places the values back into memory.
Pass by Value (-O0)
```asm
# Store pointers to the stack in registers (for easier offsets I assume)
mov rax, rdi
lea rcx, [rbp + 80]                   # b
lea rdx, [rbp + 16]                   # a
# result.e00 = a.e00 + b.e00;
movss xmm0, dword ptr [rdx]
addss xmm0, dword ptr [rcx]
movss dword ptr [rdi], xmm0
# result.e01 = a.e01 + b.e01;
movss xmm0, dword ptr [rdx + 4]
addss xmm0, dword ptr [rcx + 4]
movss dword ptr [rdi + 4], xmm0
# result.e02 = a.e02 + b.e02;
movss xmm0, dword ptr [rdx + 8]
addss xmm0, dword ptr [rcx + 8]
movss dword ptr [rdi + 8], xmm0
# result.e03 = a.e03 + b.e03;
movss xmm0, dword ptr [rdx + 12]
addss xmm0, dword ptr [rcx + 12]
movss dword ptr [rdi + 12], xmm0
# result.e10 = a.e10 + b.e10;
movss xmm0, dword ptr [rdx + 16]
addss xmm0, dword ptr [rcx + 16]
movss dword ptr [rdi + 16], xmm0
# result.e11 = a.e11 + b.e11;
movss xmm0, dword ptr [rdx + 20]
addss xmm0, dword ptr [rcx + 20]
movss dword ptr [rdi + 20], xmm0
# result.e12 = a.e12 + b.e12;
movss xmm0, dword ptr [rdx + 24]
addss xmm0, dword ptr [rcx + 24]
movss dword ptr [rdi + 24], xmm0
# result.e13 = a.e13 + b.e13;
movss xmm0, dword ptr [rdx + 28]
addss xmm0, dword ptr [rcx + 28]
movss dword ptr [rdi + 28], xmm0
# result.e20 = a.e20 + b.e20;
movss xmm0, dword ptr [rdx + 32]
addss xmm0, dword ptr [rcx + 32]
movss dword ptr [rdi + 32], xmm0
# result.e21 = a.e21 + b.e21;
movss xmm0, dword ptr [rdx + 36]
addss xmm0, dword ptr [rcx + 36]
movss dword ptr [rdi + 36], xmm0
# result.e22 = a.e22 + b.e22;
movss xmm0, dword ptr [rdx + 40]
addss xmm0, dword ptr [rcx + 40]
movss dword ptr [rdi + 40], xmm0
# result.e23 = a.e23 + b.e23;
movss xmm0, dword ptr [rdx + 44]
addss xmm0, dword ptr [rcx + 44]
movss dword ptr [rdi + 44], xmm0
# result.e30 = a.e30 + b.e30;
movss xmm0, dword ptr [rdx + 48]
addss xmm0, dword ptr [rcx + 48]
movss dword ptr [rdi + 48], xmm0
# result.e31 = a.e31 + b.e31;
movss xmm0, dword ptr [rdx + 52]
addss xmm0, dword ptr [rcx + 52]
movss dword ptr [rdi + 52], xmm0
# result.e32 = a.e32 + b.e32;
movss xmm0, dword ptr [rdx + 56]
addss xmm0, dword ptr [rcx + 56]
movss dword ptr [rdi + 56], xmm0
# result.e33 = a.e33 + b.e33;
movss xmm0, dword ptr [rdx + 60]
addss xmm0, dword ptr [rcx + 60]
movss dword ptr [rdi + 60], xmm0
```
Total Instructions: 51
The result here is a bit interesting. It first takes the stack addresses of a and b and
places them into the registers rdx and rcx. It then uses offsets from rdx and rcx to
access the values rather than offsets from the stack frame pointer (rbp) itself.
Also, each add is now three instructions which is one fewer than the pass-by-value vec4_add.
Here it’s able to move one value into xmm0 and then perform the add directly from the value on
the stack.
So the pass-by-pointer version is 99 instructions while the pass-by-value version is only 51.
But what about with optimizations?
Pass by Pointer (-O2)
```asm
# Add four floats
movups xmm0, xmmword ptr [rsi]
movups xmm1, xmmword ptr [rdx]
addps xmm1, xmm0
movups xmmword ptr [rdi], xmm1
# Add four floats
movups xmm0, xmmword ptr [rsi + 16]
movups xmm1, xmmword ptr [rdx + 16]
addps xmm1, xmm0
movups xmmword ptr [rdi + 16], xmm1
# Add four floats
movups xmm0, xmmword ptr [rsi + 32]
movups xmm1, xmmword ptr [rdx + 32]
addps xmm1, xmm0
movups xmmword ptr [rdi + 32], xmm1
# Add four floats
movups xmm0, xmmword ptr [rsi + 48]
movups xmm1, xmmword ptr [rdx + 48]
addps xmm1, xmm0
movups xmmword ptr [rdi + 48], xmm1
```
Total Instructions: 16
Similarly to the vec4 version, it’s able to add in four float chunks which reduces the number of
instructions significantly, but it does still need to load from memory which slows things a bit.
Pass by Value (-O2)
```asm
# Place return value into rax
mov rax, rdi
# Add four floats
movaps xmm0, xmmword ptr [rsp + 8]
addps xmm0, xmmword ptr [rsp + 72]
movups xmmword ptr [rdi], xmm0
# Add four floats
movaps xmm0, xmmword ptr [rsp + 24]
addps xmm0, xmmword ptr [rsp + 88]
movups xmmword ptr [rdi + 16], xmm0
# Add four floats
movaps xmm0, xmmword ptr [rsp + 40]
addps xmm0, xmmword ptr [rsp + 104]
movups xmmword ptr [rdi + 32], xmm0
# Add four floats
movaps xmm0, xmmword ptr [rsp + 56]
addps xmm0, xmmword ptr [rsp + 120]
movups xmmword ptr [rdi + 48], xmm0
```
Total Instructions: 13
Again, it’s able to add in chunks of four floats but there is less overhead because everything is
happening within the stack frame.
Conclusion
After doing these experiments I’m convinced that in most cases (on a PC with a modern x86 CPU)
passing by value is the way to go, for both readability and performance.
Passing by pointer leads to reads through pointers into memory, which can be slow and often
requires extra instructions.
Passing by pointer can cause ambiguity about ownership, because anything holding the pointer can
do whatever it wants with the data.
Passing by pointer can lead to pointer aliasing, where the compiler can’t be sure two pointers
don’t refer to the same data and so can’t perform certain optimizations.
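If you do pass by pointer, `restrict` is the standard C way to promise the compiler that the pointers never overlap, which hands back some of those lost optimizations. A minimal sketch (the function name is mine, not one of the examples above):

```c
typedef struct { float x, y, z, w; } vec4;  /* repeated so the snippet stands alone */

/* Without restrict, the compiler must assume out might alias a or b
 * and keep every load and store in order. With restrict, we promise
 * the three pointers never refer to overlapping memory. */
void vec4_add_restrict(vec4* restrict out,
                       const vec4* restrict a,
                       const vec4* restrict b)
{
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
    out->w = a->w + b->w;
}
```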
Passing by value operates on the stack frame and avoids the need to fetch through pointers.
Passing by value makes for more readable function calls, since you clearly see that the inputs are
the function arguments and the output is the return value.
Passing by value also makes the inputs effectively const: the callee works on copies, so it can’t
modify the caller’s data.
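That last point is easy to demonstrate: the callee receives a copy, so nothing it does can reach back into the caller’s variable. A small sketch (a hypothetical function, not from the examples above):

```c
typedef struct { float x, y, z, w; } vec4;  /* repeated so the snippet stands alone */

/* Scribbling on the parameter only touches the local copy. */
float doubled_x(vec4 v)
{
    v.x *= 2.0f;  /* the caller's struct is unaffected */
    return v.x;
}
```

After `doubled_x(a)` returns, `a.x` still holds its original value.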
That’s not to say I never pass by pointer. If I have a function that needs to modify an existing
piece of data then that is a good use for a pointer. It wouldn’t make sense to take in a copy,
modify the copy, and return a new copy. Or if a struct were truly huge (hundreds of bytes or more),
then I would use a pointer, or at the very least check the compiled assembly to see what it was
doing.
But for a function that only needs to read from a struct, and/or return a brand new struct, I
think it makes sense to pass by value.
Last Edited: Dec 20, 2022