8-May-2016: C/C++ pointers: yet another short example

The note below has also been copied into the Reverse Engineering for Beginners book.

(For those who have a hard time understanding C/C++ pointers.)

A pointer is just an address in memory. So why do we write "char* string" instead of something like "address string"? Because a pointer variable also carries the type of the value it points to, which lets the compiler catch bugs at compile time.

To be pedantic, data typing in programming languages is all about preventing bugs and self-documentation. One could get by with maybe two data types, like int (or int64) and byte; these are the only types available to assembly language programmers. But it is very hard to write big, practical assembly programs without nasty bugs: any small typo can lead to a hard-to-find bug.

Data type information is absent from compiled code (this is one of the main problems for decompilers), and I can demonstrate this:

This is what a sane C/C++ programmer would write:

#include <stdio.h>
#include <stdint.h>

void print_string (char *s)
{
	/* %llx expects an unsigned long long, hence the explicit cast */
	printf ("(address: 0x%llx)\n", (unsigned long long)s);
	printf ("%s\n", s);
}

int main()
{
	char *s="Hello, world!";

	print_string (s);
}

This is what I might write instead ("Do not try this at home", as the MythBusters say):

#include <stdio.h>
#include <stdint.h>

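void print_string (uint64_t address)
{
	/* %llx expects an unsigned long long, hence the explicit cast */
	printf ("(address: 0x%llx)\n", (unsigned long long)address);
	/* the integer is turned back into a pointer to a character */
	puts ((char*)address);
}

int main()
{
	char *s="Hello, world!";

	print_string ((uint64_t)s);
}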

I use uint64_t because I run this example on Linux x64; on a 32-bit OS, int would work as well.

First, a pointer to a character (the very first one of the greeting string) is cast to uint64_t and passed as a plain integer. The print_string() function then casts the incoming uint64_t value back into a pointer to a character.

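What is interesting is that GCC 4.8.4 produces identical assembly output for both versions (GCC replaces printf ("%s\n", s) with the equivalent puts (s) call, so even that difference disappears):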

gcc 1.c -S -masm=intel -O3 -fno-inline
.LC0:
	.string	"(address: 0x%llx)\n"
print_string:
	push	rbx
	mov	rdx, rdi
	mov	rbx, rdi
	mov	esi, OFFSET FLAT:.LC0
	mov	edi, 1
	xor	eax, eax
	call	__printf_chk
	mov	rdi, rbx
	pop	rbx
	jmp	puts
.LC1:
	.string	"Hello, world!"
main:
	sub	rsp, 8
	mov	edi, OFFSET FLAT:.LC1
	call	print_string
	add	rsp, 8
	ret

(I've removed all insignificant GCC directives).

I also compared both outputs with the diff utility, and it shows no differences at all.

Let's abuse C/C++ programming traditions even further. Someone might write this:

#include <stdio.h>
#include <stdint.h>

uint8_t load_byte_at_address (uint8_t* address)
{
	return *address;
	//this is also possible: return address[0];
}

void print_string (char *s)
{
	char* current_address=s;
	while (1)
	{
		/* char* and uint8_t* are different pointer types, so cast explicitly */
		char current_char=load_byte_at_address((uint8_t*)current_address);
		if (current_char==0)
			break;
		printf ("%c", current_char);
		current_address++;
	}
}

int main()
{
	char *s="Hello, world!";

	print_string (s);
}

It can be rewritten like this:

#include <stdio.h>
#include <stdint.h>

uint8_t load_byte_at_address (uint64_t address)
{
	return *(uint8_t*)address;
	//this is also possible: return ((uint8_t*)address)[0];
}

void print_string (uint64_t address)
{
	uint64_t current_address=address;
	while (1)
	{
		char current_char=load_byte_at_address(current_address);
		if (current_char==0)
			break;
		printf ("%c", current_char);
		current_address++;
	}
}

int main()
{
	char *s="Hello, world!";

	print_string ((uint64_t)s);
}

Both versions compile to the same assembly output:

gcc 1.c -S -masm=intel -O3 -fno-inline
load_byte_at_address:
	movzx	eax, BYTE PTR [rdi]
	ret
print_string:
.LFB15:
	push	rbx
	mov	rbx, rdi
	jmp	.L4
.L7:
	movsx	edi, al
	add	rbx, 1
	call	putchar
.L4:
	mov	rdi, rbx
	call	load_byte_at_address
	test	al, al
	jne	.L7
	pop	rbx
	ret
.LC0:
	.string	"Hello, world!"
main:
	sub	rsp, 8
	mov	edi, OFFSET FLAT:.LC0
	call	print_string
	add	rsp, 8
	ret

(I have also removed all insignificant GCC directives).

No difference: C/C++ pointers are essentially addresses that carry type information, so that possible mistakes can be caught at compile time. Types are not checked at runtime; doing so would be a huge (and unneeded) overhead.




This open-sourced site, and this page in particular, is hosted on GitHub. Patches, suggestions and comments are welcome.



This page was last updated on 09-October-2016.