It is nearly impossible to use the internet without encountering a proxy of some kind. For example, most large-scale applications use load balancers, where incoming requests are intercepted by a reverse proxy and routed to servers based on availability. When you sign into a captive portal on a public WiFi network, your traffic is typically routed through some kind of proxy or gateway server that keeps you off the wider network until you complete the login process. And then there are VPNs, or Virtual Private Networks, which are becoming more and more popular amid growing concerns about digital privacy and tracking.^1
But let's step back for a second. What is a proxy? On a broad level, we can define a proxy as an intermediary service that performs actions on behalf of two or more connected hosts. For example, a forward proxy (as the name suggests) forwards traffic from a client to an end destination.
Let's try to build a very simple forward proxy with the following specs:
Sockets are kernel-managed abstractions that allow programs to communicate across networks. For a full description of sockets, you can check out my previous post. Our program will need to open three sockets: one for the server listening for incoming connections, one for the accepted client connection, and one for the proxy service, which sends the outgoing request to the end destination and receives its response.
Let's start with the server socket. Our program needs to create a socket, bind it to a local address, listen for connections, and define logic on how to accept a new connection.
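#include <arpa/inet.h>
#include <ctype.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <unistd.h>
#define PORT 5100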
int main()
{
int server_sock = socket(AF_INET, SOCK_STREAM, 0);
if (server_sock < 0)
{
// perror appends ": <reason>" and a newline itself
perror("Error creating server socket");
exit(EXIT_FAILURE);
}
struct sockaddr_in server_address;
memset(&server_address, 0, sizeof(server_address));
server_address.sin_family = AF_INET;
server_address.sin_port = htons(PORT);
server_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
if (bind(server_sock, (struct sockaddr *)&server_address, sizeof(server_address)) < 0)
{
perror("Error binding server socket");
close(server_sock);
exit(EXIT_FAILURE);
}
...
}
In the above code, we've defined a TCP socket and bound it to a local IPv4 loopback address and port. Now we can listen for connections:
if (listen(server_sock, 1) < 0)
{
perror("Failed to listen on server socket");
close(server_sock);
exit(EXIT_FAILURE);
}
The second argument of the listen function sets the backlog: the maximum number of pending connections the kernel will queue before we accept them. Since we aren't really worried about our proxy being flooded by requests, a backlog of one is enough for now.
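The socket, bind, and listen steps above can be folded into one function. The following is only a sketch: the name make_listener is hypothetical, and the SO_REUSEADDR option is an addition that the code in this post does not use (it is there so the port can be rebound quickly after a restart).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the original code): create a TCP socket,
 * bind it to 127.0.0.1:port and start listening.
 * Returns the listening descriptor, or -1 on failure with errno set. */
int make_listener(uint16_t port, int backlog)
{
    assert(backlog >= 0);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Assumption: allow quick restarts without "Address already in use". */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0)
    {
        int saved = errno;  /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

With a helper like this, main would shrink to a call such as make_listener(PORT, 1) followed by the accept logic. Passing port 0 asks the kernel to pick a free port, which is handy for tests.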
When a request hits the proxy, we'll create a new client socket and then do something with it.
int client_sock;
struct sockaddr_in client_address;
socklen_t client_len = sizeof(client_address);
client_sock = accept(server_sock, (struct sockaddr *)&client_address, &client_len);
if (client_sock < 0)
{
perror("Failed to accept incoming client connection");
close(server_sock);
exit(EXIT_FAILURE);
}
printf("Client connected: %s\n", inet_ntoa(client_address.sin_addr));
handle_client(client_sock); // TODO
Our program creates the new client socket with the accept call. If nothing goes wrong, we can now read from and write to the socket. Let's get parsing!
All our proxy needs to know for now is the hostname and port number of the intended destination. For example, consider the following request:
GET http://example.com:8080
The hostname of the request is example.com and the port will be 8080. If no port is specified, we'll use the HTTP default port of 80. The above request translates into the following plaintext:
GET http://example.com:8080/ HTTP/1.1
Host: example.com:8080
User-Agent: curl/8.6.0
Though we just need the host and port number, we can build a basic parser that we can extend later if we need to. Manipulating and searching text is a little trickier in C than in higher-level languages, but we won't let that stop us.
To begin, let's define a struct where we can store our variables for later:
typedef struct {
char method[16];
char host[256];
int port;
char path[1024];
} HttpRequest;
Next, let's make a new function with an argument for the struct we will manipulate, and another for the raw request string:
int parse_http_request(const char *raw_request, HttpRequest *http_request)
{
memset(http_request, 0, sizeof(HttpRequest));
char method[10] = {0};
char path[1024] = {0};
char host[256] = {0};
int port = 80;
...
}
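The tokenizing function we'll use modifies the string it is given, and raw_request is const, so we work on a copy of the raw request (which must be released with free once parsing is done):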
char *request_copy = strdup(raw_request);
Now, for the parsing. Our objective is to split the request into individual lines. Then, on each line we can pluck out the value we want. For this we can use the strtok_r function, a reentrant (and therefore thread-safe) version of strtok:
char *strtok_r(char *restrict str, const char *restrict sep, char **restrict lasts);
This function splits a string into tokens. The first argument is a pointer to the string (i.e. a pointer to its first character), the second is a string of delimiter characters (any one of them ends a token), and the last is a user-supplied pointer that strtok_r uses to remember where the next search should start. Each call returns a pointer to the next token, after overwriting the delimiter that follows it with a null terminator, so the original string is modified in place. When no tokens remain, the result is NULL. To keep tokenizing the same string, we pass NULL as the first argument on subsequent calls. Kind of confusing and weird, but a simple example will help:
char str[] = "foo,bar,baz";
char *token;
char *saveptr;
token = strtok_r(str, ",", &saveptr);
while (token != NULL) {
printf("%s\n", token);
token = strtok_r(NULL, ",", &saveptr);
}
In the snippet above, we want to separate our string by the comma (,) character. The first call to strtok_r receives the original string and returns a pointer to the first token, foo. Each later call passes NULL and picks up where the previous one left off. The loop ends once no tokens remain, printing out the following:
gcc strtok.c
./a.out
foo
bar
baz
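Now on to our request. HTTP specifies that each line ends with the \r\n characters^2, so we'll split up the request using \r\n as our delimiter. Note that strtok_r treats the delimiter string as a set of characters rather than a pattern, so it splits on any run of \r or \n and skips empty lines. That is fine here because we only need the request line and the Host header. Then, on each line, we can search for whatever values we want.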
// state pointers used by strtok_r for the line and word tokenizers
char *line_ptr = NULL;
char *token_ptr = NULL;
// parse first line
char *line = strtok_r(request_copy, "\r\n", &line_ptr);
if (line) {
char *token = strtok_r(line, " ", &token_ptr);
if (token) {
// method
strncpy(method, token, sizeof(method) - 1);
token = strtok_r(NULL, " ", &token_ptr);
if (token) {
// path
strncpy(path, token, sizeof(path) - 1);
}
}
// parse headers
while ((line = strtok_r(NULL, "\r\n", &line_ptr)) != NULL) {
// look for Host header
if (strncasecmp(line, "Host:", 5) == 0) {
char *host_value = line + 5;
// skip whitespace
while (isspace((unsigned char)*host_value)) host_value++;
// parse host and port if present
char *colon = strchr(host_value, ':');
if (colon) {
// host with port
size_t host_len = colon - host_value;
// clamp so an oversized host cannot overflow the buffer
if (host_len > sizeof(host) - 1) host_len = sizeof(host) - 1;
strncpy(host, host_value, host_len);
host[host_len] = '\0';
port = atoi(colon + 1);
} else {
// host without port
strncpy(host, host_value, sizeof(host) - 1);
}
break;
}
}
}
We break the request into lines using the \r\n delimiter. On each line we can manually search for the values we need: we find the Host header with the strncasecmp function, and we find the port, which follows the host, by searching for a colon. These are the only values we need for now, but in the future we might add more logic to pluck out more header values, cookies, etc.
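The host and port split is a natural candidate for its own function. Here is a minimal sketch, assuming a hypothetical helper named split_host_port that is not part of the original parser. It swaps atoi for strtol so that a malformed port can be detected, and it does not attempt to handle bracketed IPv6 literals.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "host" or "host:port" into its parts.
 * Falls back to default_port when no port is given.
 * Returns 0 on success, or -1 if the host is empty or does not fit in
 * `host`, or if the port is not a number in 1..65535.
 * Assumption: no bracketed IPv6 literals such as "[::1]:8080". */
int split_host_port(const char *value, char *host, size_t host_size,
                    int default_port, int *port)
{
    assert(value != NULL && host != NULL && port != NULL);

    const char *colon = strchr(value, ':');
    size_t host_len = colon ? (size_t)(colon - value) : strlen(value);
    if (host_len == 0 || host_len >= host_size)
        return -1;

    memcpy(host, value, host_len);
    host[host_len] = '\0';

    if (colon == NULL)
    {
        *port = default_port;
        return 0;
    }

    /* strtol reports where parsing stopped, so "80abc" and "" are rejected;
     * atoi would quietly return 80 and 0 for those inputs. */
    char *end = NULL;
    errno = 0;
    long parsed = strtol(colon + 1, &end, 10);
    if (errno != 0 || end == colon + 1 || *end != '\0' ||
        parsed < 1 || parsed > 65535)
        return -1;

    *port = (int)parsed;
    return 0;
}
```

Inside the header loop this could be called as split_host_port(host_value, host, sizeof(host), 80, &port), with a failed parse turned into an error return for the whole request.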
Now that we can determine the final intended destination of the HTTP request, our proxy service can make that request on behalf of the client. This is fairly simple and will involve the following:
In our handle_client function we can first parse what we need:
void handle_client(int client_socket)
{
char buffer[1024];
ssize_t read_size;
read_size = recv(client_socket, buffer, sizeof(buffer) - 1, 0);
if (read_size <= 0)
{
perror("Error receiving client message");
close(client_socket);
return;
}
buffer[read_size] = '\0';
HttpRequest request;
if (parse_http_request(buffer, &request) < 0)
{
printf("parse unsuccessful\n");
close(client_socket);
return;
}
...
}
We'll use the recv function to read the initial HTTP request into our buffer. We'll then parse the contents of the buffer in order to extract the host and port.
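A single recv call is not guaranteed to return the whole request, because TCP delivers a byte stream that may arrive in several pieces. One way to be more careful, sketched here with a hypothetical helper named recv_headers that is not part of the original code, is to keep reading until the blank line that ends the headers has arrived:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: keep calling recv() until the "\r\n\r\n" that ends
 * the HTTP headers has arrived, the peer closes, or the buffer is full.
 * buf is always NUL-terminated. Returns the number of bytes stored, or -1
 * on a recv() error or if no complete header block was received. */
ssize_t recv_headers(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    size_t used = 0;
    buf[0] = '\0';
    while (used < cap - 1)
    {
        ssize_t n = recv(fd, buf + used, cap - 1 - used, 0);
        if (n < 0)
            return -1;          /* recv() failed; errno says why */
        if (n == 0)
            break;              /* peer closed before finishing */
        used += (size_t)n;
        buf[used] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL)
            return (ssize_t)used;
    }
    return -1;                  /* headers incomplete or too large */
}
```

In handle_client this would stand in for the single recv call, for example read_size = recv_headers(client_socket, buffer, sizeof(buffer)). It only covers the headers, so a request body (as in a POST) would still need separate handling.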
In order to translate a hostname to an IP address we can use getaddrinfo, which is built into Unix-like operating systems. From the man pages:
The getaddrinfo() function is used to get a list of IP addresses and port numbers for host hostname and service servname.
We provide the function with a hostname and a pointer where it will store the list of resolved addresses. We can also specify that we want an IPv4 address.
struct addrinfo hints, *res;
...
hints.ai_socktype = SOCK_STREAM; // TCP socket
hints.ai_family = AF_INET; // Use AF_INET for IPv4
if (getaddrinfo(request.host, NULL, &hints, &res) != 0)
{
close(client_socket);
return;
}
If the call succeeds, then the hostname has been resolved and we can access the address with res->ai_addr.
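getaddrinfo can also take the port as its second argument (as a string), in which case each returned address already has the port filled in, and it may return more than one address. The sketch below shows that common pattern. The helper name connect_to_host is hypothetical, and it differs from the code in this post by resolving and connecting in a single step.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve host:port and return a connected TCP socket,
 * or -1 if resolution fails or no address accepts the connection.
 * `port` is a decimal string such as "80". */
int connect_to_host(const char *host, const char *port)
{
    assert(host != NULL && port != NULL);

    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));   /* unused fields must be zero */
    hints.ai_family = AF_INET;          /* IPv4 only, as in the post */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */

    struct addrinfo *res = NULL;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int fd = -1;
    /* The result is a linked list; try each entry until one connects. */
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
    {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);                  /* the list is heap-allocated */
    return fd;
}
```

Because request.port is an int, a caller would first format it into a small buffer with snprintf before passing it in.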
Finally, using the address, we create the proxy socket, configured with the IP and port number:
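int proxy_socket = socket(AF_INET, SOCK_STREAM, 0);
if (proxy_socket < 0)
{
perror("Error creating proxy socket");
freeaddrinfo(res);
close(client_socket);
return;
}
struct sockaddr_in proxy_address;
memset(&proxy_address, 0, sizeof(proxy_address));
proxy_address.sin_family = AF_INET;
proxy_address.sin_port = htons(request.port);
// ai_addr is a generic sockaddr pointer; for AF_INET it points to a sockaddr_in
proxy_address.sin_addr = ((struct sockaddr_in *)res->ai_addr)->sin_addr;
freeaddrinfo(res); // release the list allocated by getaddrinfo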
Once the proxy socket is connected, we can forward the original request through it and relay the reply from the end server back to the client:
if (connect(proxy_socket, (struct sockaddr *)&proxy_address, sizeof(proxy_address)) < 0)
{
perror("Proxy connection failed");
close(client_socket);
close(proxy_socket);
return;
}
// forward the client's request, then relay one buffer of the reply
send(proxy_socket, buffer, read_size, 0);
read_size = recv(proxy_socket, buffer, sizeof(buffer), 0);
if (read_size > 0)
{
send(client_socket, buffer, read_size, 0);
}
close(proxy_socket);
close(client_socket);
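The code above relays at most one buffer of the reply, which is enough for a short redirect page but truncates anything larger. A minimal sketch of a fuller relay follows. The helpers send_all and relay_until_close are hypothetical, and the loop assumes the end server closes the connection when it has finished responding (for example because the request carried a Connection: close header). With keep-alive connections it would block waiting for more data.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send() may accept fewer bytes than requested,
 * so loop until the whole buffer is written. Returns 0, or -1 on error. */
static int send_all(int fd, const char *data, size_t len)
{
    while (len > 0)
    {
        ssize_t n = send(fd, data, len, 0);
        if (n < 0)
            return -1;
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy bytes from from_fd to to_fd until from_fd
 * reports end-of-stream. Returns total bytes relayed, or -1 on error. */
long relay_until_close(int from_fd, int to_fd)
{
    assert(from_fd >= 0 && to_fd >= 0);

    char chunk[4096];
    long total = 0;
    for (;;)
    {
        ssize_t n = recv(from_fd, chunk, sizeof(chunk), 0);
        if (n < 0)
            return -1;
        if (n == 0)
            return total;       /* peer closed: nothing more to relay */
        if (send_all(to_fd, chunk, (size_t)n) < 0)
            return -1;
        total += n;
    }
}
```

In handle_client, a call such as relay_until_close(proxy_socket, client_socket) would take the place of the single recv and send pair that carries the reply.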
If all goes well, the client should see the response from the server. We can test this out using curl, with the proxy argument set:
# proxy running at localhost 5100
$ curl -x 'localhost:5100' http://google.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
This looks good! The response is identical to the one we'd get if we hit http://google.com directly. Our proxy works.
Unfortunately, our proxy is missing something quite important. Unsecured HTTP traffic is quite rare these days and most services enforce HTTPS. If we were to change our above curl request to:
$ curl -x 'localhost:5100' https://google.com
Our proxy would not work. This is because for HTTPS, a client connecting to a secured server via a proxy sends the following request:
CONNECT google.com:443 HTTP/1.1
As opposed to a simple GET request:
GET http://google.com/ HTTP/1.1
The CONNECT method is what a client sends when it wants the proxy to open a tunnel between itself and the end server. The proxy then passes the encrypted bytes through untouched. In practice it is used mostly for HTTPS.
We will explore creating an HTTPS forward proxy later, but for now we can return a 501 Not Implemented response and let the tech debt accrue:
if (strcmp(request.method, "CONNECT") == 0)
{
printf("Client requesting secure tunnel via https\n");
// todo: implement https tunnel
char *response = "HTTP/1.1 501 Not Implemented\r\n\r\n";
write(client_socket, response, strlen(response));
close(client_socket);
return;
}
Lastly, our server is fairly rudimentary. Introducing multi-threaded request handling would be a nice optimization as we build out more features.
Find the full code for this post here.