Understanding HAProxy’s reqrep

I’ve seen a lot of Stack Overflow posts telling people to use reqrep and reqirep (the case-insensitive version), but they never really explain what is actually happening. The documentation for reqirep states:

Replace a regular expression with a string in an HTTP request line

OK, but what is the HTTP request line? And how do you test your regular expression without having to reload HAProxy over and over again?

To see what a request line looks like, you’ll need netcat and curl. If you are on a Linux box, curl is probably already installed, and netcat can be installed with yum install nc. Once you have netcat installed, open two terminals. In one terminal, run nc -l 9090 to make netcat listen for incoming connections on port 9090, and in the other terminal run curl http://localhost:9090. On the netcat terminal you should see:

GET / HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.64.1
Accept: */*

The first line of that output is the request line, and the lines after it are the request headers; reqrep can edit any of them. Reqrep steps through the request line by line, using a regular expression to find and then replace content. For example, let us replace the Host line with something else. Take the output from the netcat command and paste it into a file (called output here) so we can run some tests on it. Let’s replace the Host line so that instead of coming from localhost:9090, it is coming from mydomain.com:

sed "s/^Host:\(.*\):\(.*\)/Host: mydomain.com:\2/" output

What is that doing? It matches a line that starts with Host:, followed by some characters, a colon, and then some more characters up to the end of the line. The parentheses indicate that we want to save whatever is matched inside that region to a variable: the first match is saved to \1 and the second to \2. Next comes the rewrite part of the expression, where we say to write out Host: mydomain.com:\2, with \2 being the part that matched the port number (9090).
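If you would rather experiment from Python than from sed, the same substitution can be sketched with the re module. This is only an illustration: the request text is the netcat capture from above (assumed to be saved with plain \n line endings), and rewrite_host is a made-up helper name.

```python
import re

# Sample request as captured by netcat (assumed saved with plain "\n" endings).
request = """GET / HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.64.1
Accept: */*"""

# Same idea as the sed expression: group 1 is the host, group 2 is the port.
# Python's re uses bare parentheses for groups, unlike sed's default syntax.
pattern = re.compile(r"^Host:(.*):(.*)$")


def rewrite_host(line):
    # Keep the captured port (\2) and swap in the new host name.
    # Lines that do not match the pattern are returned unchanged.
    return pattern.sub(r"Host: mydomain.com:\2", line)


rewritten = [rewrite_host(line) for line in request.split("\n")]
print("\n".join(rewritten))
```

Only the Host line changes; every other line passes through untouched, which is the same line-by-line behaviour the sed test gives you.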

I should note that the regex format used above in sed is not the same as the one HAProxy uses (sed's default syntax needs the grouping parentheses escaped as \( and \), while HAProxy takes them unescaped), but they are close enough that you can test your expressions with sed before converting them for use in HAProxy.

How would you rewrite the URL that is being asked for (currently just /)?
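The conversion step can be roughed out in code. The helper below is a hypothetical sketch (bre_to_ere is not part of sed or HAProxy) that flips sed's escaped operators to the unescaped form, on the assumption that the target syntax treats bare ( ) { } + ? | as operators. It ignores bracket expressions such as [(] and does nothing for the replacement side, so treat it as a starting point only.

```python
def bre_to_ere(expr):
    """Convert a sed-style (basic) regex to extended syntax. Sketch only."""
    swap = "(){}+?|"
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            nxt = expr[i + 1]
            # \( \) \{ \} (and GNU sed's \+ \? \|) are operators in sed:
            # drop the backslash. Any other escape is copied through as-is.
            out.append(nxt if nxt in swap else ch + nxt)
            i += 2
        else:
            # A bare ( ) { } + ? | is a literal in sed, so escape it.
            out.append("\\" + ch if ch in swap else ch)
            i += 1
    return "".join(out)


converted = bre_to_ere(r"^Host:\(.*\):\(.*\)")
print(converted)
```

Running sed with -E is another way to close the gap, since it makes sed itself accept the unescaped grouping syntax.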

reqirep GET\ (/.*)(HTTP.*) GET\ /newUrl\ \2

In the example above we are searching for the GET line and replacing the URL with /newUrl. Note that wherever you need a space you have to escape it: reqirep takes two arguments (search and replace), and an unescaped space would be counted as the start of another argument. The method in the replacement is written in upper case because the replacement text is sent as-is, and HTTP method names are case-sensitive.
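To make the space-escaping rule concrete, here is a small Python sketch that splits a directive the way described above (on unescaped spaces only) and applies it to a captured request. split_args and apply_reqirep are hypothetical helper names, and the replace-the-whole-matching-line behaviour is my assumption about how HAProxy treats a match, so check the result against a real HAProxy before relying on it.

```python
import re


def split_args(directive):
    # Split on spaces that are NOT preceded by a backslash,
    # then turn each escaped space ("\ ") back into a plain space.
    parts = re.split(r"(?<!\\) +", directive.strip())
    return [p.replace("\\ ", " ") for p in parts]


def apply_reqirep(directive, request_lines):
    keyword, search, replace = split_args(directive)
    # reqirep is the case-insensitive variant of reqrep.
    flags = re.IGNORECASE if keyword == "reqirep" else 0
    pattern = re.compile(search, flags)
    result = []
    for line in request_lines:
        m = pattern.search(line)
        # Assumption: a matching line is replaced entirely by the
        # expanded replacement string; other lines are left alone.
        result.append(m.expand(replace) if m else line)
    return result


request_lines = ["GET / HTTP/1.1", "Host: localhost:9090"]
directive = r"reqirep GET\ (/.*)(HTTP.*) GET\ /newUrl\ \2"
result = apply_reqirep(directive, request_lines)
print(result)
```

If you remove one of the backslashes before a space, split_args returns more than three parts and the unpacking fails, which mirrors the "extra argument" problem an unescaped space causes in the config file.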

A better tester

This got me thinking that it would be pretty nice to have a tool that could take in a regular expression from HAProxy and show me what it is doing in real time, so I started working on a Python program that works like HAProxy but shows you what is being modified. This is very much a work in progress, but it does work on the simple examples I’ve passed through it:

#!/usr/bin/env python3

import re
import socket


class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'


# [match, replace], written haproxy-style with spaces escaped as "\ "
regexs = [[r'^(.*)\ /assets/billy/(.*)', r'\1\ /billy-page/\2']]

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# don't wait for socket if it is time_wait
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

s.bind(("", 8080))
s.listen(1)

while True:
    print("waiting for connection")
    clientsocket, clientaddress = s.accept()

    # read until the blank line that ends the request headers
    header = ""
    while "\r\n\r\n" not in header:
        request = clientsocket.recv(4096)
        if not request:
            break
        header += request.decode('utf-8', errors='replace')

    # apply our regexes to these lines
    print("Check for:", regexs[0])
    rewritten = []
    for line in header.split("\r\n\r\n")[0].split("\r\n"):
        matched = False
        for search, replace in regexs:
            if re.match(search, line):
                matched = True
                # un-escape the haproxy-style spaces before substituting
                new_line = re.sub(search, replace.replace('\\ ', ' '), line)
                print(bcolors.OKGREEN, "line MATCH:", "::", search, line, bcolors.ENDC)
                print(bcolors.OKGREEN, "Replace with:", replace, "::", new_line, bcolors.ENDC)
                line = new_line
        if not matched:
            print(bcolors.OKBLUE, line, bcolors.ENDC)
        rewritten.append(line)

    # send it to the remote server and get a response
    c = socket.create_connection(("www.google.com", 80))
    c.sendall(("\r\n".join(rewritten) + "\r\n\r\n").encode('utf-8'))
    data = c.recv(1024)
    print(data.decode('utf-8', errors='replace'))
    c.close()
    clientsocket.close()
