mildbyte's notes tagged 'betfair'

(notes) / (tags) / (streams)

View: compact / full
Time: backward / forward
Source: internal / external / all
Public RSS / JSON

(show contents)

project Betfair, part 8

@mildbyte 1 year, 10 months ago | programming | python | betfair | scala |

Introduction

Previously on project Betfair, we gave up on market making and decided to move on to a simpler Betfair systematic trading idea that seemed to work.

Today on project Betfair, we'll trade it live and find out it doesn't really work.

DAFBot on greyhounds: live trading

With the timezone bug now fixed, I was ready to let my bot actually trade some greyhound races for a while. I started trading with the following parameters:

  • Favourite selection time: 180s
  • Entry order time: 160s
  • Exit time: 60s
  • Bet size: £5

I decided to pick the favourite closer to the trade since I thought that if it were picked 5 minutes before the race starting, it could change and there was no real point in locking it in so long before the actual trade. The 60s exit point was mostly to give me a margin of safety to liquidate the position manually if something happened as well as in case of the market going in-play before the advertised time (there's no in-play trading in greyhound markets, so I'd carry any exposure into the game. At that point, it would become gambling and not trading).

Results

So how did it do?

Well, badly for now. Over the course of 25 races, in 4 hours, it lost £5.93: an average of -£0.24 per race with a standard deviation of £0.42. That was kind of sad and slightly suspicious too: according to the fairly sizeable dataset I had backtested this strategy on, its return was, on average, £0.07 with a standard deviation of £0.68.

I took the empirical CDF of the backtested distribution of returns, and according to it, getting a return of less than -£5.93 with 25 samples had a probability of 1.2%. So something clearly was going wrong, either with my backtest or with the simulation.

I scraped the stream market data out of the bot's logs and ran the backtest just on those races. Interestingly, it predicted a return of -£2.70. What was going wrong? I also scraped the traded runners, the entry and the exit prices from the logs and from the simulation to compare. They didn't match! A few times the runner that the bot was trading was different, but fairly often the entry odds that the bot got were lower (recall that the bot wants to be long the market, so entering at lower odds (higher price/implied probability) is worse for it). Interestingly, there was almost no mismatch in the exit price: the bot would manage to close its position in one trade without issues.

After looking at the price charts for a few races, I couldn't understand what was going wrong. The price wasn't swinging wildly to fool the bot into placing orders at lower odds: in fact, the price 160s before the race start was just different from what the bot was requesting.

Turns out, it was yet another dumb mistake: the bot was starting to trade 150s before the race start and pick the favourite at that point as well. Simulating what the bot did indeed brought the backtest's PnL (on just those markets) within a few pence from the realised PnL.

So that was weird: moving the start time by 10 seconds doubled the loss on that dataset (by bringing it from -£2.70 to -£5.93).

Greyhound capacity analysis

There was another issue, though: the greyhound markets aren't that liquid.

While there is about £10000-£15000 worth of bets available to match against in an average greyhound race, this also includes silly bets (like offering to lay at 1000.0).

To demonstrate this better, I added market impact to the backtester: even assuming that the entry bet gets matched 160s before the race (which becomes more difficult to believe at higher bet amounts, given that the average total matched volume by that point is around £100), closing the bet might not be possible to do completely at one odds level: what if there isn't enough capacity available at that level and we have to place another lay bet at higher odds?

Here's some code that simulates that:

def get_long_return(lines, init_cash, startT, endT, suspend_time,
    market_impact=True, cross_spread=False):

    # lines is a list of tuples: (timestamp, available_to_back,
    #    available_to_lay, total_traded)
    # available to back/lay/traded are dictionaries
    #   of odds -> availablility at that level

    # Get start/end availabilities
    start = get_line_at(lines, suspend_time, startT)
    end = get_line_at(lines, suspend_time, endT)

    # Calculate the inventory 
    # If we cross the spread, use the best back odds, otherwise assume we get
    # executed at the best lay
    if cross_spread:
        exp = init_cash * max(start[1])
    else:
        exp = init_cash * min(start[2])

    # Simulate us trying to sell the contracts at the end
    final_cash = 0.

    for end_odds, end_avail in sorted(end[2].iteritems()):

        # How much inventory were we able to offload at this odds level?
        # If we don't simulate market impact, assume all of it.
        mexp = min(end_odds * end_avail, exp) if market_impact else exp
        exp -= mexp

        # If we have managed to sell all contracts, return the final PnL.
        final_cash += mexp / end_odds
        if exp < 1e-6:
            return final_cash - init_cash

    # If we got to here, we've managed to knock out all price levels 
    # in the book.
    return final_cash - init_cash

I then did several simulations of the strategy at different bet sizes.

Turns out, as we increase the bet size away from just £1, the PnL quickly decays (the vertical lines are the error bars, not the standard deviations). For example, at bet size of £20, the average return per race is just £0.30 with a standard deviation of about £3.00 and a standard error of £0.17.

DAFBot on horses

At that point, I had finally managed to update my new non-order-book simulator so that it could work on horse racing data, which was great, since horse markets were much more preferable to greyhound ones: they were more liquid and there was much less opportunity for a single actor to manipulate prices. Hence there would be more capacity for larger bet sizes.

In addition, given that the spreads in horses are much tighter, I wasn't worried about having a bias in my backtests (the greyhound one assumes we can get executed at the best lay, but most of its PnL could have come from the massive back-lay spread at 160s before the race, despite that I limited the markets in the backtest to those with spreads below 5 ticks).

Research

I backtested a similar strategy on horse data but, interestingly enough, it didn't work: the average return was well within the standard error from 0.

However, flipping the desired position (instead of going long the favourite, betting against it) resulted in a curve similar to the one for the greyhound strategy. In essence, it seemed as if there was an upwards drift in the odds on the favourite in the final minutes before the race. Interestingly, I can't reproduce those results with the current, much larger, dataset that I've gathered (even if I limit the data to what I had at that point), so the following results might be not as exciting.

The headline number, according to my notes at that time, was that, on average, with £20 lay bets, entering at 300 seconds before the race and exiting at 130 seconds, the return was £0.046 with a standard error of £0.030 and a standard deviation of £0.588. This seems like very little, but the £20 bet size would be just a start. In addition, there are about 100 races every day (UK + US), hence annualizing that would result in a mean of £1690 and a standard deviation of £112.

This looked nice (barring the unrealistic Sharpe ratio of 15), but the issue was that it didn't scale well: at bet sizes of £100, the annualized mean/standard deviation would be £5020 and £570, respectively, and it would get worse further on.

I also had found out that, at £100 bet sizes, limiting the markets to just those operating between 12pm and 7pm (essentially just the UK ones) gave better results, despite that the strategy would only be able to trade 30 races per day. The mean/standard deviation were £4220 and £310, respectively: a smaller average return and a considerably smaller standard deviation. This was because the US markets were generally thinner and the strategy would crash through several price levels in the book to liquidate its position.

Note this was also using bet sizes and not exposures: so to place a lay of £100 at, say, 4.0, I would have to risk £300. I didn't go into detailed modelling of how much money I would need deposited to be able to trade this for a few days, but in any case I wasn't ready to trade with stakes that large.

Placing orders below £2

One of the big issues with live trading the greyhound DAFBot was the fact that the bot can't place orders below £2. Even if it offers to buy (back), say, £10 at 2.0, only £2 of its offering could actually get matched. After that point, the odds could go to, say, 2.5, and the bot would now have to place a lay bet of £2 * 2.0 / 2.5 = £1.6 to close its position.

If it doesn't do that, it would have a 4-contract exposure to the runner that it paid £2 for (will get £4 if the runner wins for a total PnL of £2 or £0 if the runner doesn't win for a total PnL of -£2).

If it instead places a £2 lay on the runner, it will have a short exposure of 2 * 2.0 - 2 * 2.5 = -1 contract (in essence, it first has bought 4 contracts for £2 and now has sold 5 contracts for £2: if the runner wins, it will lose £1, and if the runner loses, it will win nothing). In any case, it can't completely close its position.

So that's suboptimal. Luckily, Betfair documents a loophole in the order placement mechanism that can be used to place orders below £2. They do say that it should only be used to close positions and not for normal trading (otherwise people would be trading with £0.01 amounts), but that's exactly our use case here.

The way it's done is:

  • Place an order above £2 that won't get matched (say, a back of 1000.0 or a lay of 1.01);
  • Partially cancel some of that order to bring its unmatched size to the desired amount;
  • Use the order replacement instruction to change to odds of that order to the desired odds.

Live trading

I started a week of fully automated live trading on 2nd October. That was before I implemented placing bets below £2 and the bot kind of got lucky on a few markets, unable to close its short exposure fully and the runner it was betting against losing in the end. That was nice, but not exactly intended. I also changed the bot to place bets based on a target exposure of 10 contracts (as opposed to stakes of £10, hence the bet size would be 10 / odds).

In total, the bot made £3.60 on that day after trading 35 races.

Things quickly went downhill after I implemented order placement below £2:

  • 3rd October: -£3.41 on 31 races (mostly due to it losing £3.92 on US markets after 6pm);
  • 4th October: increased bet size to 15 contracts; PnL -£2.59 on 22 races;
  • 5th October: made sure to exclude markets with large back-lay spreads from trading; -£2.20 on 25 races (largest loss -£1.37, largest win £1.10);
  • 6th October: -£3.57 on 36 races;
  • 7th October: -£1.10 on 23 races.

In total, the bot lost £9.27 over 172 races, which is about £0.054 per race. Looking at Betfair, the bot had made 395 bets (entry and exit, as well as additional exit bets at lower odds levels when there wasn't enough available at one level) with stakes totalling £1409.26. Of course, it wasn't risking more than £15 at any point, but turning over that much money without issues was still impressive.

What wasn't impressive was that it consistently lost money, contrary to the backtest.

Conclusion

At that point, I was slowly growing tired of Betfair. I played with some more ideas that I might write about later, but in total I had been at it for about 2.5 months and had another interesting project in mind. But project Betfair for now had to go on an indefinite hiatus.

To be continued...

Enjoyed this series? I do write about other things too, sometimes. Feel free to follow me on twitter.com/mildbyte or on this RSS feed! Alternatively, register on Kimonote to receive new posts in your e-mail inbox.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism, ease of navigation and control over what content a user follows. Try the demo here or the beta here and/or follow it on Twitter as well at twitter.com/kimonote!

project Betfair, part 7

@mildbyte 1 year, 11 months ago | programming | python | betfair | scala |

Introduction

Previously on project Betfair, we ran the production market-making CAFBot in Scala, got it working, made some money, lost some money and came back with some new ideas.

Today, we'll test those ideas, look at something that's much more promising and learn a dark truth about timezones.

Sorry this took a bit too long to write, by the way, I've been spending some time working on Kimonote to add email digests to streams. The idea is that given some exporters from streams (e-mail, RSS (already works), Facebook, Twitter etc) with varying frequencies (immediately or weekly/daily digests) as well as some importers (again RSS/Twitter/Facebook/other blogging platforms) a user could create their own custom newsletter and get it sent to their mailbox (push) instead of wasting time checking all those websites (pull), as well as notify their social media circles when they put a new post up anywhere else. None of this works yet, but other features do — if you're interested, sign up for the beta here!

Shameless plug over, back to the world of automated trading goodness.

CAFBotV2 with wider spreads

Remember how in the real-world greyhound market the bot managed to have some of its bets matched despite that they were several ticks away from the best back and lay? I realised I never really tested that in simulation: I started from making the market at the top of the order book and kind of assumed that further away from there matching would never happen. Looks like I was wrong (and in fact in part 5 the bot had issues with its bets that were moved away from the current market because of high exposure getting matched anyway).

So I added (backported?) the levelBase parameter from the Scala bot into the research one: recall that it specified how far from the best back/lay the bot should start working before applying all the other offsets (exposure or order book imbalance). Hence at levelBase = 0 the bot would work exactly as before and with levelBase = 1 it would start 1 Betfair tick away from the best back/lay. levelBase = 3 is what was traded live on the greyhound market.

The idea behind this is kind of simple: if the bot still gets its bets matched even if it's far away from the best back/lay, it will earn a higher spread with fewer trades.

So, first, I ran it on our favourite horse market with levelBase = 1.

Results

Horses, levelBase = 1

It didn't do well: there were very few matches and so most of the time it just did nothing, trading a grand total of 3 times. This meant that it got locked into an inventory that it couldn't offload.

Let's run it on the whole dataset: though this market didn't work as well, in some other ones matching did happen in jumps larger than 1 tick, so those might be able to offset more liquid markets.

We're tantalizingly close to finally having a PnL of zero (the more observant reader might notice that we could have done the same by not trading at all). Let's see how it would have done on the greyhound markets, which we do know sometimes jump like crazy.

Greyhounds, levelBase = 3

Not very inspiring either. There's a large amount of markets where this wouldn't have done anything at all (since the bets were so far from the best back/lay, they don't ever get hit), and when something does happen, it seems to be very rare, so the bot can't liquidate its position and remains at the mercy of the market.

So while that was a cute idea, it didn't seem to work.

DAFBot

At this point, I was running out of ideas. The issue of the bot getting locked into an inventory while the market was trending against it still remained, so I had to look at the larger-scale patterns in the data: perhaps based on the bigger moves in the market, the bot could have a desired inventory it could aim for (instead of always targeting zero inventory).

Consider this: if we think that the market is going to move one way or another, it's okay for the bot to have an exposure that way and it can be gently guided towards it (by means of where it sets its prices). Like that, the bot would kind of become a hybrid of a slower trading strategy with a market maker: even if its large-scale predictions of price movements weren't as good, they would get augmented by the market making component and vice versa.

Carry?

I tried out a very dumb idea. Remember how in most of the odds charts we looked at the odds, for some reason, trended down? I had kind of noticed that, or at least I thought I did, and wanted to quantify it.

Those odds were always on the favourite (as in, pick the greyhound/horse with the lowest odds 1 hour before the race begins and see how they change). The cause could be that, say, people who wanted to bet on the favourite would delay their decision to see if any unexpected news arrived before the race start, which is the only sort of news that could move the market.

Whatever the unexpected news would be, they would likely affect the favourite negatively: they could be good for any of the other greyhounds/horses, thus making it more likely for them to win the race. Hence it would make sense, if someone wanted to bet on the favourite, for them to wait until just before the race begins to avoid uncertainty, thus pushing the odds down as the race approaches.

So what if we took the other side of this trade? If we were to go long the favourite early, we would benefit from this downwards slide in odds, at the risk of some bad news coming out and us losing money. I guessed this would be similar to a carry trade in finance, where the trader makes money if the market conditions stay the same (say, borrowing money in a lower interest rate currency and then holding it in a higher interest rate currency, hoping the exchange rate doesn't move). In essence, we'd get paid for taking on the risk of unexpected news about the race coming out.

Research: greyhounds

I had first started doing this using my order book simulator, but realised it would be overkill: if the only thing I wanted to do was testing a strategy that traded literally twice (once to enter the position, once to close it), it would be better write a custom scraper from the stream data that would get the odds' timeseries and simulate the strategy faster.

At that point, I realised the horse racing stream data was too large to fit into memory with the new simulator. So I put that on hold for a second and tried my idea on greyhound markets.

This chart plots, at each point in time before the race, the average return on going long (backing) the favourite and then closing our position 15s before the race begins. In any case, the favourite is picked 5 minutes before the race begins. The vertical lines are the error bars (not standard deviations). Essentially, what we have here is a really consistent way to lose money.

This is obviously because of the back-lay spreads: the simulation here assumes we cross the spread both when entering and exiting, in essence taking the best back in the beginning and the best lay at the end.

Remember this chart from part 4?

The average spread 120s before a greyhound race begins is about 5 ticks. We had previously calculated that the loss on selling 1 tick lower than we bought is about 0.5%, so no wonder we're losing about 3% of our investment straight away.

What if we didn't have to cross the spread?

Woah. This graph assumes that instead of backing the runner at the best back, we manage to back it at the current best lay (by placing a passive back at those odds). When we're exiting the position just before the race begins, time is of the essence and so we're okay with paying the spread and placing a lay bet at the odds of the current best lay (getting executed instantly).

The only problem is actually getting matched: since matching in greyhound markets starts very late (as much money gets matched in the last 80 seconds as does before), our bet could just remain in the book forever, or get matched much closer to the beginning of the race.

But here's the fun part: this graph doesn't care. It shows that if the bet is matched at whatever the best lay was 160 seconds before the race, on average this trade makes money — even if the actual match happens a minute later. If the bet doesn't get matched at all, the trade simply doesn't happen.

This does assume that the performance of this strategy is independent of whether or not the bet gets hit at all, but if that's not the case, we would have been able to use the fact that our bet got hit as a canary: when it gets hit, we know that being long this market is a good/bad thing and adjust our position accordingly.

Implementation in Scala

With that reasoning, I went to work changing the internals of Azura to write another core for it and slightly alter the way it ran. The algorithm would be:

  • Run with parameters: market, favourite selection time, entry start time, entry end time, exit time (all in seconds before the race start), amount to bet.
  • Subscribe to the market/order streams, get the market suspend time from the Betfair API
  • At favourite selection time: inspect our local order book cache, select the runner with the lowest odds
  • At entry start time: place a passive back at the current best lay odds of the given amount on the favourite.
  • At entry end time: cancel remaining unexecuted orders.
  • At exit time: if we have a position (as in our entry order got hit), unwind it aggressively.

I called the new core DAFBot (D stands for Dumb and what AF stands for can be gleaned from the previous posts). I wanted to reuse the idea of polling a strategy for the orders that it wished to offer and the core being stateless, since that would mean that if the bot crashed, I could restart it and it would proceed where it left off. That did mean simple actions like "buy this" became more difficult to encode: the bot basically had to look at its inventory and then say "I want to maintain an outstanding back bet for (how much I want to buy - how much I have)".

Finally, yet another Daedric prince got added to my collection: Sheogorath, "The infamous Prince of Madness, whose motives are unknowable" (I had given up on my naming scheme making sense by this point), would schedule instances of Azura to be run during the trading day by using the Betfair API to fetch a list of greyhound races and executing Azura several minutes before that.

Live trading

I obviously wasn't ready to get Sheogorath to execute multiple instances of Azura and start losing money at computer speed quite yet, so for now I ran the new strategy manually on some races, first without placing any bets (just printing them out) and then actually doing so.

The biggest issue was the inability to place bets below £2. I had thought this wouldn't be a problem (as I was placing entry bets with larger amounts), but fairly often only part of the offer would get hit, so the bot would end up having an exposure that it wasn't able to close (since closing it would entail placing a lay bet below £2). Hence it took some of that exposure into the race, which wasn't good.

Timing bug?

In addition, when testing Sheogorath's scheduling (by getting it to kick off instances of Azura that didn't place bets), I noticed a weird thing: Sheogorath would start Azura one minute later than intended. For example, for a race that kicked off at 3pm, Azura was supposed to be started 5 minutes before that (2:55pm) whereas it was actually executed at 2:56pm.

While investigating this, I realised that there was another issue with my data: I had relied on the stream recorder using the market suspend times that were fetched from Betfair to stop recording, but that might not have been the case: if the greyhound race started before the scheduled suspend time, then the recording would stop abruptly, as opposed to at the official suspend time.

Any backtest that counted backwards from the final point in the stream would kind of have access to forward-looking information: knowing that the end of the data is the actual suspend time, not the advertised one.

Hence I had to recover the suspend times that the recorder saw and use those instead. I still had all of the logs that it used, so I could scrape the times from them. But here was another fun thing: spot-checking some suspend times against Betfair revealed that they sometimes also were 1 minute later than the ones on the website.

That meant the forward-looking information issue was a bigger one, since the recorder would have run for longer and have a bigger chance of being interrupted by a race start. It would also be a problem in horse markets: since those can be traded in-play, there could have been periods of in-play trading in my data that could have affected the market-making bot's backtests (in particular, everyone else in the market is affected by the multisecond bet delay which I wasn't simulating).

But more importantly, why were the suspend times different? Was it an issue on Betfair's side? Was something wrong with my code? It was probably the latter. After meditating on more logfiles, I realised that the suspend times seen by Azura were correct whereas the suspend times for Sheogorath for the same markets were 1 minute off. They were making the same request, albeit at different times (Sheogorath would do it when building up a trading schedule, Azura would do it when one of its instances would get started). The only difference was that the former was written in Python and the latter was written in Scala.

After some time of going through my code with a debugger and perusing documentation, I learned a fun fact about timezones.

A fun fact about timezones

I used this bit of code to make sure all times the bot was handing were in UTC:

def parse_dt(dt_str, tz=None):
    return dt.strptime(dt_str, '%Y-%m-%dT%H:%M:%S.%fZ').replace(tzinfo=pytz.timezone(tz) if tz else None)

m_end_dt = parse_dt(m['marketStartTime'], m['event']['timezone'])
m_end_dt = m_end_dt.astimezone(pytz.utc).replace(tzinfo=None)

However, timezones change. Since pytz.timezone doesn't know the time of the timezone its argument refers to, it looks at the earliest definition of the timezone, which in the case of Europe/London is back in mid-1800s. Was the timezone offset back then something reasonable, like an hour? Nope, it was 1 minute.

Here's a fun snippet of code so you can try this at home:

In[4]: 
from datetime import datetime as dt
import pytz
def parse_dt(dt_str, tz=None):
    return dt.strptime(dt_str, '%Y-%m-%dT%H:%M:%S.%fZ').replace(tzinfo=pytz.timezone(tz) if tz else None)
wtf = '2017-09-27T11:04:00.000Z'
parse_dt(wtf)
Out[4]: datetime.datetime(2017, 9, 27, 11, 4)
In[5]: parse_dt(wtf, 'Europe/London')
Out[5]: datetime.datetime(2017, 9, 27, 11, 4, tzinfo=[DstTzInfo 'Europe/London' LMT-1 day, 23:59:00 STD])
parse_dt(wtf, 'Europe/London').astimezone(pytz.utc)
Out[6]: datetime.datetime(2017, 9, 27, 11, 5, tzinfo=[UTC])

And here's an answer from the django-users mailing group on the right way to use timezones:

The right way to attach a pytz timezone to a naive Python datetime is to call tzobj.localize(dt). This gives pytz a chance to say "oh, your datetime is in 2015, so I'll use the offset for Europe/London that's in use in 2015, rather than the one that was in use in the mid-1800s"

Finally, here's some background on how this offset was calculated.

Recovery

Luckily, I knew which exact days in my data were affected by this bug and was able to recover the right suspend times. In fact, I've been lying to you this whole time and all of the plots in this blog series were produced after I had finished the project, with the full and the correct dataset. So the results, actually, weren't affected that much and now I have some more confidence in them.

Conclusion

Next time on project Betfair, we'll teach DAFBot to place orders below £2 and get it to do some real live trading, then moving on to horses.

As usual, posts in this series will be available at https://kimonote.com/@mildbyte:betfair or on this RSS feed. Alternatively, follow me on twitter.com/mildbyte.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism, ease of navigation and control over what content a user follows. Try the demo here or the beta here and/or follow it on Twitter as well at twitter.com/kimonote!

project Betfair, part 6

@mildbyte 1 year, 11 months ago | programming | betfair | scala |

Introduction

Previously on project Betfair, we spent some time tuning our market making bot and trying to make it make money in simulation. Today, we'll use coding and algorithms to make (and lose) us some money in real life.

Meet Azura

Scala is amazing. At the very least, it can be written like a better Java (with neat features like allowing multiple classes in one file) and then that can evolve into never having to worry about NullPointerExceptions and the ability to write blog posts about monads.

And strictly typed languages are fun! There's something very empowering about being able to easily refactor code and rename, extract, push and pull methods around, whilst being mostly confident that the compiler is going to stop you from doing stupid things. Unlike, say, Python, where you have to write (and maintain) a massive test suite to make sure every line of your code gets executed, here proper typing can replace a whole class of unit tests.

Hence (and because I kind of wanted to relearn Scala) I decided to write the live trading version of my Betfair market-making bot I called Azura in Scala.

This will mostly be a series of random tales, code snippets, war stories and postmortems describing its development.

Build/deploy process

The actual design of the bot (common between the research Python version and the production Scala version) is described here.

In terms of libraries, I didn't have to use many: the biggest one was probably the JSON parser from the Play! framework. I used SBT (Simple Build Tool ("never have I seen something so misnamed" — coworker, though I didn't have problems with it)) to manage dependencies and build the bot.

I didn't really have a clever deploy procedure: I would log onto the "production" box, pull the code and use

sbt assembly

to create a so-called uber-jar, a Java archive that has all of the project's dependencies packaged inside of it. So executing it with the Java Virtual Machine would be simply a matter of

java -Dazura.dry_run=false -jar target/scala-2.12/azura-assembly-0.1.jar arguments

Disadvantages: this takes up space and possibly duplicates libraries that already exist on the target machine. Advantages: we don't depend on the target machine having the right library versions or the Java classpath set up correctly: the machine only needs to have the right version of the JVM.

Parsing the stream messages

object MarketData {
  def parseMarketChangeMessage(message: JsValue): JsResult[MarketChangeMessage] = message.validate[MarketChangeMessage](marketChangeMessageReads)

    case class RunnerChange(runnerId: Int, backUpdates: Map[Odds, Int],
                            layUpdates: Map[Odds, Int],
                            tradedUpdates: Map[Odds, Int])

    case class MarketChange(marketId: String, runnerChanges: Seq[RunnerChange], isImage: Boolean)

    implicit val oddsReads: Reads[Odds] = JsPath.read[Double].map(Odds.apply)
    implicit val orderBookLineReads: Reads[Map[Odds, Int]] = JsPath.read[Seq[(Odds, Double)]].map(
      _.map { case (p, v) => (p, (v * 100D).toInt) }.toMap)
    implicit val runnerChangeReads: Reads[RunnerChange] = (
      (JsPath \ "id").read[Int] and
      (JsPath \ "atb").readNullable[Map[Odds, Int]].map(_.getOrElse(Map.empty)) and
      (JsPath \ "atl").readNullable[Map[Odds, Int]].map(_.getOrElse(Map.empty)) and
      (JsPath \ "trd").readNullable[Map[Odds, Int]].map(_.getOrElse(Map.empty))) (RunnerChange.apply _)
    implicit val marketChangeReads: Reads[MarketChange] = (
      (JsPath \ "id").read[String] and
      (JsPath \ "rc").read[Seq[RunnerChange]] and
      (JsPath \ "img").readNullable[Boolean].map(_.getOrElse(false))) (MarketChange.apply _)
    implicit val marketChangeMessageReads: Reads[MarketChangeMessage] = (
      (JsPath \ "pt").read[Long].map(new Timestamp(_)) and
      (JsPath \ "mc").readNullable[Seq[MarketChange]].map(_.getOrElse(Seq.empty))) (MarketChangeMessage.apply _)
    case class MarketChangeMessage(timestamp: Timestamp, marketChanges: Seq[MarketChange])
  }

This might take some time to decipher. The problem was converting a JSON-formatted order book message into a data structure used inside the bot to represent the order book — and there, of course, can be lots of edge cases. What if the message isn't valid JSON? What if the structure of the JSON isn't exactly what we expected it to be (e.g. a subentry is missing)? What if the type of something isn't what we expected it to be (instead of a UNIX timestamp we get an ISO-formatted datetime)?

The Play! JSON module kind of helps us solve this by providing a DSL that allows to specify the expected structure of the JSON object by combining smaller building blocks. For example, marketChangeReads shows how to parse a message containing changes to the whole market order book (MarketChangeMessage). We first need to read a string containing the market ID at "id", then a sequence of changes to each runner (RunnerChange) located at "rc" and then a Boolean value at "img" that says whether it's a full image (as in we need to flush our cache and replace it with this message) or not.

To read a RunnerChange (runnerChangeReads), we need to read an integer containing the runner's ID and the changes to its available to back, lay and traded odds. To read those changes (orderBookLineReads), we want to parse a sequence of tuples of odds and doubles, convert the doubles (represending pound amounts available to bet or traded) into integer penny values and finally turn that into a map.

Finally, parsing the odds is simply a matter of creating an Odds object from the double value representing the odds (which rounds the odds to the neearest ones allowed by Betfair).

Core of the Scala CAFBotV2

  override def getTargetMarket(orderBook: OrderBook, runnerOrders: RunnerOrders): (Map[Odds, Int], Map[Odds, Int]) = {
    val logger: Logger = Logger[CAFStrategyV2]

    val exposure = OrderBookUtils.getPlayerExposure(runnerOrders).net
    val exposureFraction = exposure / offeredExposure.toDouble
    logger.info(s"Net exposure $exposure, fraction of offer: $exposureFraction")

    if (orderBook.availableToBack.isEmpty || orderBook.availableToLay.isEmpty) {
      logger.warn("One half of the book is empty, doing nothing")
      return (Map.empty, Map.empty)
    }

    logger.info(s"Best back: ${orderBook.bestBack()}, Best lay: ${orderBook.bestLay()}")
    val imbalance = (orderBook.bestBack()._2 - orderBook.bestLay()._2) / (orderBook.bestBack()._2 + orderBook.bestLay()._2).toDouble
    logger.info(s"Order book imbalance: $imbalance")

    // Offer a back -- i.e. us laying
    val backs: Map[Odds, Int] = if (exposureFraction >= -2) {
      val start = Odds.toTick(orderBook.bestBack()._1) +
        (if (exposureFraction >= -exposureThreshold && imbalance >= -imbalanceThreshold) 0 else -1) - levelBase
      (start until (start - levels) by -1).map { p: Int => Odds.fromTick(p) -> (offeredExposure / Odds.fromTick(p).odds).toInt }.toMap
    } else Map.empty

    val lays: Map[Odds, Int] = if (exposureFraction <= 2) {
      val start = Odds.toTick(orderBook.bestLay()._1) +
        (if (exposureFraction <= exposureThreshold && imbalance <= imbalanceThreshold) 0 else 1) + levelBase
      (start until start + levels).map { p: Int => Odds.fromTick(p) -> (offeredExposure / Odds.fromTick(p).odds).toInt }.toMap
    } else Map.empty

    (backs, lays)
  }

The core is actually fairly simple and similar to the research version of CAFBotV2. Here, exposure really means inventory, as in amount of one-pound contracts the bot is holding. Instead of absolute inventory values for its limits (like 30 contracts in the previous post), the bot operates with fractions of amount of contracts it offers over its position (say, the previous limit of 30 contracts would have been represented here as 30/10 = 3).

After calculating the fraction and the order book imbalance, the bot calculates the back and lay bets it wishes to maintain in the book: first, the tick number it wishes to start on (the best back/lay or one level below/above if the order book imbalance or inventory is too negative/positive) and then each odds level counting down/up from that point. Finally, it divides the amount of contracts it wishes to offer by the actual odds in order to output the bet (in pounds) it wishes the be maintained at each price level.

There's also a custom levelBase parameter that allows to control how far from the best back/lay the market is made: with a levelBase of zero, the bot would place its bets at the best back/lay, with levelBase of one the bets would be 1 tick below/above best back/lay etc.

First messages on the stream and a dummy run

I sadly do not any more have the old logs back from when I managed to receive (and parse!) the first few messages from the Stream API, so here's a dramatic reconstruction.

20:46:58.093 [main] INFO main - Starting Azura...
20:46:58.096 [main] INFO main - Getting market suspend time...
20:46:58.648 [main] INFO main - {"filter":{"marketIds":["1.134156568"]},"marketProjection":["MARKET_START_TIME"],"maxResults":1}
20:46:59.793 [main] INFO main - [{"marketId":"1.134156568","marketName":"R9 6f Claim","marketStartTime":"2017-09-13T21:15:00.000Z","totalMatched":0.0}]
20:46:59.966 [main] INFO main - 2017-09-13T21:15
20:46:59.966 [main] INFO main - Initializing the subscription...
20:47:00.056 [main] INFO streaming.SubscriptionManager - {"op":"connection","connectionId":"100-130917204700-67655"}
20:47:00.077 [main] INFO streaming.SubscriptionManager - {"op":"status","id":1,"statusCode":"SUCCESS","connectionClosed":false}
20:47:00.081 [main] INFO main - Subscribing to the streams...
20:47:00.119 [main] INFO streaming.SubscriptionManager - {
  "op" : "status",
  "id" : 1,
  "statusCode" : "SUCCESS",
  "connectionClosed" : false
}
20:47:00.160 [main] INFO streaming.SubscriptionManager - {
  "op" : "status",
  "id" : 2,
  "statusCode" : "SUCCESS",
  "connectionClosed" : false
}
20:47:00.207 [main] INFO streaming.SubscriptionManager - {
  "op" : "mcm",
  "id" : 1,
  "initialClk" : "yxmUlZaPE8sZsOergBPLGYz/ipIT",
  "clk" : "AAAAAAAA",
  "conflateMs" : 0,
  "heartbeatMs" : 5000,
  "pt" : 1505335620115,
  "ct" : "SUB_IMAGE",
  "mc" : [ {
    "id" : "1.134156568",
    "rc" : [ {
      "atb" : [ [ 5.1, 2.5 ], [ 1.43, 250 ], [ 1.01, 728.58 ], [ 1.03, 460 ], [ 5, 20 ], [ 1.02, 500 ], [ 1.13, 400 ] ],
      "atl" : [ [ 24, 0.74 ], [ 25, 3.55 ], [ 26, 1.96 ], [ 29, 4.2 ], [ 900, 0.02 ] ],
      "trd" : [ [ 5.3, 3.96 ], [ 5.1, 6.03 ] ],
      "id" : 13531617
    }, ... ],
    "img" : true
  } ]
}
20:47:00.322 [main] INFO main - Initializing harness for runner 13531618
20:47:00.326 [main] INFO streaming.SubscriptionManager - {
  "op" : "ocm",
  "id" : 2,
  "initialClk" : "VeLnh9sEsAa1t/neBHeqxJ3aBG7YuYDbBNYGuqHD1AQ=",
  "clk" : "AAAAAAAAAAAAAA==",
  "conflateMs" : 0,
  "heartbeatMs" : 5000,
  "pt" : 1505335620130,
  "ct" : "SUB_IMAGE"
}
20:47:00.341 [main] INFO strategy.StrategyHarness - Exposures (GBX): Back 0, Lay 0, cash 0
20:47:00.342 [main] INFO strategy.StrategyHarness - Net exposure: 0
20:47:00.343 [main] INFO strategy.CAFStrategyV2 - Net exposure 0, fraction of offer: 0.0
20:47:00.343 [main] INFO strategy.CAFStrategyV2 - Best back: (Odds(3.45),198), Best lay: (Odds(6.6),472)
20:47:00.344 [main] INFO strategy.CAFStrategyV2 - Order book imbalance: -0.408955223880597
20:47:00.351 [main] INFO strategy.StrategyHarness - Strategy wants atb, atl (Map(Odds(3.3) -> 454, Odds(3.25) -> 461),Map(Odds(7.2) -> 208, Odds(7.4) -> 202)), current outstanding orders are back, lay (Map(),Map())
20:47:00.952 [main] INFO streaming.SubscriptionManager - {
  "op" : "mcm",
  "id" : 1,
  "clk" : "ADQAFQAI",
  "pt" : 1505335620944,
  "mc" : [ {
    "id" : "1.134156568",
    "rc" : [ {
      "atl" : [ [ 990, 0 ] ],
      "id" : 13531618
    } ]
  } ]
}
20:47:00.955 [main] INFO strategy.StrategyHarness - Exposures (GBX): Back 0, Lay 0, cash 0
20:47:00.955 [main] INFO strategy.StrategyHarness - Net exposure: 0
20:47:00.955 [main] INFO strategy.CAFStrategyV2 - Net exposure 0, fraction of offer: 0.0
20:47:00.955 [main] INFO strategy.CAFStrategyV2 - Best back: (Odds(3.45),198), Best lay: (Odds(6.6),472)
20:47:00.956 [main] INFO strategy.CAFStrategyV2 - Order book imbalance: -0.408955223880597
20:47:00.956 [main] INFO strategy.StrategyHarness - Strategy wants atb, atl (Map(Odds(3.3) -> 454, Odds(3.25) -> 461),Map(Odds(7.2) -> 208, Odds(7.4) -> 202)),...

Essentially, at the start the bot subscribes to the data stream for a given market and its own order status stream. At 20:47:00.207 it receives the first message from the market data stream: the initial order book image for all runners in that market.

Before subscribing to the streams, though, the bot also gets the market metadata to find out when the race actually begins. If it receives a message and it's timestamped less than 5 minutes away from the race start, it starts market making (well, pretending to, since at that point I hadn't implemented bet placement) on the favourite at that time by polling the strategy core and just returning the bets it wants to maintain on both sides of the order book.

Every time there's an update on the market book stream, the strategy is polled again to make sure it still wants the same bets to be maintained. If there's a change, then the harness performs cancellations/placements as needed.

Bet placement: order and book stream synchronization

The first problem I noticed when trying to implement bet placement was that the order book, order status streams and the API-NG aren't synchronized. In essence, if a bet is placed via API-NG, it takes a while for it to appear on the order status stream (showing whether or not it has been executed) or be reflected in the market order book. Since the core of the bot is supposed to be stateless, this could have caused this sort of an issue:

  • the core wants to offer £2 to back at 2.00
  • the data on the order status stream implies we don't have any outstanding bets
  • hence a bet of £2 at 2.00 is submitted via a REST call (that returns a successful response containing the bet ID and some other metadata
  • a new update to the order status stream arrives that doesn't yet contain the new bet
  • the core is polled again, it still wants to offer £2 at 2.00
  • since we haven't seen our bet on the status stream, we submit another bet
  • we now have 2 unmatched bets

This could get ugly quickly and there were several ways to solve it. One could be applying the bet that had just been submitted to a local copy of the order status stream and then ignoring the real Betfair message containing that bet when it does appear on the stream, but that would mean needing to reconcile our local cache with the Betfair stream: what if the bet doesn't appear or turns out to have been invalidated? I went with a simpler approach: get the bet ID returned by the REST API when a bet is placed and then stop polling the bot and placing bets until a bet with that ID is seen on the order status stream, since only then the bot would be in a consistent state.

The resultant timings are not HFT-like at all. From looking at the logs, a normal sequence is something like:

  • T: executor submits a bet (HTTP REST request)
  • T+120ms: receive a REST response with the bet ID
  • T+190ms: timestamp of the bet according to Betfair
  • T+200ms: bet appears on the order status stream (timestamped 10ms ago)

This could be partially mitigated by doing order submissions in a separate thread (in essence applying the order to our outstanding orders cache even before we receive a message from Betfair with its bet ID) but there would still be the issue of Betfair apparently taking about 190ms to put the bet into its order book. And I didn't want to bother for now, since I just wanted to get the bot into a shape where it could place live bets.

Live trading!

I was now ready to unleash the bot onto some real markets. I chose greyhound racing and, for safety, decided to do a test with making a market several ticks away from the current best lay/back, the reasoning being that while it would test the whole system end-to-end (in terms of maintaining outstanding orders), these bets would have little chance of getting matched and even if they did, it would be far enough from the current market for me to have time to quickly liquidate them manually and not lose money.

Attempt 0

Well, before that it had some very embarrassing false starts. Remember how I operated with penny amounts throughout the codebase to avoid floating point rounding errors? I completely forgot to change pennies to pounds again when submitting the bets, which meant that instead of a £2 bet I tried to submit a £200 bet. Luckily, I didn't have that much in my account so I just got lots of INSUFFICIENT_FUNDS errors.

Speaking of funding the account, Betfair doesn't do margin calls and requires all bets to be fully funded: so say for a lay of £2 at 1.5 you need to have at least (1.5 * £2) - £2 = £1 and for a back of £2 at 1.5 you need to have £2. Backs and lays can be offset against each other: so if we've backed a runner for £2 and now have £0 available to bet, we can still "green up" by placing a lay (as long as it doesn't make us short the runner).

For unmatched bets, it gets slightly more weird: if there's both an unmatched back and a lay in the order book, the amount available to bet is reduced by the maximum of the liabilities of the two: since either one of them can be matched, Betfair takes the worst-case scenario.

Attempt 1

So I started the bot in actual live mode, with real money (it would place 15 contracts on both sides, resulting in bets of about £5 at odds 3.0) and, well, the placement actually worked! On the Betfair website I saw several unmatched bets attributed to me which would change as the market moved.

20:47:01.059 [main] INFO strategy.StrategyHarness - Exposures (GBX): Back 0, Lay 0, cash 0
20:47:01.059 [main] INFO strategy.StrategyHarness - Net exposure: 0
20:47:01.059 [main] INFO strategy.CAFStrategyV2 - Net exposure 0, fraction of offer: 0.0
20:47:01.060 [main] INFO strategy.CAFStrategyV2 - Best back: (Odds(3.45),198), Best lay: (Odds(6.4),170)
20:47:01.060 [main] INFO strategy.CAFStrategyV2 - Order book imbalance: 0.07608695652173914
20:47:01.061 [main] INFO strategy.StrategyHarness - Strategy wants atb, atl (Map(Odds(3.3) -> 454, Odds(3.25) -> 461),Map(Odds(7.0) -> 214, Odds(7.2) -> 208)), current outstanding orders are back, lay (Map(Odds(7.4) -> 202, Odds(7.2) -> 208),Map(Odds(3.25) -> 461, Odds(3.3) -> 454))
20:47:01.068 [main] INFO execution.ExecutionUtils - Trying to submit 1 placements and 1 cancellations
20:47:01.069 [main] INFO execution.ExecutionUtils - Available to bet according to us: 97919
20:47:01.073 [main] INFO execution.ExecutionUtils - Submitting cancellations: {"marketId":"1.134156568","instructions":[{"betId":102464716465,"sizeReduction":null}]}
20:47:01.191 [main] INFO execution.ExecutionUtils - Result: 200
20:47:01.192 [main] INFO execution.ExecutionUtils - {
  "status" : "SUCCESS",
  "marketId" : "1.134156568",
  "instructionReports" : [ {
    "status" : "SUCCESS",
    "instruction" : {
      "betId" : "102464716465"
    },
    "sizeCancelled" : 2.02,
    "cancelledDate" : "2017-09-13T20:47:01.000Z"
  } ]
}
20:47:01.192 [main] INFO execution.ExecutionUtils - Submitting placements: {"marketId":"1.134156568","instructions":[{"orderType":"LIMIT","selectionId":13531618,"side":"BACK","limitOrder":{"size":2.14,"price":7,"persistenceType":"LAPSE"}}]}
20:47:01.308 [main] INFO execution.ExecutionUtils - Result: 200
20:47:01.309 [main] INFO execution.ExecutionUtils - {
  "status" : "SUCCESS",
  "marketId" : "1.134156568",
  "instructionReports" : [ {
    "status" : "SUCCESS",
    "instruction" : {
      "selectionId" : 13531618,
      "limitOrder" : {
        "size" : 2.14,
        "price" : 7,
        "persistenceType" : "LAPSE"
      },
      "orderType" : "LIMIT",
      "side" : "BACK"
    },
    "betId" : "102464717112",
    "placedDate" : "2017-09-13T20:47:01.000Z",
    "averagePriceMatched" : 0,
    "sizeMatched" : 0,
    "orderStatus" : "EXECUTABLE"
  } ]
}
20:47:01.310 [main] INFO streaming.SubscriptionManager - {
  "op" : "mcm",
  "id" : 1,
  "clk" : "AEUAIgAR",
  "pt" : 1505335621264,
  "mc" : [ {
    "id" : "1.134156568",
    "rc" : [ {
      "atl" : [ [ 7.4, 0 ] ],
      "id" : 13531618
    } ]
  } ]
}
20:47:01.311 [main] INFO main - Some bet IDs unseen by the order status cache, not doing anything...
20:47:01.313 [main] INFO streaming.SubscriptionManager - {
  "op" : "ocm",
  "id" : 2,
  "clk" : "ACEAFgAJABgACw==",
  "pt" : 1505335621269,
  "oc" : [ {
    "id" : "1.134156568",
    "orc" : [ {
      "id" : 13531618,
      "uo" : [ {
        "id" : "102464716465",
        "p" : 7.4,
        "s" : 2.02,
        "side" : "B",
        "status" : "EC",
        "pt" : "L",
        "ot" : "L",
        "pd" : 1505335620000,
        "sm" : 0,
        "sr" : 0,
        "sl" : 0,
        "sc" : 2.02,
        "sv" : 0,
        "rac" : "",
        "rc" : "REG_GGC",
        "rfo" : "",
        "rfs" : ""
      } ]
    } ]
  } ]
}
20:47:01.314 [main] INFO main - Some bet IDs unseen by the order status cache, not doing anything...
20:47:01.376 [main] INFO streaming.SubscriptionManager - {
  "op" : "mcm",
  "id" : 1,
  "clk" : "AEsAIwAR",
  "pt" : 1505335621369,
  "mc" : [ {
    "id" : "1.134156568",
    "rc" : [ {
      "atl" : [ [ 7, 2.14 ] ],
      "id" : 13531618
    } ]
  } ]
}
20:47:01.378 [main] INFO main - Some bet IDs unseen by the order status cache, not doing anything...
20:47:01.380 [main] INFO streaming.SubscriptionManager - {
  "op" : "ocm",
  "id" : 2,
  "clk" : "ACMAFwALABoADA==",
  "pt" : 1505335621372,
  "oc" : [ {
    "id" : "1.134156568",
    "orc" : [ {
      "id" : 13531618,
      "uo" : [ {
        "id" : "102464717112",
        "p" : 7,
        "s" : 2.14,
        "side" : "B",
        "status" : "E",
        "pt" : "L",
        "ot" : "L",
        "pd" : 1505335621000,
        "sm" : 0,
        "sr" : 2.14,
        "sl" : 0,
        "sc" : 0,
        "sv" : 0,
        "rac" : "",
        "rc" : "REG_GGC",
        "rfo" : "",
        "rfs" : ""
      } ]
    } ]
  } ]
}
20:47:01.382 [main] INFO strategy.StrategyHarness - Exposures (GBX): Back 0, Lay 0, cash 0
20:47:01.382 [main] INFO strategy.StrategyHarness - Net exposure: 0
20:47:01.382 [main] INFO strategy.CAFStrategyV2 - Net exposure 0, fraction of offer: 0.0

So as intended:

  • the market moved to 3.45 / 6.40 best back/lay
  • the bot had backs at 7.4 and 7.2 and lays at 3.25 and 3.3, but wanted backs at 7.2 and 7.0
  • so it cancelled the back at 7.4 and then placed one at 7.0...
  • ...and then waited until the cancelled bet and the new bet appeared on the order status stream (20:47:01.313 and 20:47:01.380, respectively) before proceeding.

And indeed, the bot managed to maintain its bets far enough from the action in order to not get matched.

I ran it again, on a different market:

20:54:19.217 [main] INFO strategy.CAFStrategyV2 - Best back: (Odds(3.15),3474), Best lay: (Odds(3.2),18713)
20:54:19.217 [main] INFO strategy.CAFStrategyV2 - Order book imbalance: -0.6868436471807815
20:54:19.218 [main] INFO strategy.StrategyHarness - Strategy wants atb, atl (Map(Odds(2.98) -> 503, Odds(2.96) -> 506),Map(Odds(3.35) -> 447, Odds(3.4) -> 441)), current outstanding orders are back, lay (Map(Odds(3.6) -> 416, Odds(3.65) -> 409),Map(Odds(3.25) -> 461, Odds(3.2) -> 468))
20:54:19.219 [main] INFO execution.ExecutionUtils - Trying to submit 4 placements and 4 cancellations
20:54:19.219 [main] INFO execution.ExecutionUtils - Available to bet according to us: 97934
20:54:19.220 [main] INFO execution.ExecutionUtils - Submitting cancellations: {"marketId":"1.134151928","instructions":[{"betId":102464926664,"sizeReduction":null},{"betId":102464926665,"sizeReduction":null},{"betId":102464926548,"sizeReduction":null},{"betId":102464926666,"sizeReduction":null}]}
20:54:19.324 [main] INFO execution.ExecutionUtils - Result: 200
20:54:19.324 [main] INFO execution.ExecutionUtils - {
  "status" : "PROCESSED_WITH_ERRORS",
  "errorCode" : "PROCESSED_WITH_ERRORS",
  "marketId" : "1.134151928",
  "instructionReports" : [ {
    "status" : "SUCCESS",
    "instruction" : {
      "betId" : "102464926664"
    },
    "sizeCancelled" : 4.16,
    "cancelledDate" : "2017-09-13T20:54:19.000Z"
  }, {
    "status" : "SUCCESS",
    "instruction" : {
      "betId" : "102464926665"
    },
    "sizeCancelled" : 4.1,
    "cancelledDate" : "2017-09-13T20:54:19.000Z"
  }, {
    "status" : "FAILURE",
    "errorCode" : "BET_TAKEN_OR_LAPSED",
    "instruction" : {
      "betId" : "102464926548"
    }
  }, {
    "status" : "FAILURE",
    "errorCode" : "BET_TAKEN_OR_LAPSED",
    "instruction" : {
      "betId" : "102464926666"
    }
  } ]
}
20:54:19.325 [main] ERROR execution.ExecutionUtils - Cancellation unsuccessful. Aborting.
20:54:19.325 [main] ERROR main - Execution failure.

Oh boy. I quickly alt-tabbed to the Betfair web interface, and yep, I had an outstanding bet that had been matched and wasn't closed. I managed to close my position manually (by submitting some offsetting bets on the other side) and in fact managed to make about £1 from the trade, but what on earth happened here?

Looking at the logs, it seems like the bot wanted to move its bets once again: the market suddenly dropped from best back/lay 3.4/3.45 down to 3.15/3.2, whereas the bot had bets at 3.2, 3.25/3.6, 3.65. So the bot needed to cancel all 4 of its outstanding bets and move them down as well.

But wait: how come the best back was at 3.15 and the bot had a lay bet (meaning that bet was available to back) at 3.2? Why wasn't an offer to back at higher odds (3.2 vs 3.15) at the top of the book instead?

In fact, those two bets had been matched and that hadn't yet been reflected on the order status feed. So when the bot tried to cancel all of its bets, two cancellations failed because the bets had already been matched. The odds soon recovered back and I managed to submit a back bet at higher odds than the bot had laid (which ended up in a profit), but the order status and the order book stream being desynchronized was a bigger problem.

I decided to just ignore errors where cancellations were failing: eventually the bot would receive an update saying that the bet got matched and would stop trying to cancel it.

Attempt 2

How did we get here?

10:08:52.623 [main] INFO strategy.StrategyHarness - Exposures (GBX): Back 4599, Lay 0, cash -1501
10:08:52.623 [main] INFO strategy.StrategyHarness - Net exposure: 4599
10:08:52.623 [main] INFO strategy.CAFStrategyV2 - Net exposure 4599, fraction of offer: 3.066
10:08:52.623 [main] INFO strategy.CAFStrategyV2 - Best back: (Odds(2.74),1000), Best lay: (Odds(2.8),1021)
10:08:52.623 [main] INFO strategy.CAFStrategyV2 - Order book imbalance: -0.010390895596239486
10:08:52.624 [main] INFO strategy.StrategyHarness - Strategy wants atb, atl (Map(Odds(2.68) -> 559, Odds(2.66) -> 563),Map(Odds(2.88) -> 520, Odds(2.9) -> 517)), current outstanding orders are back, lay (Map(),Map(Odds(2.66) -> 563, Odds(2.68) -> 559))
10:08:52.624 [main] INFO execution.ExecutionUtils - Trying to submit 2 placements and 0 cancellations
10:08:52.624 [main] INFO execution.ExecutionUtils - Available to bet according to us: 96626
10:08:52.624 [main] INFO execution.ExecutionUtils - Submitting placements: {"marketId":"1.134190324","instructions":[{"orderType":"LIMIT","selectionId":12743977,"side":"BACK","limitOrder":{"size":5.2,"price":2.88,"persistenceType":"LAPSE"}},{"orderType":"LIMIT","selectionId":12743977,"side":"BACK","limitOrder":{"size":5.17,"price":2.9,"persistenceType":"LAPSE"}}]}
10:08:52.769 [main] INFO execution.ExecutionUtils - Result: 200
10:08:52.769 [main] INFO execution.ExecutionUtils - {
  "status" : "SUCCESS",
  "marketId" : "1.134190324",
  "instructionReports" : [ {
    "status" : "SUCCESS",
    "instruction" : {
      "selectionId" : 12743977,
      "limitOrder" : {
        "size" : 5.2,
        "price" : 2.88,
        "persistenceType" : "LAPSE"
      },
      "orderType" : "LIMIT",
      "side" : "BACK"
    },
    "betId" : "102489084350",
    "placedDate" : "2017-09-14T10:08:52.000Z",
    "averagePriceMatched" : 2.9,
    "sizeMatched" : 5.2,
    "orderStatus" : "EXECUTION_COMPLETE"
  }, {
    "status" : "SUCCESS",
    "instruction" : {
      "selectionId" : 12743977,
      "limitOrder" : {
        "size" : 5.17,
        "price" : 2.9,
        "persistenceType" : "LAPSE"
      },
      "orderType" : "LIMIT",
      "side" : "BACK"
    },
    "betId" : "102489084351",
    "placedDate" : "2017-09-14T10:08:52.000Z",
    "averagePriceMatched" : 2.9,
    "sizeMatched" : 5.17,
    "orderStatus" : "EXECUTION_COMPLETE"
  } ]
}
...
10:08:52.789 [main] INFO strategy.StrategyHarness - Exposures (GBX): Back 7606, Lay 0, cash -2538
10:08:52.789 [main] INFO strategy.StrategyHarness - Net exposure: 7606
10:08:52.789 [main] INFO strategy.StrategyHarness - Max exposure reached, liquidating
10:08:52.789 [main] INFO strategy.StrategyHarness - Strategy wants atb, atl (Map(Odds(2.96) -> 7606),Map()), current outstanding orders are back, lay (Map(),Map(Odds(2.66) -> 563, Odds(2.68) -> 559))
10:08:52.790 [main] INFO execution.ExecutionUtils - Trying to submit 1 placements and 2 cancellations
10:08:52.790 [main] INFO execution.ExecutionUtils - Available to bet according to us: 95589
10:08:52.790 [main] INFO execution.ExecutionUtils - Submitting cancellations: {"marketId":"1.134190324","instructions":[{"betId":102489083069,"sizeReduction":null},{"betId":102489083236,"sizeReduction":null}]}
10:08:52.873 [main] INFO execution.ExecutionUtils - Result: 200
10:08:52.873 [main] INFO execution.ExecutionUtils - {
  "status" : "SUCCESS",
  "marketId" : "1.134190324",
  "instructionReports" : [ {
    "status" : "SUCCESS",
    "instruction" : {
      "betId" : "102489083069"
    },
    "sizeCancelled" : 5.63,
    "cancelledDate" : "2017-09-14T10:08:52.000Z"
  }, {
    "status" : "SUCCESS",
    "instruction" : {
      "betId" : "102489083236"
    },
    "sizeCancelled" : 5.59,
    "cancelledDate" : "2017-09-14T10:08:52.000Z"
  } ]
}
10:08:52.873 [main] INFO execution.ExecutionUtils - Submitting placements: {"marketId":"1.134190324","instructions":[{"orderType":"LIMIT","selectionId":12743977,"side":"LAY","limitOrder":{"size":76.06,"price":2.96,"persistenceType":"LAPSE"}}]}
10:08:52.895 [main] INFO execution.ExecutionUtils - Result: 200
10:08:52.895 [main] INFO execution.ExecutionUtils - {
  "status" : "FAILURE",
  "errorCode" : "INSUFFICIENT_FUNDS",
  "marketId" : "1.134190324",
  "instructionReports" : [ {
    "status" : "FAILURE",
    "errorCode" : "ERROR_IN_ORDER",
    "instruction" : {
      "selectionId" : 12743977,
      "limitOrder" : {
        "size" : 76.06,
        "price" : 2.96,
        "persistenceType" : "LAPSE"
      },
      "orderType" : "LIMIT",
      "side" : "LAY"
    }
  } ]
}
10:08:52.895 [main] ERROR execution.ExecutionUtils - Placement unsuccessful. Aborting.

This led to another scramble with me frantically trying to close the position. In the end, I managed to make about £6 of profit from that market (accidentally going short just before it kicked off with that runner losing in the end).

As you'll find out later, this will be the most money that project Betfair will make.

So what happened here? First, the bot's back bets were being matched disproportionately often: at 10:08:52.623 it held 45.99 contracts. It was still below its maximum exposure level, so it placed a couple more back bets far away from the market at 10:08:52.769. Those immediately got matched (see "orderStatus" : "EXECUTION_COMPLETE" in the REST response), bringing the bot's exposure to above 75 contracts, so at 10:08:52.789 it decided to completely liquidate its position.

What happened next was dumb: instead of placing a lay bet of £76.06 / 2.96 (number of contracts divided by the odds), it placed a bet of £76.06. This would have made the bot have a massive short on the runner, but it didn't have enough money to do so, so instead Betfair came back with an INSUFFICIENT_FUNDS error.

Interestingly enough, if the bot did manage to close its position, it would have made money (but less): the Back 7606, Lay 0, cash -2538 part basically says it had paid £25.38 for 76.06 contracts, implying average odds of 3.00, so with a lower lay of 2.96 it would get £25.69 back, leaving it with a profit of £0.31.

Attempts 3, 4, 5...

After fixing those bugs I encountered a few more: for example, since the bot couldn't place orders below £2, it would sometimes end up with a small residual exposure at the end of trading which it couldn't get rid of. There were other issues, for example, the bot wasn't incorporating instant matches (as in when the bet placement REST request came back saying the bet had gotten immediately matched). After a couple more runs I managed to accidentally lose most of the £7 I had accidentally made and decided to stop there for now.

What I was also interested in was the fact that despite my assumptions, offers as far as 3 Betfair ticks away from the market would get matched. This was partially because the bot was slower than when I simulated it and partially because matching in greyhound markets indeed happened in jumps, but that gave me a different idea: what if I did actually simulate making a market further away from the best bid/offer?

And later on, I would come across a different and simpler idea that would mean market making would get put on hold.

Conclusion

Next time on project Betfair, we'll tinker with our simulation some more and then start looking at our data from a different perspective.

As usual, posts in this series will be available at https://kimonote.com/@mildbyte:betfair or on this RSS feed. Alternatively, follow me on twitter.com/mildbyte.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism, ease of navigation and control over what content a user follows. Try the demo here and/or follow it on Twitter as well at twitter.com/kimonote!

project Betfair, part 5

@mildbyte 1 year, 11 months ago | programming | python | betfair |

Introduction

Previously on project Betfair, we started fixing our market-making bot so that it wouldn't accumulate as much inventory. Today, we'll try to battle the other foe of a market maker: adverse selection.

Order book imbalance

Consider the microprice from part 3: the average of the best back price and the best lay price, weighted by the volume on both sides. We had noticed that sometimes a move in the best back/lay can be anticipated by the microprice getting close to one or the other.

Let's quantify this somehow. Let's take the order book imbalance indicator, showing how close this microprice is to either the best back or the best lay: $$\frac{\text{BBVolume} - \text{BLVolume}}{\text{BBVolume} + \text{BLVolume}}$$ Can this predict price movements?

Oh yes it can. This graph plots the average move in the best back/lay quotes at the next tick (as in the next Betfair message on the stream), conditioned on the cumulative order book imbalance. In other words, the blue circles/crosses show the average move in the best lay/back quote, assuming the order book imbalance is above the given value, and the red markers show the same, but for order book imbalances below the given value.

For example, at order book imbalance values above 0.5 the average move in the best back/lay quotes in the next message is about 0.1 Betfair tick (this time I mean a minimum price move, like 1.72 to 1.73) and for order book imbalance values below -0.5 the average move in the best back/lay quotes is about -0.1 Betfair tick.

Essentially, at high negative or positive order book imbalance values we can say that there will be an imminent price move. There are several intuitive explanations to this phenomenon. For example, we can say that if aggressive trades happen randomly against both sides, the side with less volume offered will get exhausted earlier, hence the price will naturally move towards that side. In addition, if offers on a given side represent an intention to back (or lay), the participants on that side might soon get impatient and will cross the spread in order to get executed faster, the side with more available volume thus winning and pushing the price away from itself.

This effect is quite well documented in equity and futures markets and is often used for better execution: while it's not able to predict large-scale moves, it can help an executing algorithm decide when to cross the spread once we've decided what position we wish to take. For example, see this presentation from BAML — and on page 6 it even has a very similar plot to this one!

CAFBot with order book imbalance detection

This is a very useful observation, since that means the market maker can anticipate prices moving and adjust its behaviour accordingly. Let's assume that we're making a market at 1.95 best back / 1.96 best lay and the odds will soon move to 1.94 best back / 1.95 best lay. We can prepare for this by cancelling our lay (that shows in the book as being available to back) at 1.95 and moving it to 1.94: otherwise it would have been executed at 1.95 shortly before the price move and we would have immediately lost money.

So I added another feature to CAFBot: when the order book imbalance is above a given threshold (positive or negative), it would move that side of the market it was making by one tick. So, for example, let's again say that the current best back/lay are 1.94/1.95 and the order book imbalance threshold is set to be 0.5. Then:

  • Order book imbalance < -0.5: large pressure from the lay side, make a market at back/lay 1.93/1.95
  • Order book imbalance between -0.5 and 0.5: business as usual, make a market at back/lay 1.94/1.95
  • Order book imbalance > 0.5: large pressure from the back side, make a market at back/lay 1.94/1.96

For now, I didn't use any of the inventory management methods I had described in the previous part (though there are some interesting ways they can interact with this: for example, could we ever not move our quotes at high imbalance values because an imminent price move and hence trades against us could help us close our position?). I did, however, keep the make-market-at-3-price levels feature.

Let's see how it did on our guinea pig market.

Results

So... it made money, but in a bad way. Having no inventory control made the bot accumulate an enormous negative exposure during the whole trading period: its performance only got saved by an upwards swing in odds during the last few seconds before it closed its position. In fact a 6-tick swing brought its PnL up £10 from -£6 to £4. Not very healthy, since we don't want to rely on general (and random) price moves: it as well could have stopped trading before that price swing and lost money. On the other hand, there's still the interesting fact of it having made money by generally being short in a market whose odds trended downwards (3.9 to 3.5), as in against its position.

Good news: the order book imbalance indicator kind of worked: here the same segment from part 3 is plotted, together with the bot's orders that got executed. You can see that where the old version would have had the price go through it, the new version sometimes anticipates that and moves its quotes away. In addition, look at the part shortly after 12:10 where the price oscillates between 3.65 and 3.55: since the microprice after the price move is still close to the previous level, the bot doesn't place any orders at 3.6.

However, the fact that we don't have inventory control in this version hurts the bot immensely:

Look at that standard deviation: it's larger than that of the first naive version (£3.55)!

CAFBotV2: a new hope

Let's put the insights from this and the previous part together and see if we can manage to make the bot not lose money. I combined the mitigation of inventory risk and adverse selection as follows: move the quote on one side away by one tick if either the order book imbalance is high enough or our exposure (inventory) is high enough. Effectively, if the bot would have moved its lay bet lower by 1 tick because of high negative order book imbalance (odds are about to move down) as well as because of high negative exposure (so the bot doesn't want to bet against even more), it would only move the lay bet by 1 tick, not 2.

On the other hand, let's assume there's a high negative order book imbalance but the bot has a large positive exposure. Should the bot still move its lay quote given that that quote getting hit would mean the bot would sell off a bit of its inventory? I reasoned that if the odds were about to move down, that would be good for the bot's profit (since odds going down is the same as implied probability going up, so being long benefits the bot) and in fact would allow it to offload its exposure at even lower odds later on.

So with that in mind, let's see how the brand new CAFBot does on our example horse racing market.

Look at it go. Basically a straight upwards line during the last 15 minutes and all of this while juggling its exposure back and forth like a champ. Beautiful. Let's take a look at the whole dataset.

Damn. At least it's losing less money than the version with moving the bot's quotes at high inventory values: in fact, twice as little (-£0.58 vs -£1.04).

Looking closer at what happened, looks like in some markets our fancy inventory control didn't work.

What happened here was that while the bot had a large long position and wasn't placing bets at the current best available lay, those bets were still matching as the prices would sometimes jump more than one tick at a time. The odds were violently trending upwards and so there weren't as many chances for the bot to close out its position.

How about if we get the bot to stop trading at all on one side of the book if its position breaches a certain level? While this makes the PnL distribution less skewed and slightly less volatile, it doesn't improve the mean much.

Greyhounds

Meanwhile in the greyhound racing market, things weren't going well either.

Examining the PnL closer

Back to horses again, is there some way we can predict the money the bot will make/lose in order to see if some markets are not worth trying to trade at all?

First of all, it doesn't seem like the PnL depends on the time of day we're trading.

The dataset is mostly UK and US horse races and the time is UTC. The UK races start at about 11am and end at about 7pm, whereas the US ones run from about 8pm throughout the night. There are some Australian races there, but there are few of them. In the end, it doesn't seem like the country affects our PnL either.

In addition, the money the bot makes isn't affected by the amount of money that's been matched on a given runner 15 minutes before the race start.

...and neither is the case for the amount of money available to bet on a given runner (essentially the sum of all volumes available on both sides of the book 15 minutes before the race start).

That's a curious plot, actually. Why are there two clusters? I was scared at first that I was having some issues with scaling in some of my data (I had switched to using integer penny amounts throughout my codebase instead of pounds early on), but it actually is because the UK races have much more money available to bet, as can be seen on the following plot.

CAFBotV3, 4, 5...

I also had tried some other additions to CAFBot that are not really worth describing in detail. There was the usual fiddling with parameters (different order book imbalance threshold values as well as the inventory values beyond which the bot would move its quotes) or minor adjustment to logic (for example, not moving the quotes at high order book imbalance values if getting that quote hit would help the bot reduce its exposure).

There was also Hammertime, a version of CAFBot that would move both back and lay quotes in case of high order book imbalance, in essence joining everybody else in hammering the side of the book with fewer offers. Theoretically, it would have taken a position (up to its position limit) in the direction that the market was about to move, but in practice the order book imbalance indicator isn't great at predicting larger-scale moves, so most of those trades would get either scratched out or end up as losses.

In addition, I had another problem, which is why I had started looking at whether it's possible to select markets that it would be better to trade in: submitting an order to Betfair isn't actually free. Well, it is, but only if one submits fewer than 1000 actions per hour, after which point Betfair begins to charge £0.01 per action. An action could be, for example, submitting a bet at a given level or cancelling it. Actions can't be batched, so a submission of a back at at 2.00 and a back at 2.02 counts as 2 actions.

This would be especially bad for CAFBot, since at each price move it has to perform at least 4 actions: cancelling one outstanding back, one outstanding lay, placing a new back and a new lay. If it were maintaining several offers at both sides of the book and the price moved by more than one tick, it would cost it even more actions to move its quotes. From the simulation results, running the bot for 15 minutes would submit, on average, about 500 actions to Betfair with a standard deviation of 200, which would bring it beyond the 1000-an-hour limit.

Meanwhile in Scala

Throughout this, I was also working on Azura, the Scala version of CAFBot (and then CAFBotV2) that I would end up running in production. I was getting more and more convinced that it was soon time to leave my order book simulator and start testing my ideas by putting real money on them. As you remember, the simulator could only be an approximation to what would happen in the real world: while it could model market impact, it wouldn't be able to model the market reaction to the orders that I was placing.

And since I was about to trade my own money, I would start writing clean and extremely well-tested code, right?

Conclusion

Next time on project Betfair, we'll learn how not to test things in production.

As usual, posts in this series will be available at https://kimonote.com/@mildbyte:betfair or on this RSS feed. Alternatively, follow me on twitter.com/mildbyte.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism, ease of navigation and control over what content a user follows. Try the demo here and/or follow it on Twitter as well at twitter.com/kimonote!

project Betfair, part 4

@mildbyte 1 year, 11 months ago | programming | python | betfair |

Introduction

Previously on project Betfair, we implemented a simple market making strategy and backtested it against some horse races, with disastrous results. Today, things get better.

Greyhounds

Though most of this has been done on horse racing markets, I was also experimenting on-and-off with greyhound racing. I'll be looking at those in depth later, but for now, some quick plots.

First, most action in greyhound markets happens way closer to the beginning of the race. Half of all the bets are matched less than a minute before the race starts!

In addition, the back-lay spreads are much, much wider than in horse markets. Even 2 minutes before start there can be 5 ticks between the best back and lay quote.

So how does the naive version of CAFBot do on these markets?

CAFBot results

Not that well. Looking at the odds chart for an example market, it seems like there are a lot of sporadic jumps where, sort of like in the example horse market from the previous post, the price goes "through" us (i.e. we're the last in the queue at a given price level to get our bet matched, after which that price level flips to the opposite side of the book).

Back to horses

Papers on market making

Like I had mentioned before, there are plenty of papers on market making. Despite that they're written with real-world (equity, futures etc) markets in mind, there are still a lot of things they investigate that can apply to this problem.

High-frequency market-making with inventory constraints and directional bets by Fodra, Labadie (2012) has a decent introduction to what a market maker does and what sort of controls it can have over its behaviour (together with some alarmingly large differential equations). It's based on an earlier paper: High-frequency trading in a limit order book by Avellaneda, Stoikov (2008) which had some very good presentation slides that I got some of the following ideas from. Finally, Exploring Market Making Strategy for High Frequency Trading: An Agent-Based Approach by Xiong, Yamada, and Terano has a nice overview of several market-making strategies.

Inventory risk

According to those papers, there are two risks a market maker faces. The first is inventory risk, which is what we had an issue with last time. If one side of a market maker's quotes gets executed more often than the other, it accumulates an inventory of shares (in the case of equity markets) and is no longer market-neutral (its profit is now dependent on price moves in the market).

There are two main ways to control our market maker: by specifying at what odds (or whether) it makes the market and the number of one-pound contracts it offers on each side. For example, if its exposure is getting too positive (it's backed a runner too much), it can either decrease its offers in size or offer other people to lay at higher odds, thus getting a higher premium from increasing its exposure even more. Managing inventory risk is essentially a trade-off: if we limit it to a minimum, we might miss out on earning the spread when there's not much price movement going on, whereas if we don't control it at all, we risk exposing ourselves to the market too much (having to liquidate our position in the end).

CAFBot with offer size decay

My first attempt at managing inventory risk was fairly simple: let's give the bot a maximum inventory it can have and size its quotes so that the amount it offers is scaled linearly from the full amount when it has no exposure down to 0 when it's at a maximum. For example, if the maximum amount of contracts it can offer is 10 and the maximum inventory it wants is 30:

  • when it's 30 contracts short, offer 10 to buy and 0 to sell;
  • when it's 15 contracts short, offer 10 to buy and 5 to sell;
  • when it's neutral, offer 10 to buy and 10 to sell;
  • when it's 15 contracts long, offer 5 to buy and 10 to sell;
  • when it's 30 contracts long, offer 0 to buy and 10 to sell.

So, did it work?

Sort of. This is the PnL and position chart for the same market as in the previous post. It looks like the exposure management works: it's almost always bounded between -15 and +15 contracts. The profit has improved vastly (we now lose about £0.18 instead of £1.70 from the previous version), but this could be simply due to the fact that our average exposure is now smaller (on average -5 contracts) than that of the previous version (on average -65 contracts).

Running it on the full dataset we get this sad picture.

While the average profit increased, its standard deviation didn't decrease as much: the Sharpe ratio of this strategy is -0.91. To compare, the previous incarnation of CAFBot had a Sharpe ratio of -1.52 / 3.55 = -0.43.

Making a market at multiple odds

This was a small feature that I had there from the beginning, actually. The naive CAFBot from the previous post only ever has offers at two odds: the best back and the best lay. But consider this:

  • Best available to back is 2.00, best available to lay is 2.02
  • The prices move: best back is 2.02, best lay is 2.04
  • We cancel both of our bets and replace them at the new odds
  • The prices move back to 2.00/2.02. We replace our bets again.

What happened there at the 2.00 back level was that we had to cancel a bet and then replace it shortly after when the prices recovered, thus losing our position in the match queue. The addition I had was to maintain bets at several price levels: best back, 1 tick lower than the best back and 2 ticks lower than the best back (likewise for the lay side) etc. That meant, at a larger capital commitment, that the bot would still maintain its position in the queue if the prices changed for a bit. This doesn't actually require any more betting actions to do for a 1-tick move: instead of cancelling a bet at 2.00 and replacing it at 2.02 the bot would, say, cancel a bet at 1.99 and replace it at 2.02.

CAFBot with dynamic spreads

The papers also mention where to make the market as a way to control the inventory (some of them define it as a mid ± spread). Most of them use a solution to those differential equations in order to come up with an function defining the bid and ask to make a market at, but I decided not to go with that, since Betfair markets have varying granularities and spreads are usually already tight enough (in essence, anything that was more than a few ticks away from the best bid wouldn't have a chance to get matched anyway).

Instead, I used an idea from Avellaneda and Stoikov's presentation slides where their agent would move the quote by 1 tick on one side after it had accumulated a certain inventory. So, for example:

  • We make a market at 2.00 best back / 2.02 best lay
  • Our offer at 2.00 gets hit (we were offering to back, thus we've laid the market: we are now short)
  • Instead of replacing the bet at 2.00, we now make the market at 1.99 best back / 2.02 best lay.

For my limit in this test, I used 30 contracts: with 10 contracts offered, that meant the bot would move the quote after its offer got matched 3 times on one side.

I also used the making market on several odds feature to not cancel bets unless the price moves too far away from them, placing bets at 3 levels on both sides of the book.

So, how did it do on our test market?

Results

Oooh. We finally started making money, kind of. Closer to the race beginning, as matched volumes picked up, the MTM shot up dramatically, albeit dropping as dramatically in the end. It still ended trading with about £0.60 of profit, which is pretty cool.

In addition, the average exposure was -6.6 contracts: in a market where the odds were trending down (they went from about 3.9 to 3.4, 10 ticks), the bot's exposure was against it (since the contract price, the inverse of the odds, went up) and yet it still made money. Hence the bot did manage to profit from the spread.

Finally, placing our bets on several levels helped too: on this plot, the bold green and red lines are price levels at which the bot had an offer. You can see it's often the case that a green/red line is not interrupted when the price moves slightly away from it, meaning the bot doesn't cancel a bet and lose its position in the queue. In fact, on the same market, but with only 1 bet on both sides, the bot would have made only £0.20.

Let's test it on the whole dataset now.

Not that good, actually. The Sharpe ratio is better than the previous version, but we lost more money on average. At least making a market at 3 price levels helped (below are the results for placing just 1 bet on each side).

So what happened? Let's take a look at the worst market we traded.

It's... choppy. The price is jumping all over the place and there are very few times when it doesn't move (which is where this strategy really shines). There are, on the other hand, many points where the price goes "through" the bot's bet, resulting in it having an opposite exposure to the way the price went.

Looking closer, we can see how wild this market is: in the span of 90 seconds the odds crashed from 1.80 down 10 ticks to 1.70 and then back up again. In addition, in the first 30 seconds the best back/lay oscillated between 2 levels and the bot would diligently follow it, getting executed at 1.79 and then having its position closed at the same or worse price.

So all in all, while inventory control does help the bot not place too many directional bets, it still doesn't help it make money.

Adverse selection

The second risk a market maker faces is adverse selection, the risk from information asymmetry. Someone who does cross the spread gets the benefit of choosing when their order gets executed, whereas a market maker simply posts orders and doesn't have any control over when or whether they will get hit. This creates a bias in favour of the person trading against the market maker: what if some of them are crossing the spread because they know the market is going to move, as in, they're informed traders?

Let's say there's been some sudden bad news from Apple during trading hours and people start selling Apple shares. A naive market maker that doesn't know what's happening will continue offering to buy them despite that the market clearing price will very soon move down. In the case of Betfair markets, perhaps there can be a macro-level trend (even if the trend is a self-fulfilling prophecy) that the market maker doesn't take into account and more slower traders do, thus trading in a way that perpetuates the trend.

So is there a way to protect ourselves against that? What if the bot could somehow anticipate imminent price movements that might hurt it if it has an exposure to the market and move its quotes or liquidate its position? Perhaps it can.

Conclusion

Next time on project Betfair, we'll delve deeper into market microstructure and find out that at a very micro level, markets sometimes aren't as unpredictable as we thought.

As usual, posts in this series will be available at https://kimonote.com/@mildbyte:betfair or on this RSS feed. Alternatively, follow me on twitter.com/mildbyte.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism, ease of navigation and control over what content a user follows. Try the demo here and/or follow it on Twitter as well at twitter.com/kimonote!

project Betfair, part 3

@mildbyte 1 year, 11 months ago | programming | python | betfair |

Introduction

Previously on project Betfair, we started collecting dumps of Betfair order book data from the Stream API and learned how to reconstruct series of events from those with the aim of using them to backtest a market making strategy.

Today, things get messy.

Insights into the data

As a market maker, our main source of profit should be our orders getting matched, and not us having a directional view (as in betting on which way the odds will move). Hence it's worth investigating how volumes of matched bets vary as the beginning of the race approaches.

Total matched volume

I took the total bet volumes matched on each race through time, starting from 3 hours before the race, and normalized them by diving by the final total volume just before the scheduled race start time (essentially getting the cumulative fractions of the total matched volumes). There are 24 races in this dataset: I now have quite a few more, but their recording is started much closer to the kick-off, and you'll soon see why.

As you can see, bet matching in pre-race horse trading follows a power law. About half of the total bet volume is matched 10 minutes before the race begins, and 80% of all betting happens within 50 minutes from the race start. Note this doesn't include in-play betting where I guess this gets even more dramatic.

Hence there's not much point doing market making earlier than about 15 minutes before the race. Or perhaps could it be that the reason nobody is trading is that the costs are too high?

Back-lay spreads

I plotted the average back-lay spreads through time, and it doesn't seem so. The spreads here are in Betfair ticks: the smallest one is 1 (corresponding to, say, best available back odds of 1.72 and best lay odds of 1.73, or best back at 2.50 and best lay at 2.52). Even 3 hours before the race begins, the spreads, on average, are as tight as they can be.

Design

One useful thing about way the Betfair Stream API is designed is that with the tools given to us, we can maintain a state of the order book in our program as well as our own orders. It's essentially like having the data from API-NG available to us at all times, with notifications when it gets updated.

I decided to split the market-making bot into two parts.

The core would be given the current state of the order book and output the amount of one-pound contracts that it wanted to be offering at both sides of the book, together with the odds. The core's operation would be independent of what's happening to our outstanding orders: if we already have a back bet in the book and the core of the strategy wants to be offering the same amount, then no actions would be produced.

Reconciling the core's wishes with the actual state of our orders would be the execution harness' job. Given the desired offers we wish to maintain on both sides of the book and the state of our orders, it would produce a sequence of cancellations and order placements and submit them to Betfair.

Since all of these inputs (the state of the order book and all of our orders) are actually given to us by the Betfair Stream API, this has a neat bonus of behaving in a consistent way if the bot, say, crashes: we can just start it up again and the API will helpfully give us back all our outstanding orders and positions (not that it's a good idea to blindly trust it). Polling the strategy to find out what actions it wants to perform can be done at any point in time: on a schedule or, say, whenever a new event arrives from the Betfair stream.

So the whole strategy would look something like this:

  • When a new event arrives:
    • Incorporate it into our cached state of the order book and our orders (Betfair even have a guide here).
    • Give the order book to the core, request the amount of contracts it wants to offer and the odds (prices).
    • On both sides of the book, compare our current outstanding orders and what the core wants to do. For every price level, if what the core wants to offer is different from what we are offering with our outstanding orders:
      • If the core wants to offer more, submit an order placement for the difference.
      • If the core wants to offer less, take a list of our outstanding orders at that price level (since there might be several) and start cancelling them, newest first (since they're the least likely to get matched) until we reach the desired volume.

There are some operational issues with this (such as Betfair not usually allowing bet placements below £2), but this is a good general platform to build upon and definitely good enough to start simulating.

Backtester

With an order book simulator already built, augmenting it to inject strategy events is fairly easy. Let's say the simulator operates by maintaining a timestamped event queue (where the events are either placing a bet or cancelling one). We add another event type at given intervals of time (say, 100ms): when the backtester sees it, it gives the current order book and the state of the strategy's orders to the strategy, gets its actions from it and then inserts them back into the queue.

To simulate the delay between us and the exchange, the events are timestamped slightly into the future (say, 75ms) so the simulator might not get to apply player orders to the book immediately and may have to handle other orders before that.

In addition, I got the backtester to record various diagnostics about the strategy every time it polled it, such as how many contracts (actually matched bets) it owned, when its orders got matched and whether the matches were passive (initiated by someone else) or aggressive (initiated by the strategy) etc.

Meet CAFBot

There will also be a DAFBot at some point. This was a very early version. It's fairly different to what I actually ran to make the following plots, but the general idea is the same. I also moved the whole codebase to using integer penny amounts instead of floats soon after this commit.

Overview

It was time to find something to plug into the backtester. I decided to first go with the most naive approach: the core would look at the current best back and lay offers and then offer 10 one-pound contracts (recall that this results in a bet of £10 / odds) on both sides.

Since the execution harness compares the desired offers with the current outstanding bets, that would mean when the strategy was started, it would first place those two bets and then do nothing until at least one of them got fully or partialy matched, at which point it would add back to the bet to maintain the desired amount. If the market were to move (the best back/lay offers changed), the strategy would cancel the orders and replace them at the new best bid/offer.

Finally, 15 seconds before the race start, the strategy would aggressively unload its accumulated exposure (say, if more of its back bets were matched than the lay bets) by crossing the spread (and not waiting to get matched passively).

In terms of markets, I would simulate it from 1 hour before the beginning of the race, running it only on the favourite horse (the one with the lowest odds at that point).

With that in mind, I managed (after a few days of debugging) to simulate it on a few races. So, how did it do?

Backtest results

Here's a result from one race:

Well, this is not looking great.

To explain that plot, the red line is the strategy inventory, or position: how many one-pound contracts it's currently holding. For example, with a position of -170, the strategy will lose £170 if the runner we're trading wins (and £0 if it loses, but in any case we get to keep the money we got from selling the contract). The blue line is the mark-to-market of the strategy: how much money we would make/lose if we were to liquidate our position immediately by crossing the spread, like we do in the last 15 seconds before the race begins.

This is obviously just one race, but surely with an HFT strategy we would expect more consistent profit even within one race? For example, Greg Laughlin of the WSJ did a small investigation into the trading of Virtu, an HFT firm, from their abandoned IPO filings, and showed that with a 51% percentage of profitable trades and about 3 million trades a day their chance of having a losing day was negligible.

But we're not making that many trades here and so perhaps this could be a statistical fluke. It's a good thing I now have all the other race data, collected during further experiments, to check this against.

Nope. Running this against 104 races (albeit with 15 minutes of trading, since the latter datasets I collected were shorter), we can see that the bot has systematic issues and loses, on average, £1.52 per race.

So what happened in the first plot? It looks like we had a fairly chunky short exposure (betting against) throughout the trading. At its worst we were short 175 contracts, which is 17 times more than what we were offering to trade. So there was an imbalance in terms of which side of our offering was getting executed.

Investigating further...

As you can see from the odds plot, the price of the contract (inverse of the odds) trended upwards throughout the final hour before the race, with a few range-bound moves closer to the start. That explains us losing money and also kind of explains us selling too many contracts: in a trending market, there would be more matching on one side of the book, hence a naive strategy that doesn't account for that would quickly get out of balance. Or at least that's what it looks like. There are many ways to read this.

This is the previous plot, but zoomed in towards the part where we lost money the fastest. The orange line shows the best available lay odds and the blue one is the back odds (both of which are where our simulated bot also puts its bets).

The green line in the middle is the volume-weighted average price, or the microprice. It's calculated as follows: $$ \text{VWAP} = \frac{\text{BBPrice} \times \text{BLVolume} + \text{BLPrice} \times \text{BBVolume}}{\text{BBVolume} + \text{BLVolume}} $$

Essentially, if the volume of available bets on one side is bigger, then that side kind of pushes the microprice towards the opposite side. It's a good way to visualize order book imbalance and, interestingly, in that plot, sometimes a move in the actual best back/lay odds can be anticipated by the microprice moving towards one side or the other. We'll look into that later.

The crosses are the bot's orders getting executed: red means the back offer got executed (thus the bot has secured a lay bet and is now more short the market) and green means the lay offer got executed. The dots are the same, but for other people's orders.

What's strange here is that our orders often get hit shortly before there's a price move against us: for example, in the beginning, just before the best back moves from 3.80 down to 3.75 (this is still a 1-tick move), you can see several executions of the bot's back orders, with the bot quickly replacing them several times in between executions. In essence, it's sort of trying to "hold the line" against a market move, but ultimately fails and ends up with having bet against the market whilst the odds managed to move through it. Well, at least we know it's a market move in hindsight: perhaps more people could have joined the bot on its side and maintained the price.

Essentially, the bot does well during times when there's a lot of matching going on on both sides of the book and the price isn't moving. This happens, for example, for a few minutes around 12:30:

But even in this case, this doesn't look too inspiring: the initial 1-pound jump in the PnL at 12:29 is because of the odds moving up and us having a short exposure. The actual market-making part between 12:29 and 12:33 only makes about £0.10.

So in the end, looks like it's back to the drawing board.

Conclusion

Not all was lost, obviously. I had a backtesting engine, a half-working strategy (in that it didn't immediately start bleeding money) and I had an idea of what was going wrong and some possible ways to fix it. And obviously no real money had suffered yet in the making of this.

Next time on project Betfair, we'll start reading some actual papers on market making and tinkering with the bot in attempts to make it lose less money. Perhaps I'll tell about what was happening to the Scala version (that would later supposedly be a production version of this thing) as well. Stay tuned!

As usual, posts in this series will be available at https://kimonote.com/@mildbyte:betfair or on this RSS feed. Alternatively, follow me on twitter.com/mildbyte.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism, ease of navigation and control over what content a user follows. Try the demo here and/or follow it on Twitter as well at twitter.com/kimonote!

project Betfair, part 2

@mildbyte 1 year, 11 months ago | programming | python | betfair |

Probably the hardest thing about writing this will be piecing together what I was doing and thinking at the time. Keeping a development diary was a great idea, it's just a shame I started it 1/3 of the way into the project and the rest is in various commit messages.

At least I wrote some tests.

I would never have been able to get away with these commit messages at my previous job.

In any case, I started with how everything starts: gathering data.

Attempt 1: Betfair API-NG

Good news: Betfair has some APIs. Bad news: it costs £299 to access the API. Better news: I had already gotten an API key back in 2014 when it was free and so my access was maintained.

The first Betfair API I used is called API-NG. It's a REST API that allows to do almost everything the Web client does, including getting a (live or delayed) state of the runner order book (available and recently matched odds), getting information about given markets and runners, placing actual bets or getting the account balance.

Collecting order book data

Since naming things is one of the hard problems in computer science, I decided to pick the Daedric pantheon from The Elder Scrolls to name components of my new trading system, which would be written in Python for now.

  • Hermaeus, "in whose dominion are the treasures of knowledge and memory". Hermaeus is a Python script that is given a Betfair market ID and a market suspend time (when the market's outcome becomes known) as well as a sample rate. When started, it repeatedly polls the REST API at the given sample rate to get the order book contents for all runners and dumps those into a PostgreSQL database.
  • Meridia, "associated with the energies of living things". Meridia reads the Betfair market catalogue (say, all horse races in the UK happening today, their starting times and market IDs) and schedules instances of Hermaeus to start recording the data a given amount of time from the market getting suspended.
  • Hircine, "whose sphere is the Hunt, the Sport of Daedra, the Great Game (which I have just lost), the Chase". Hircine would be the harness for whatever strategy I would decide to run: when executed on a given market, it would read the data collected by Hermaeus and output the desired number of OPCs we wished to hold.

This was pretty much where I paused the design process: downstream of Hircine we would have some sort of a component that would determine whether to place a bet on a given market, but I didn't want to make a decision on what would happen later, since I wanted my further design decisions to be driven by looking at the data and seeing what would and wouldn't work.

I had decided to concentrate on pre-race greyhound and horse markets. Pre-race markets somehow still have dramatic odds movements (despite that there's no information becoming known that would affect the race outcome), however odds don't move by more than 1-2 ticks at a time (unlike football, where a goal can immediately halve the implied probability for some markets, like the Correct Score market). In addition, in-play markets have a bet delay of 1-10 seconds depending on the sports type, whereas for the pre-race markets the submitted bet appears in the order book immediately.

In terms of numbers, there are about 30 horse races in the UK alone on any given day (and those usually are the most liquid, as in, with a lot of bets being available and matched, with action starting to pick up about 30 minutes before the race begins) and about 90 greyhound races (those are less liquid than horses with most of the trading happening in the last 10 minutes before the race begins).

What to do next?

Now, I could either beat other people by making smarter decisions — or I could beat them by making dumb decisions quicker. I quite liked the latter idea: most high-frequency trading strategies are actually fairly well documented, I imagined that "high frequency" on Betfair is much, much slower than high frequency in the real world, and making software faster is better-defined that making a trading model make more money.

And, as a small bonus, trading at higher granularities meant that I didn't need as much data. If my strategy worked by trying to predict, say, which horse would win based on its race history or its characteristics, I would need to have collected thousands of race results before being able to confidently backtest anything I came up with. In a higher frequency world, data collection would be easier: there's no reason why the microstructure of one horse racing market would be different from another one.

The simplest one of those strategies would be market making. Remember me lamenting in the previous post that buying a contract and immediately selling it back results in a ~0.5% loss because of the bid-offer spread? As a market maker, we have a chance of collecting that spread instead: what we do is maintain a both buy (back) and a sell (lay) order in hopes that both of them will get hit, resulting in us pocketing a small profit and not having any exposure, assuming we sized our bets correctly. It's a strategy that's easy to prototype but has surprising depth to it: what if only one side of our offer keeps getting matched and we accumulate an exposure? What odds do we quote for the two bets? What sizes?

With that in mind, I realised that the way I was collecting data was kind of wrong. The idea of the strategy output being the desired position at a given time is easily backtest-able: just multiply the position at each point by the price movement at the next tick and take the cumulative sum. The price in this case is probably the mid (the average of the back and the lay odds) and for our cost model we could assume that we cross the spread at every trade (meaning aggressively take the available odds instead of placing a passive order and waiting to get matched).

Sadly, this doesn't fly with a market-making strategy: since we intend to be trading passively most of the time, we have to have a way to model when (or whether: while we're waiting, the price could move underneath us) our passive orders get matched.

How an order book works

If we have to delve into the details of market microstructure, it's worth starting from describing how a Betfair order book actually works. Thankfully, it's very similar to the way a real exchange order book works.

When a back (lay) order gets submitted to the book, it can either get matched, partially matched, or not matched at all. Let's say we want to match a new back (lay) order N against the book:

  • take the orders at the opposite side of the book: lay (back).
  • look at only those that have higher (lower) odds than N.
  • sort them by odds descending (ascending) and then by order placement time, oldest first.
  • for each one of these orders O, while N has some remaining volume:
    • if N has more remaining volume than O, there's a partial match: we decrease N's remaining volume by O's volume, record a match and remove O from the book completely.
    • if N has less remaining volume than O, we decrease O's remaining volume by N's remaining volume, record a match and stop.
    • if both volumes are equal, we remove O from the book, record a match and stop.
  • if N has any remaining volume, add it to the order book.

It's easy to see that to get our passive order matched, we want to either offer a price that's better than the best price in the book (highest odds if we're offering a back (laying) or lowest odds if we're offering a lay (backing)) or be the earliest person at the current best price level. One might think of matching at a given price level as a queue: if we place a back at 1.70 (thus adding to the available to lay amount) and there is already £200 available at that level, we have to wait until that money gets matched before we can be matched — unless while we are waiting, the best available to lay odds become 1.69 and so we'll have to cancel our unmatched bet and move it lower.

So how do we use this knowledge to backtest a market-making strategy? The neatest way of doing this that I found was taking the order book snapshots that I collected and reconstructing a sequence of order book events from it: actions that take the order book from one state to the next one. Those can be either back/lay order placements (possibly triggering a match) or cancellations. If these events were to be be plugged back into the order book simulator, we would end up with order book states that mirror exactly what we started with.

However, if we were to also insert some of our own actions in the middle of this stream, we would be able to model the matching of passive orders: for example, if the best available lay at a given point in time was 1.70 and we placed an order at the better odds (for others) at 1.69, all orders in the event stream that would have matched at 1.70 would now be matching against our new order, until it got exhausted. Similarly, if a new price level was about to be opened at 1.69 and we were the first order at that level, all orders would be matching against us as well.

Inferring order book events

So I set out to convert my collected snapshots into a series of events. In theory, doing this is easy: every snapshot has the odds and volumes of available back and lay bets, as well as the total amount traded (matched) at each odds. So, to begin, we take the differences between these 3 vectors (available to back, available to lay, total traded) in consecutive snapshots. If there's no traded difference, all other differences are pure placements and cancellations without matches. So if there's more available to back/lay at a given level, we add a PLACE_BACK/PLACE_LAY instruction to the event stream — and if there's less, we add a CANCEL_BACK/CANCEL_LAY instruction for a given amount. Since we don't know which exact order is getting cancelled (just the volume), the order book simulator just picks a random one to take the volume away from.

Things get slightly more tricky if there's been a trade at a given odds level. If there's only been one order book action between the snapshots, it's fairly easy to infer what happened: there will be a difference in either the outstanding back or lay amounts at those odds, meaning there was an aggressive back/lay that immediately got matched against the order. Sometimes there can be several odds levels for which there is a traded amount change — that means an order was so large that it matched against orders at several odds. After the traded amount differences are reconciled, we need to adjust the available to back/lay differences to reflect that and then proceed as in the previous paragraph to reconcile unmatched placements/cancellations.

There was a problem with the data I collected, however: API-NG limits the number of requests to it to about 5 per second. Worse even, despite that I was making about 3 requests per second, the API would still sometimes throttle me and not give back an HTTP response until a few seconds later. This meant that order book snapshots had more than one event between them and in some cases there were way too many possible valid sets of events that brought the order book from one state to the other.

Thankfully, a slightly more modern way of getting data was available.

Attempt 2: Betfair Exchange Stream API

Unlike its brother, API-NG, the Exchange Stream API is a push API: one opens a TCP connection to Betfair and sends a message asking to subscribe to certain markets, at which point Betfair sends back updates whenever an order placement/cancellation/match occurs. Like its brother, the Stream API uses JSON to serialise data as opposed to a binary protocol. Messages are separated with a newline character and can be either an image (the full order book state) or an update (a delta saying which levels in the order book have changed). The Stream API can also be used to subscribe to the state of user's own orders in order to be notified when they have been matched/cancelled.

This was pretty cool, because it meant that my future market making bot would be able to react to market events basically immediately, as opposed to 200ms later, and wouldn't be liable to get throttled.

Here's an example of a so-called market change message:

{"op":"mcm","id":2,"clk":"ALc3AK0/AI04","pt":1502698655391,"mc":[
    {"id":"1.133280054","rc":[
        {"atb":[[5.7,5.7]],"atl":[[5.7,0]],"trd":[[5.7,45.42]],"id":12942916}]}]}

This one says that in the market 1.133280054, we can't bet against (lay) the runner 12942916 at odds 5.7, but instead can bet up to £5.70 on it. In addition, now the total bets matched at odds 5.7 amount £45.42 (actually it's half as much, because Betfair counts both sides of the bet together). In essence, this means that there was an aggressive lay at 5.7 that got partially matched against all of the available lay and £5.70 of it remained in the book at available to back, basically moving the market price.

As for the other parts of the message, pt is the timestamp and clk is a unique identifier that the client at subscription time can send to Betfair to get the whole market image starting from that point (and not from the point the client connects: this is used to restart consumption from the same point if the client crashes).

Another program had to be added to my Daedric pantheon. Sanguine, "whose sphere is hedonistic revelry, debauchery, and passionate indulgences of darker natures", would connect to the Stream API, subscribe to the order book stream for a given market and dump everything it received into a text file. It was similar to Hermaeus, except I had given up on creating a proper schema at this point to store the data into PostgreSQL (and even in the case of Hermaeus it was just the metadata that had a schema, the actual order book data used the json type). Still, I was able to schedule instances of it to be executed in order to record some more stream data dumps.

And indeed, the data now was much more granular and was actually possible to reconcile (except for a few occasions where some bets would be voided and the total amount traded at given odds would decrease).

Here's some code that, given the current amounts available to back, lay and traded at given odds, as well as a new order book snapshot, applies the changes to our book and returns the events that have occurred.

def process_runner_msg(runner_msg, atb, atl, trds):
    events = []

    # represent everything in pennies
    back_diff = {p: int(v * 100) - atb[p] for p, v in runner_msg.get('atb', [])}
    lay_diff = {p: int(v * 100) - atl[p] for p, v in runner_msg.get('atl', [])}
    trd_diff = {p: int(v * 100) - trds[p] for p, v in runner_msg.get('trd', [])}

    for p, v in trd_diff.iteritems():
        if not v:
            continue
        if v < 0:
            print "ignoring negative trade price %f volume %d" % (p, v)
            continue

        if p in back_diff and back_diff[p] < 0:
            back_diff[p] += v / 2  # betfair counts the trade twice (on both sides of the book)
            atb[p] -= v / 2
            trds[p] += v
            events.append(('PLACE_LAY', p, v / 2))
        elif p in lay_diff and lay_diff[p] < 0:
            lay_diff[p] += v / 2
            atl[p] -= v / 2
            trds[p] += v
            events.append(('PLACE_BACK', p, v / 2))
        elif p in back_diff:
            back_diff[p] += v / 2
            atb[p] -= v / 2
            trds[p] += v
            events.append(('PLACE_LAY', p, v / 2))
        elif p in lay_diff:
            lay_diff[p] += v / 2
            atl[p] -= v / 2
            trds[p] += v
            events.append(('PLACE_BACK', p, v / 2))
        else:
            print "can't reconcile a trade of %d at %.2f" % (v, p)

    # these were aggressive placements -- need to make sure we're sorting backs (place_lay) by price descending
    # (to simulate us knocking book levels out) and lays vice versa
    events = sorted([e for e in events if e[0] == 'PLACE_LAY'], key=lambda e: e[1], reverse=True) + \
             sorted([e for e in events if e[0] == 'PLACE_BACK'], key=lambda e: e[1])

    for p, v in back_diff.iteritems():
        if v > 0:
            events.append(('PLACE_BACK', p, v))
        elif v < 0:
            events.append(('CANCEL_BACK', p, -v))
        atb[p] += v

    for p, v in lay_diff.iteritems():
        if v > 0:
            events.append(('PLACE_LAY', p, v))
        elif v < 0:
            events.append(('CANCEL_LAY', p, -v))
        atl[p] += v

    return events

Parallel to this, I was also slowly starting to implement an Exchange Stream API client that would subscribe to and handle market change messages in Scala. This contraption, named Azura (whose sphere is "the period of transition and change"), would form the base of the actual live market making bot and/or any other trading strategies that would use the Stream API.

Conclusion

Next time on project Betfair, we'll start implementing and backtesting our market making bot, as well as possibly visualising all the data we collected.

As usual, posts in this series will be available at https://kimonote.com/@mildbyte:betfair or on this RSS feed. Alternatively, follow me on twitter.com/mildbyte.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism, ease of navigation and control over what content a user follows. Try the demo here and/or follow it on Twitter as well at twitter.com/kimonote!

project Betfair, part 1

@mildbyte 1 year, 11 months ago | programming | betfair | 1 comment

Introduction

Like I had said in a previous post, I tried my hand at automated trading on Betfair for a couple of months. While I didn't make much money from it, I still think there are a lot of cool things I could write about. Here's a rough outline of what you will be able to read about in this series when I'm done:

  • collecting and reconstructing order book events from the Betfair Stream API
  • backtesting high-frequency trading strategies by replaying order book events into them
  • insights about betting market microstructure
  • market making in betting markets (fun things like adverse selection and inventory risk)
  • a harness in Scala for live trading against Betfair Stream API
  • an extremely simple (non-market-making) fully automated strategy that was run for a week against ~150 horse races

Let's get into it, shall we?

Background

Betfair is a betting exchange, similar to other betting exchanges (BetDAQ, Smarkets etc). Unlike a classical bookmaker (say PaddyPower or Ladbrokes), Betfair allows one to bet against an outcome (a lay bet) as well as for it. In addition, all bets on an exchange are offered by other participants in the market, not Betfair itself.

This allows people to make money not only by betting on something (essentially having a better judgment of the likelihood of a given event than other participants) but also by trading (locking in a profit by backing and laying at different odds as they move through time).

The exchange consists of events (such as, say, a given match between Arsenal and Huddersfield). Events consist of markets (such as the Match Odds market or the Exact Score market), which consists of runners (for a Match Odds market, the runners are Huddersfield, Arsenal and Draw. For the Exact Score market, it's all the scores from 0-0 to 3-3 + any Unquoted).

Arsenal vs Huddersfield event view

Arsenal vs Huddersfield event view

Runners are probably named like that because most gambling used to be on horses. In essence, runners are almost always an exact set cover of the outcomes in a given market (they're disjoint and account for all outcomes). Almost, because Betfair has "Dead Heat" rules regarding what happens to bets when, say, two horses cross the finish line at the same time.

Inside a runner, we have an order book: the odds one can take to back or lay a given runner, as well as the amount of money one can put on it. Since when backing, we always want bigger odds (if I put £1 on something at 2.0, I get back £2 if it wins and if the odds are 3.0, I get back £3 while risking the same amount of money), back offers with higher odds take priority and get matched first.

Note that a back offer appears in the order book if and only if there's someone offering to take the other side (i.e. lay the runner). When laying, lower odds are obviously better (by laying £1 pound, I get £1 pound no matter what, but, if the runner wins, I will have to pay back more with higher odds).

Order book for Arsenal winning the match

Order book for Arsenal winning the match

This is similar to a real exchange order book where, slightly counter-intuitively, the odds are prices and the money we're betting on something are volumes (in fact, that's how the Betfair API internally refers to order book lines).

Unlike a real-world exchange, however, the odds have varying granularities and aren't in one-cent increments. One can bet at odds of 1.01, 1.02, ..., 2.02, 2.04, ..., 3.05, 3.10, ... This is to reflect that at higher odds, risks/rewards are higher for the same bet size and so a move from odds of 1.01 to 2.0 is similar to a move from 500.0 to 1000.0. More information here.

Setting up a mathematical framework

When betting, we're looking at odds and payoffs. This is different from trading a financial instrument, where we think about it in terms of our position. For example, if I'm long an Apple share, I can sell it, thus having 0 shares and my net worth now being independent of Apple's performance, or sell 2 shares, becoming short and benefiting from Apple shares dropping in price.

It would be nice to produce something similar for betting: that way, when we're trading, we can operate with familiar terms (I'm long 1 contract on this market, I sell 1, I'm now neutral: no matter whether the horse/football team loses or wins, the money I have at the end doesn't change).

One-pound contracts

Let's think of Betfair bets in terms of one-pound contracts (or an OPC): when a given event has ended, such contract can have a value of either £1 (if say the horse we're betting on has won) or £0.

A normal bet can be either a back (betting for) or a lay (betting against). For a back of \( b \) at odds \( o \), the payoff is \( b * o \) if the runner wins and 0 if the runner loses. The actual contract costs us \( b \).

For a lay of \( b \) at odds \( o \), the payoff is opposite: \( -b * o \) if the runner wins and 0 if the runner loses — but we get \( b \) from selling the contract in any case.

How do we transform between the normal odds world and the OPC world? Let's say we back an outcome by putting \(b \) pounds on it at odds of \( o\). This is the same as buying \(b * o\) OPCs at the price \(\frac{1}{o}\) each. Hence:

  • if the runner wins, our contracts are redeemed for \( b * o \) pounds: our total payoff is \( b * o - b \)
  • if the runner loses, our contracts are worth nothing: our total payoff is \( 0 - b \)

Laying a runner for \( b \) at \( o \) is the same as selling \( b * o \) OPCs at the price of \(\frac{1}{o} \) each: we immediately get \(\frac{1}{o} * b * o = b \) pounds and:

  • if the runner wins, we have to pay out \( b * o \) pounds but keep the original money we got: our total payoff is \( b - b * o \)
  • if the runner loses, we owe nothing and keep the money: our total payoff is \( b \).

Note that the back outcomes and the lay outcomes are inverses of each other: one's gain is another's loss, that is, betting is a zero-sum game.

What does this price mean? It's actually the implied probability of a given runner winning. Consider the expected payoff of backing a runner: it's \( (1 - p) * 0 + p * (b * o) - b = p * b * o - b \). On average, we would expect to make zero money from this (otherwise we could easily make money by backing a given runner all the time). Hence \( p = \frac{1}{o} \).

So essentially we can use the price of an OPC and the implied probability interchangeably and then:

  • Buy \( n \) OPCs at \( p \) (or buy \( n * p \) worth of OPCs at price \( p \)) = Back \( n * p \) at odds \( \frac{1}{p} \)
  • Sell \( n \) OPCs at \( p \) (or sell \( n * p \) worth of OPCs at price \( p \)) = Lay \( n * p \) at odds \( \frac{1}{p} \).

Hedging

It's possible to offset a back bet with a lay bet to, say, lock in a profit or a loss (also known as hedging or greening-up: no matter who wins, we get or lose the same amount of money). Using the OPC framework, however, this becomes much more intuitive.

Let's say we're trading a runner where the odds moved from \( o_1 \) down to \( o_2 \). We have placed a back bet of \( b_1 \) at \( o_1 \). Hence we have bought \( b_1 * o_1 \) OPCs for \( b_1 \).

When the odds become \( o_2 \), we want to sell these OPCs: we get \( \frac{b_1 * o_1}{o_2} \) pounds for them (in Betfair world, we have placed a lay bet of \( \frac{b_1 * o_1}{o_2} \) at the available odds of \( o_2 \)). Now we have \(\frac{b_1 * o_1}{o_2} - b_1 \) pounds and 0 OPCs, so we are market neutral in our framework.

Does this actually work? If the runner wins:

  • our back bet wins (payoff \( b_1 * o_1 - b_1 \))
  • the lay loses (payoff \(\frac{b_1 * o_1}{o_2} - o_2 * \frac{b_1 * o_1}{o_2} \))
  • the total payoff is \(\frac{b_1 * o_1}{o_2} - b_1 \)

If the runner loses:

  • our back loses (payoff \( -b_1 \))
  • the lay wins (\(\frac{b_1 * o_1}{o_2}\))
  • we get \(\frac{b_1 * o_1}{o_2} - b_1 \) as well.

So the payoff is the same in both cases.

Example strategy: cross-market arbitrage

On Betfair, one can often bet on a conjunction of disjoint events (events in the probability theory sense, not in the Betfair sense) as well as as each of those events separately. For example, Arsenal winning a given football match is the same as Arsenal winning with a score of 1-0 or Arsenal winning with a score of 2-0 or Arsenal winning with a score of 2-1 etc. Technically, the available bets in that case only go up to 3 goals on each side, but outcomes beyond that are fairly rare (and in the proposed strategy we end up better off with an unquoted outcome).

Assuming these two events (Arsenal winning at all and Arsenal winning with either of the available scores) are the same, their implied probabilities and hence prices should also be the same. So it's possible that we can sometimes lock in a small risk-free profit by backing the total outcome and laying each one of its constituents.

How do we calculate whether this arbitrage exists?

Let's say an event \(E\) is a conjunction of disjoint events \(E_1 ... E_k\). If we can bet on these events with implied probabilities (prices) \(p, p_1, ... p_k\) we can buy 1 OPC on \(E\) for \(p\) and then immediately turn around and sell \(k\) OPCs on \(E_i\) for \(p_i\) each (getting \(p_1 + p_2 + ... + p_k\) in total).

Since these events are disjoint, if \(E\) happens, only one of \(E_1, ..., E_k\) happens. Then the contract on \(E\) becomes worth £1, the contract on one and only one \(E_i\) is worth -£1 and the rest are worth £0, hence we're only left with the money from the original trade (the contracts offset each other).

If \(E\) doesn't happen, all OPCs expire worthless. Hence we're market-neutral.

The money we get from buying a contract and immediately selling its constituents is \(-p + p_1 + p_2 + ... + p_k\). So if \(p\) is greater (or less) than the constituents, there's an arbitrage.

In fact, in this case, if the first team wins with an unquoted score (i.e. above 3), then neither of \(E_i\) happens and we pocket a profit of \(p_1 + p_2 + ... + p_k\).

In practice, there are a few real-world issues that make this unrealistic, such as bid-offer spreads. If we back something and then immediately lay it, we lose money, since there's a difference between the best available back odds and best available lay odds. Even with the smallest spreads (for example, backing at 1.71 and laying at 1.72), the cost is about 0.5%. Worse even, Betfair charges a 5% commission on the winnings in a given market: since in our case, we're trading 2 markets (Match Odds and Exact Score) with only one of them winning, we have to have an expected profit of 5% just to break even.

Conclusion

Thanks for reaching the end of this math-heavy bit! Next up, we'll start collecting data from the Betfair Stream API and things might begin moving faster.

Posts in this series will be available at https://kimonote.com/@mildbyte:betfair or on this RSS feed . Alternatively, follow me on twitter.com/mildbyte.

Interested in this blogging platform? It's called Kimonote, it focuses on minimalism (the only JavaScript is MathJax on this and another page), ease of navigation and control over what content a user follows. Try the demo here and/or follow it on Twitter as well at twitter.com/kimonote!

What have I been up to

@mildbyte 1 year, 12 months ago | admin | programming | kimonote | betfair |

It's been more than 3 months since my last post. Have I finally sunk into the depths of depression and debauchery since leaving my job? Do I start my day by waking up at 11am and having a mimosa? Do I end it with a can of Pringles at 3am, going through a box set of The Wire?

As if.

project Betfair

I tried my hand at automated trading on Betfair, which is a sports betting exchange. Automated trading on Betfair is similar to automated trading in real life, except the stakes are lower, the taxes are none and the APIs are more pleasant.

In the process of getting this whole system to work, I managed to do many interesting things:

  • wrote a collector/reconstructor of order book events from the Betfair Stream API
  • wrote a harness that backtests high-frequency trading strategies by replaying order book events into them (thus simulating market impact) and can collect various statistics
  • learned about market making (fun things like adverse selection and inventory risk)
  • wrote a harness in Scala for live trading against the Betfair Stream API
  • let an extremely simple (non-market-making) fully automated strategy run for a week against ~150 horse races -- it made about £1500 worth of bets and lost £9. Since it didn't lose too much money, looks like the basic idea worked. Since it didn't make money, I can now blog about it.

In fact, even if it did make money, I think there's a lot of interesting stuff and pretty pictures to look at and write about. Perhaps one day I even will.

project Bitcoin

I read some Bitcoin and Ethereum whitepapers. It's a pretty cool idea.

project Kimonote

You may have noticed that it looks like blog isn't statically generated by Hakyll any more. That's because it isn't.

One day when I was taking a break from my Betfair adventures, I decided to reorganise all the stuff in my private diary (that I use DokuWiki for) and realised that it would be cool to:

  • run LDA on my entries to see how they classify into topics
  • hmm, topics are just like tags, what if I could auto-tag my posts?
  • what if I used this for my blog as well?
  • what if tags didn't suck and were a valid navigational tool?
  • what if I could filter posts by not one, but multiple tags?
  • what if I could subscribe to posts with some specific tags from a user? Like RSS, but with extra filters, or like Facebook, but without influencing elections?
  • what if text-based websites didn't exceed in size the major works of Russian literature?

At least that's what I think my thought process was now, back then it was more like "what if I learned Django?".

So I made Kimonote, which I guess is a note-organizer-slash-blogging-platform that focuses on minimalism and ease of navigation. Wanna see my previous post tagged 'programming'? It's there, on the left, in the sidebar. Previous post on the blog? It's also there. News from Paddington? Here. News from Paddington, but in a format that allows you to consume the whole history without clicking "next"? Sure. Project Morrowind and my posts on Raspberry Pi from before I went to university? No problem.

It's Javascript-free as well and uses basscss for styling. This means this whole page takes up 20KB, about 100 times less than a small Medium blog post.

Interested? Head on to the landing page: there's a sign-up link there that lets you... no, not sign up, but leave your email so I can see if it's worth making it more than a place where my blog lives.


Copyright © 2017–2018 Artjoms Iškovs (mildbyte.xyz). Questions? Comments? Suggestions? Contact support@kimonote.com.