It's been two years since project Morrowind (which apparently has now been made an official speedrun category). During that time, I've been working on another exciting project and it's finally time to announce it.
Today, we are delighted to launch Splitgraph, a tool to build, extend, query and share datasets that works on top of PostgreSQL and integrates seamlessly with anything that uses PostgreSQL. It brings the best parts of Git and Docker, tools well-known and loved by developers, to data science and data engineering, and allows users to build and manipulate datasets directly on their database using familiar commands and paradigms.
Splitgraph launches with first-class support for multiple data analytics tools and access to over 40,000 open government datasets on the Socrata platform. Analyze coronavirus data with Jupyter and scikit-learn, plot nearby marijuana dispensaries with Metabase and PostGIS or just explore Chicago open data with DBeaver, all from the comfort of a battle-tested RDBMS with a mature feature set and a rich ecosystem of integrations.
Feel free to check out our introductory blog post or the frequently asked questions section!
Let's consider the life of an increasingly prominent type of worker in the industry: the data scientist/engineer. Data is the new oil and this person is the new oil rig worker, plumber, manager, owner and operator. This is partially based on my own professional experience and partially on exaggerated horror stories from the Internet, just like a good analogy should be.
Came to work to a small crisis: a dashboard that our marketing team uses to direct their door hinge sales efforts (yes, we sell door hinges; it's a very niche but very lucrative business) is outputting weird numbers. Obviously, they're not happy. I had better things to do today, but oh well. I look at the data that populates the dashboard and start going up the chain through a few dozen random ETL jobs and processes that people before me wrote.
By lunchtime, I trace this issue down to a fault in one of our data vendors (that we buy timber price data from: apparently it's a great predictor of door hinge sales): overnight, they decided to change their conventions and publish values of 999999 where they used to push out a NULL. I raise a support ticket and wait for it to be answered. In the meantime, I enlist a few other colleagues and we manage to repair the damage, rerunning parts of the pipeline and patching data where needed.
Support ticket still unanswered (well, they acknowledged it but said they are dealing with a sudden influx of support tickets, I wonder why) but at least we have a temporary fix.
In the meantime, I start work on a project that I wanted to do yesterday. I read a paper recently that showed that another predictor of door hinge sales is council planning permissions. The author had scraped some data from a few council websites and made the dataset available on his Web page as a CSV dump. Great! I download it and, well, it's pretty much what I expected it to be: no explanation of what each column means and what its ranges are. But I've seen worse. I fire up my trusty Pandas toolchain and get to work.
By the evening, there's nothing left of the old dataset: I did some data patching and interpolation, removed some columns and combined some other ones. I also combined the data with our own historical data for door hinge sales in given postcodes. In conjunction with this data, the planning permission dataset indeed gives an amazing prediction accuracy. I send the results to my boss and go home happy.
This is the happiest I'll be this week.
The timber sales data vendor has answered our support ticket. In fact, our query made them inspect the data more closely, at which point they realised they had some historical errors in it, which they decided to rectify. The problem was that they couldn't send us just the rows that had changed and instead linked us to an SQL dump of the whole dataset.
I spend the rest of the day downloading it (turns out, there's a lot of timber around) and then hand-crafting SQL queries to backfill the data into our store as well as all the downstream components.
In the meantime, marketing together with my boss has reviewed my results and is really excited about council planning permission data. They would like to put it into production as soon as possible.
I send some e-mails to the author of the paper to find out how they generated the data and if they would be interested in sharing their software, whilst also trying to figure out how to plumb it into our pipeline so that the projections can make it into their daily reports.
Boss is also unhappy about our current timber data vendor and is wondering if I could try out a dataset provided by another vendor. Easier said than done, as now I have to somehow reproduce the current pipeline on my machine, swap the new dataset in and rerun all our historical predictions to see how they would have fared.
The council planning permission data project is probably not happening. Firstly, the per-postcode sales data that I used in my research lives in a completely different database engine that we can't directly import into our prediction pipeline. Worse still, the author of the paper doesn't really remember how he produced the data or whether his scraping software still works.
After a whole day of searching, I did manage to find a data vendor that seems to be doing similar things, with no pricing data, no nothing. I drop them an e-mail and go home.
Come to work to learn about an overnight production issue that the operators managed to mitigate but that I now have to actually fix. Oh well. I get the tag of the container we had running in production and do a docker pull. I fire it up locally and use a debugger (and the container's Dockerfile) to locate the issue: it's a bug in an open-source library that we're using. I do a quick scan of GitHub issues to see if it's been reported before. Nope. I raise an issue and also submit a pull request that I think should fix it.
In the meantime, the tests I run locally for that library pass with my fix, so I change the Dockerfile to build the image from my patched fork. I do a git push on the Dockerfile, our CI system builds it and pushes the image out to staging. We redirect some real-world traffic to staging and it works. We do a rolling upgrade of the prod service. It works.
I spend the rest of the day reading Reddit.
GitHub issue still unanswered, but we didn't have any problems overnight anyway.
I have some more exciting things to do: caching. Some guys from Hooli have open-sourced this pretty cool load-balancing and caching proxy that they wrote in Go and it fits our use case perfectly for an internal service that has always had performance issues.
They provide a Docker image for their proxy, so I quickly get the docker-compose.yml file for our service, add the image to the stack and fiddle around with its configuration (exposed via environment variables) to wire it up to the service workers. I run the whole stack locally and rerun the integration tests against the proxy instead. They pass, so I push the whole thing out to staging. We redirect some requests to the staging service in order to compare performance and correctness.
I spend the rest of the day reading Reddit.
The GitHub issue has been answered and my PR has been accepted. The developer also found a couple of other bugs that my fix exposed, which have now been fixed as well. I change our service to build against the latest tag, build on CI, tests pass.
I look at the dashboards to see how my version of the service did overnight: turns out, the caching proxy reduced the request latency by about half. We agree to push it to prod.
I spend the rest of the week reading Reddit.
There are a lot of tools, workflows and frameworks in software engineering that have made developers' lives easier and that, paradoxically, haven't been applied to the problem of data processing.
In software, you do a git pull and bring the source code up to date by having a series of diffs delivered to you. This ability to treat new versions as patches on top of old versions has opened up more opportunities like rebasing, pull requests and branching, as well as inspecting history and merge conflict resolution.
None of this exists in the world of data. Updating a local copy of a dataset involves downloading the whole dataset again, which is crazy. Proposing patches to datasets, having them applied and merging several branches is unheard of, and yet it is a common workflow in data science: why can't I maintain a fork of data from a vendor with my own fixes on top and then do an occasional git pull --rebase to keep my fork up to date?
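To make the idea of delivering dataset updates as diffs a little more concrete, here is a minimal Python sketch. It assumes each version of a table fits in memory as a dict from a primary key to a row tuple; the names `diff` and `apply_patch` and the timber rows are hypothetical illustrations, not Splitgraph's API. A "rebase" is then just re-applying one's own fixes on top of the new upstream version.

```python
from typing import Dict, Optional, Tuple

# Assumption: a table version is a dict mapping primary key -> row tuple.
Table = Dict[int, Tuple]
# A diff maps primary key -> new row, or None if the row was deleted.
Diff = Dict[int, Optional[Tuple]]


def diff(old: Table, new: Table) -> Diff:
    """Return only the rows that were added, changed or deleted between versions."""
    changes: Diff = {key: row for key, row in new.items() if old.get(key) != row}
    for key in old.keys() - new.keys():
        changes[key] = None  # deletion marker
    return changes


def apply_patch(base: Table, changes: Diff) -> Table:
    """Apply a diff on top of a table version, returning a new version."""
    result = dict(base)
    for key, row in changes.items():
        if row is None:
            result.pop(key, None)
        else:
            result[key] = row
    return result


# Hypothetical vendor data, version 1, using 999999 where NULL was meant.
upstream_v1 = {1: ("oak", 999999), 2: ("pine", 410)}
# Our fork patches the sentinel back to a NULL; the fix is stored as a diff.
my_fork = apply_patch(upstream_v1, {1: ("oak", None)})
my_fixes = diff(upstream_v1, my_fork)

# The vendor publishes version 2; only the delta needs to be shipped.
upstream_v2 = {1: ("oak", 999999), 2: ("pine", 415), 3: ("birch", 380)}
delta = diff(upstream_v1, upstream_v2)

# The "git pull --rebase" step: replay our fixes on top of the new version.
rebased = apply_patch(upstream_v2, my_fixes)
print(delta)    # {2: ('pine', 415), 3: ('birch', 380)}
print(rebased)  # {1: ('oak', None), 2: ('pine', 415), 3: ('birch', 380)}
```

This naive rebase resolves conflicts by always preferring the local fix; a real tool would need to flag rows that changed on both sides instead of silently overwriting them.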
In software, we have learned to use unique identifiers to refer to various artifacts, be it Git commit hashes, Docker image hashes or library version numbers. When someone says "there's a bug in version 3.14 (commit 6ff3e105) of this library", we know exactly which codebase they refer to and how we can get and inspect it.
This doesn't happen with data pipelines: most of the time we hear "the data we downloaded last night was faulty but we overwrote chunks of it and it's propagated downstream so I've no idea what it looks like now". It would be cool to be able to refer to datasets as single, self-contained images and for any ETL job to be just a function between images: if it's given the same input image, then it will produce the same output image.
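One way to get such stable identifiers is content addressing, the idea behind Git's object IDs: derive an image's ID from a hash of its contents. The Python sketch below works under simple assumptions (rows are small tuples of JSON-serialisable values, and row order doesn't matter); `image_id` and `drop_sentinels` are hypothetical names, not part of any real tool. Because the ETL step is a pure function, the same input image always yields the same output image ID, which also makes a lineage record trivial to keep.

```python
import hashlib
import json
from typing import List, Tuple

Row = Tuple[str, int]


def image_id(rows: List[Row]) -> str:
    """Content-address a table: identical rows (in any order) give an identical ID."""
    canonical = json.dumps(sorted(rows), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def drop_sentinels(rows: List[Row]) -> List[Row]:
    """A pure ETL step (hypothetical): drop rows where 999999 stands in for NULL."""
    return [row for row in rows if row[1] != 999999]


raw = [("2020-03-02", 999999), ("2020-03-01", 412), ("2020-03-03", 415)]
raw_id = image_id(raw)
clean_id = image_id(drop_sentinels(raw))

# Same input image -> same output image, regardless of row order or when it runs.
assert image_id(drop_sentinels(list(reversed(raw)))) == clean_id

# A lineage record says exactly which input image produced which output image.
lineage = {clean_id: ("drop_sentinels", raw_id)}
print(lineage)
```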
To expand on that, Docker has made this "image" abstraction even more robust by packaging all of the dependencies of a service together with that service. This means that this container can be run from anywhere: on a developer's machine, on a CI server, or in production. By giving the developers tools that make replicating the production experience easier, we have decreased the distance between development and production.
I used to work in quant trading and one insight I got from that is that getting a cool dataset and finding out that it can predict the returns on some asset is only half of the job. The other half, less talked about and much more tedious, is productionizing your findings: setting up batch jobs to import this dataset and clean it, making sure the operators are familiar with the import process (and can override it if it goes wrong), writing monitoring tools. There's the ongoing overhead of supporting it.
And despite that, there is still a large distance between research and production in data science. Preparing data for research involves cleaning it, importing it into, say, Pandas, figuring out what every column means and potentially hand-crafting some patches. This is very similar to old-school service setup: do a sudo apt-get install of the service, spend time setting up its configuration files, spend time installing other libraries and, by the end of the day, not remember exactly what you did or how to reproduce it.
Docker made this easier by isolating every service and mandating that all of its dependencies (be it Linux packages, configuration files or any other binaries) are specified explicitly in a Dockerfile. It's a painful process to begin with, but it results in something very useful: everyone now knows exactly how an image is made, and its configuration can be experimented on. One can swap out a couple of apt-get statements in a Dockerfile to install, say, an experimental version of libc and get another version of the same service that they can compare against the current one.
In an ideal world, that's what would happen with data: I would write a Dockerfile that grabs some data from a few upstream repositories, runs an SQL JOIN on tables and produces a new image. Even better, I should be able to have this new image kept up to date and rebase itself on any new changes in the upstream. I should be able to rerun this image against other sources and then feed it into a locally-run version of my pipeline to compare, say, the prediction performance of the different source datasets.
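The workflow described above can be sketched in Python, with SQLite's in-memory database standing in for PostgreSQL. The hypothetical `RECIPE` list plays the role of a Dockerfile for data: an ordered, declarative set of SQL statements that turns upstream tables into a new table. All table names, columns and figures are made up for illustration, and table names are interpolated into SQL directly, so they must come from a trusted source. Swapping one upstream source for another and rebuilding is then a one-line change, which is what makes comparing vendors cheap.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple

# A hypothetical "data recipe": ordered SQL steps, analogous to Dockerfile lines.
RECIPE = [
    "CREATE TABLE forecast AS "
    "SELECT s.postcode, s.hinge_sales, p.permissions "
    "FROM sales s JOIN permissions p ON s.postcode = p.postcode",
]

# An upstream table: (column names, rows).
Upstream = Tuple[Sequence[str], List[tuple]]


def build(upstreams: Dict[str, Upstream], recipe: List[str], output: str) -> List[tuple]:
    """Load upstream tables into a fresh database, run the recipe, return the output."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, (columns, rows) in upstreams.items():
            conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
        for statement in recipe:
            conn.execute(statement)
        # Order by the first column so the result is deterministic.
        return conn.execute(f"SELECT * FROM {output} ORDER BY 1").fetchall()
    finally:
        conn.close()


sales = (["postcode", "hinge_sales"], [("CB1", 120), ("CB2", 80)])
vendor_a = (["postcode", "permissions"], [("CB1", 3), ("CB2", 1)])
vendor_b = (["postcode", "permissions"], [("CB1", 4)])

# Same recipe, two different upstream sources for planning permissions.
print(build({"sales": sales, "permissions": vendor_a}, RECIPE, "forecast"))
# [('CB1', 120, 3), ('CB2', 80, 1)]
print(build({"sales": sales, "permissions": vendor_b}, RECIPE, "forecast"))
# [('CB1', 120, 4)]
```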
We are slowly converging on a set of standards for distributing software, which has reduced onboarding friction and allowed people to quickly prototype their ideas. One can do a docker pull, add an extra service to their software stack and run everything locally to see how it behaves within minutes. One can search for some software on GitHub and git clone it, knowing that it probably has fairly reproducible build instructions. Most operating systems now have package managers, which provide an index of software that can be installed on that system and allow the administrator to keep those packages up to date.
There is a ton of open data out there, with a lot of potential hidden value, and most of it is unindexed: there's no way to find out what it is, where it is, who maintains it, how often it's updated and what each column means. In addition, it all comes in various ad hoc formats, from CSVs to SQL dumps, from HDF5 files to unscrapeable PDF documents. For each one of these datasets, an importer has to be written, which raises the friction of onboarding new datasets and innovating.
One reason Git and Docker are popular is that they're unopinionated: they don't care what is actually being versioned or run inside the container. If Git only worked with a certain folder structure or required one to execute system calls to perform checkouts or commits, it would never have taken off. That is, git doesn't care whether what it's versioning is written in Go, Java, Rust or Python, or is just a text file.
Similarly with Docker, if it only worked on artifacts produced by a certain programming language or required every program to be rewritten to use Docker, that would slow down adoption a lot if not outright kill the tool.
Both of these tools build on an abstraction that has been around for a long time and that other tools already use: the filesystem. Git enhances tools that use the filesystem (such as the IDE or the compiler) by adding versioning to the source code. Docker enhances the applications that use the filesystem (that is, all of them) by isolating them and presenting each one with its own version of reality.
Such an abstraction also exists in the world of data: it's SQL. A lot of software is built on top of it and a lot of people, including non-technical ones, understand SQL and can write it. And yet most tools around want users to learn their own custom query language.
All these anecdotes and comparisons show that there are a lot of practices that data scientists can borrow from software engineering. They can be combined into three core concepts:
Updating a dataset should be done with the equivalent of a git pull, not by downloading a new data dump and rerunning one's import scripts.
Over the past two years, Miles Richardson and I have been building something in line with this philosophy.
Splitgraph is a data management tool and a sharing platform inspired by Docker and Git. It is currently based on PostgreSQL and allows users to create, share and extend SQL schema images. In particular:
It provides Git-like commands (commit, checkout, pull and so on) to produce and inspect new commits to a database. Whilst an image is checked out into a schema, it's no different from a set of ordinary PostgreSQL tables: any other tool that speaks SQL can interact with it, with changes captured and then packaged up into new images.
We have already given a couple of talks about Splitgraph: a short one at a local Docker meetup in Cambridge (slides), talking about parallels between Docker and Splitgraph, and a longer one at AHL, a quantitative hedge fund (slides), discussing the philosophy and implementation of Splitgraph in depth. A lot has changed since then, but they are still a good introduction to our philosophy.
Interested? Head over to our quickstart guide or check out the frequently asked questions section to learn more!
All of humanity's problems stem from man's inability to sit quietly in a room alone.
All too often, in online and offline discourse, when I voice a concern about some phenomenon (or see someone else do so), the argument gets shot down with something like "Your problems are first-world problems, there are people who have it (or historically had it) much worse than you" or "Well, it could always be worse. What if you didn't have (a job/a car/food/money/a romantic partner)?"
In a way, it feels like a special case of whataboutism ("Yes, X did a bad thing, but Y also did a bad thing, so how about we discuss that instead"). To myself, I used to call it "the African children fallacy" and sure, it's kind of insensitive, but I thought that it nicely references a well-known form of it ("how dare you complain about this when there are children starving in Africa?").
Recently, I started digging into it further and learned that it's called "the fallacy of relative privation" or the "not as bad as" fallacy (RationalWiki). In this essay, I want to investigate why I don't like it being used, as well as possible reasons for it getting brought up.
In a recent Hacker News discussion on "The Workplace Is Killing People and Nobody Cares", a Stanford Business School article on the harms brought by the modern work culture, this argument was deployed fairly widely: no matter what its issues are, the modern office environment (with comfortable chairs, air conditioning and mostly interesting work) is better than the life of a medieval farmer or an industrial factory worker, so we should appreciate it.
When I published one of my earlier essays, one of whose points was that everybody commuting to work on a 9-to-5 schedule creates undue strain on all sorts of infrastructure, I got a few similar responses, too ("well, try working in a Starbucks instead of a 9-5 job and see how you like it" or words to that effect).
Thing is, all these points are valid. I wouldn't want to swap my lifestyle for that of a medieval farmer, even though by some metrics their life might have been better than mine, or live without electricity or potable water, or even work at a coffee shop.
But that doesn't imply that I want my life to stay exactly how it is. Whether or not there are people out there who are better or worse off than me, I always want to improve my circumstances somehow, and I think it's worth contemplating how things could be made better, all the time.
In the case of work, work cultures and workplace environments, as much as I do agree office workers have it pretty good, I don't think people should treat the ability to sell most of one's waking hours to someone else as the best humanity can do. It's in fact kind of elitist to suggest that our way of life is the best one and pity those who aren't striving towards it.
In its strong form, the "not as bad as" fallacy implies that nobody can improve their lives until they have made sure everybody else is going to be better off. This kind of serves as a counterpoint to Pareto improvements, where at least one individual ends up better off without making anybody worse off.
I think its use partially stems from the speaker's will to rationalise what's happening to them and why they don't want to change their own situation or examine their own circumstances. It's easy to keep doing what you're doing and not take any risks if you've seen (or imagined) how bad it can get.
As a more extreme form of this argument, it might even be an implicit desire to not see anyone in a group become better than the group, kind of an extension of a crab mentality. A villager could be told that, sure, life in the village is tough, but the neighbouring villages have it worse, so why leave? Especially if he does make it big somewhere else, comes back and makes us all look like fools.
But, more dangerously, it can also be used as a manipulation tactic by someone who affects someone else's life and wants them to come to terms with that. Consider a boss that doesn't want to give you a raise ("well, Jimmy has worked here for a decade and never asked me for one!"). Even darker, imagine a victim of domestic abuse getting told that the problems they are facing are first-world problems and at least they still have a roof over their head. Or indeed the victim telling this to themselves as a way of self-gaslighting.
Taken to its extreme, this argument invalidates any sort of technological advancement that's attempted before every country on Earth has exactly the same quality of life. Should space exploration be (or have been) postponed until all nations have achieved Western quality of life? Or do we expect innovation in one country, no matter which side of the globe it's on, to be eventually spread around the world?
I think Stoicism is a great philosophy and a way of life and I've been trying to use it in my life too. One of Stoicism's core teachings is that the best way to be happy is wanting things that one already has and valuing them. Negative visualisation is one of the tools for that: imagining how things could be worse, partially to appreciate them more, partially to plan for the case they do become worse. When used like that, Stoicism leads one to the revelation that they could be happy here and now, without relying on anything outside of their control.
Hence, the "not as bad as" argument could also be used as a way of negative visualisation.
But many of the Stoics whose writings have reached us were rich and famous. Seneca was a playwright and a statesman. Marcus Aurelius was an emperor. I have long tried to reconcile the fact that Stoicism seems to stop us from wanting anything with the fact that so many Stoics were of high standing.
Given that for Stoic writings to reach us, they had to have been famous in some way already, it's possible that they started using this philosophy as a way to keep the positions that they had achieved and stay where they were. However, it also could be argued that their beliefs empowered them to do what they felt was right without seeking external validation. That the recognition of their work in terms of money, fame or prestige happened as a side effect, something they didn't care about.
One of my favourite pieces of writing I reread quite often is David Heinemeier Hansson's "The Day I Became A Millionaire". Here's what I think is the best quote from it:
Barring any grand calamity, I could afford to fall off the puffy pink cloud of cash, and I’d land where I started. Back in that small 450 sq feet apartment in Copenhagen. My interests and curiosity intact. My passions as fit as ever. I traveled across a broad swath of the first world spectrum of wealth, and both ends were not only livable, but enjoyable. That was a revelation.
Note how DHH caveats this with "first world spectrum of wealth": he also credits the privileges we have, in his case, the Danish social security system, with his success.
I view Stoicism and ability to appreciate what I already have as a springboard to continuous (and continued) improvement of things within my control. It's the ability to take risks knowing that wherever you land, your life will still be pretty good. So in that respect, the "not as bad as" argument turns into "won't ever be as bad as", changing apathy into an empowering limited-downside proposition.
While appreciating the privileges that we have is a good tactic for personal happiness, I also believe that the best way to respect those privileges is to use them and do things that one wouldn't have been able to do without them. Otherwise, we're essentially squandering them.
And it's not like one's success helps just that person. Joanne Rowling wrote the first few chapters of Harry Potter whilst on benefits, another first world privilege. A couple of decades later, the series has sold in excess of 500 million copies worldwide and spawned a film franchise that has grossed a few billion dollars. Quite apart from the joy that the Harry Potter series has brought to people all across the world, the tax revenue from it might well make the UK's welfare system one of the best-performing VC funds in the world.
Sure, all of humanity's problems might stem from a person's inability to sit quietly in a room alone, but so does all the progress.
Imagine if we could turn this:
The first picture is a graph of how many people enter the London Underground network every minute on a weekday. The second graph is for the weekend, except slightly altered: I normalized it so that both graphs integrate to the same value. In other words, the same number of people go through the network in the second graph as in the first.
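The normalization is simple arithmetic: with evenly spaced per-minute counts, the area under each curve is proportional to the sum of its samples, so scaling the weekend series by the ratio of the two totals makes both curves integrate to the same value. The Python sketch below uses made-up toy numbers rather than the TfL data, and `normalize_to` and `peak_to_mean` are hypothetical helper names. The peak-to-mean ratio is one rough measure of how much spare capacity a bursty pattern forces infrastructure to carry.

```python
from typing import List


def normalize_to(reference: List[float], series: List[float]) -> List[float]:
    """Scale `series` so its total (and hence its integral over evenly
    spaced minutes) equals the total of `reference`."""
    total = sum(series)
    if total == 0:
        raise ValueError("cannot rescale an all-zero series")
    factor = sum(reference) / total
    return [x * factor for x in series]


def peak_to_mean(series: List[float]) -> float:
    """How far the busiest minute exceeds the average minute."""
    return max(series) / (sum(series) / len(series))


# Toy entrance counts per minute (illustrative only, not the TfL sample).
weekday = [10, 50, 10]
weekend = [5, 10, 5]

scaled = normalize_to(weekday, weekend)
print(scaled)                           # [17.5, 35.0, 17.5]
print(round(peak_to_mean(weekday), 2))  # 2.14
print(round(peak_to_mean(scaled), 2))   # 1.5
```

Both series now move the same 70 people through the network, but the flatter one peaks far less sharply above its average.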
Would you rather interact with the former or the latter usage pattern?
The data geek in me is fascinated by the fact that there are clear peaks in utilization at about 8:15 (this is the graph of entrances, remember) and, in the evening, at 17:10, 17:40 and 18:10. I'll probably play with this data further, since the dataset I used (an anonymized 5% sample of journeys taken on the TfL network during one week in 2009) has some more cool things in it.
The Holden Caulfield in me is infuriated at the fact that these peaks exist.
It's alarming how often society seems to hinge on people being in the same place at the same time, doing the same things. The drawbacks of this are immense: infrastructure has to be overprovisioned for any bursty load pattern, and being caught inside one means longer waiting times and an unpleasant experience for everyone involved. Hence it's important to investigate why this happens and whether it is always required.
Have you heard of TV pickups? Whenever a popular TV programme goes on a commercial break or ends, millions of people across the UK do the same things at the same time: they turn kettles on, open refrigerator doors, flush their toilets and so on. This causes a noticeable surge in utilization of, say, electric grids and the sewage system. As a result, service providers have to provision for it by trying to predict demand. This isn't just an academic exercise: in the case of electric energy, generators can't be brought online instantly and energy can't be stored cheaply.
In the case of the Underground network, there are times on some lines when trains arrive more frequently than every two minutes (pretty much as often as they can, given that the trains have to maintain a safe distance between each other and spend some time on the platform) and yet they are still packed between 8am and 9am. Any incident, however small, like someone holding up the doors, can have a knock-on effect, delaying the whole line massively.
Why are people doing this to themselves?
The weekend was a great invention (although Henry Ford's reason for giving his employees more time off was that they'd have nothing to do and hence start buying his own, and other businesses', goods). But does the weekend really have to happen at the same time for all people?
Some of the phenomena governing people's schedules are natural. It does get dark at night and people do need light. It gets cold in the winter and people need heating. But the Earth does not care whether it's a weekday or the weekend, a Wednesday or a Saturday. And yet somehow society has decreed that Wednesday is a serious business day and any adult roaming the streets during the daytime on that day might get weird stares.
Expanding on this, do working hours have to happen at the same time either? People naturally need rest, but what they don't naturally need is to be told exactly when they can work and rest. And in some types of work, like knowledge work, being told when to work is unnecessary and can even be harmful.
In professional services, in most cases, the client doesn't care when the service is being performed. The client wants a tax return to be prepared: they don't want the tax return to only be prepared between 9 and 5. The client wants their investments to be managed: the investments don't need to only be managed between 9 and 5. And so on. Fixed work hours make no sense since it's not time the client is buying, it's the result. Knowledge work isn't predicated on people having to do it at the same time or even at a given time.
The fact that everybody has to work fixed hours stems from Industrial Age assembly-line thinking (in fact, the term "line manager" is still used in the UK to refer to one's boss). If one part of the assembly line is missing, the assembly line doesn't work. Hence the management has to make sure that all parts of the assembly line have finished their sandwiches and are in place for when their shift starts. The whole shift also has to get their days off synchronously, as it can't function at all after a critical mass of people has taken the day off.
This is in no way an argument for longer working hours. If a person has exhausted their working capacity for the day, what's the point of holding them in the office until a given hour unless they're in a role that requires that? Some people work better when they have a set goal and some time to achieve it, to be used at their discretion. Some people work in bursts, where the output of one day can overshadow the rest of the week. Mandating fixed hours for knowledge workers means they aren't as efficient as they can be for their employer and further suffer from the utilization peaks that they themselves cause.
Do we still need offices? Some criticize working from home as a way for employees to slack off. But if you think your people won't work unless they're watched, maybe you're hiring the wrong people. A loss of productivity from not having someone standing over their shoulder is offset by the gain in productivity from not having someone standing over their shoulder and not working in a distracting open office environment.
A benefit of offices is that they encourage communication and sharing of ideas. It's much easier to walk up to someone and ask them something, and information travels around quicker and more naturally.
On the other hand, imagine if you were a medieval scholar. They would usually work alone, with all communication with their peers done over long-form letters. Communication used to be asynchronous and there was no way the letter would be delivered as soon as it was fired off, hence there was no expectation of getting a reply in the same hour or even within the same day.
Nowadays, people are expected to respond to messages instantly, which leaves them less and less uninterrupted time.
Would you rather have a 1-hour chunk of time to do work in or 6 chunks of 10 minutes, interrupted by random phone calls, instant messenger pings and people walking up to you? The former option is much, much better if you want to do any deep work. Productivity is highly non-linear and 10 minutes of work result in better outcomes when they come after some time to ramp up. Even the anticipation that you can be interrupted can distract you and prevent you from getting into a state of flow.
Perhaps there's no need for people in the workplace to expect others to be able to instantly respond to them. In fact, slower, asynchronous communication can lead to more robust institutional memory inside of an organisation. Instead of the easy fix of tapping a colleague on the shoulder to get an answer, the worker might instead devise a solution for an issue themselves or figure it out while typing up an email, adding to the documentation and making sure fewer people have that question in the future.
Do all meetings have to happen at the same place or at the same time? Some of them do: sometimes there's no replacement for getting all stakeholders in the same room in order to come to a decision. But meetings are also a great way to waste company money, setting thousands of dollars on fire by the simple act of blocking out one hour of several people's time.
What is now a synchronous meeting (together with the break in flow that it brings: I've found that I'm more productive in a given hour if I know I don't have to go anywhere in the next hour, even though the time I'm spending is the same) could be an asynchronous e-mail chain or a set of comments on the intranet that people can get to at their discretion.
There's something mesmerising about being able to watch live coverage of an event. Instant notifications of a new development are a way to gratify yourself, to feel like you've done something, to get a small dopamine rush from receiving another nugget of information. But in reality, not much has changed, and the development will likely turn out to be insignificant in the end.
In this age, people have no time to think about their reaction: everything is knee-jerk, synchronous and instantaneous. An incident happens. Minutes later, we find out there is a suspect. Minutes later, there's a witch hunt across social media for the suspect and their family. Days later, the suspect is acquitted and there's another suspect. Information, not necessarily valuable or true, nowadays travels so fast that things can easily get out of control and anyone with Internet access can join in the madness.
There are a few billion other people out there, and your brain can't process inputs from all of them in real time. Hence people have to operate with abstractions. Instead of constantly receiving a stream of data that interrupts your life and ultimately doesn't add anything to it, why not move to a different level of abstraction and lower the sampling rate, reading a weekly newsletter instead?
If you had an investment portfolio, would you act based on checking its performance every hour or every day? Or would you instead accept that the noise from daily developments will probably cancel out and resolve into a clearer picture of what's happened?
A friend of mine works in a role where she needs to interact with offices in other countries that don't observe UK bank holidays. Her employer increases her holiday allowance instead, making bank holidays normal working days for her. I think this is amazing. The time when most people are away on holiday is the best time to get some work done in the office, and the time when most people are in the office is the best time to go shopping, visit the doctor, go to a museum and do all sorts of other life admin.
From a cultural point of view, public holidays are amazing. From a logistical point of view, they're a nightmare. If everybody is having a holiday, nobody is, and the fact that everyone is observing the holiday at the same time yet again creates usage peaks in all sorts of places.
For example, synchronous buying of presents means that retailers have to overstock their wares in the run-up to major holidays, say, Christmas, and, even worse, have to offload all the Baylis and Harding soap sets at fire-sale prices starting at about dinnertime on the 24th of December. Hilariously, the best time to shop for Christmas presents is in January.
And most holidays are taken around these times too. People in the UK get fined and can even get prosecuted for taking their children on holiday during term time. The fine is usually less than the difference in price for airline tickets and accommodation between taking a holiday during term time and outside of term time, but that price difference is just a consequence of the difference in demand between those times. There are still planes in January, but they're... emptier. And airports aren't such an unpleasant experience.
In any case, most parents are now coerced into taking holidays only outside term time, which has a knock-on effect on flight and accommodation usage and prices.
I honestly don't know how to solve most of these problems. Maybe with the rise of remote work and teleconferencing they will naturally go away, moving us to a future where nobody can have a case of the Mondays any more. Some companies are already embracing parts of being asynchronous, like Basecamp (ex-37signals), who list the benefits of remote work and fewer meetings in REWORK.
It's more difficult on the social side. I dream of a relationship where we agree to celebrate all holidays (Christmas, Easter, Valentine's etc.) a few days later to take advantage of the trough in demand that comes after the peak. In addition, to be efficient, education does have to be synchronous: one teacher educates multiple children, and any of them skipping material will have to catch up, delaying the whole year. I once chatted about this with a friend: what if education were much more granular, with children (or their parents) able to pick and choose when they take a given class? Staggered school shifts, perhaps?