A monitor bot #124
Comments
BTW, my script is written in Python. Rust, Go, or C++ might be a better choice, since a monitoring bot may run on a Raspberry Pi with limited resources. |
Hi! This is all worth discussing, because what you're describing (backend automations) runs into inherent limitations of cState, some of which are admittedly very disappointing. First of all, I must ask you to clarify what you mean by this:
Now let's think about the contexts in which cState may be used. Recently, with the help of a friend, I added a Dockerfile to the repository, as that was an option requested long ago by somebody on this same issue tracker. From what I know, this means that cState can be hosted 'dynamically' like WordPress or whatever. However, when I initially made the project in 2017-2018, it was supposed to work (best) with one platform and one platform only: Netlify, or something like it that supports the Hugo SSG (which is written in Go). Netlify builds the site with the aforementioned Hugo engine, which I had fallen in love with at the time (building locally every time would have been super annoying), and Netlify hosting is a glorified CDN. The advantages and disadvantages of static hosting are:
For a small startup or a hobby project there is no such urgency as for big services. And if there is such urgency, why would you host your own status page? On your own infrastructure? That's what happened with AWS S3 a while back. A big headache indeed. Because remember, Netlify — with hosting — is basically free. So is cState. Even with Cachet — which I believe has more features — you need to host it yourself which costs some money. This does not mean that cState cannot be automated and in fact I wish that it was, to some extent. But I'm a front-end web developer, in fact I am more of a designer these days. I am not knowledgeable in Python, nor C, C++, Rust, Go, etc. You get the point. Of course I would love if somebody made integrations -- or even became a maintainer and worked officially on this same org (under a repo like I want any features that are added officially to be done so with care, so that the project stays simple for those who simply want a glorified informational feed. One last thing to mention is that 3rd party integrations are always possible -- just look at your own, it looks great! -- and I usually tell people who want live updates -- what I call monitoring -- to use Custom Tabs, Custom HTML, because those are the easiest options available right now. |
OK, sorry for my ambiguity. Firstly let me explain all these possible features:
Next, about the so-called "inherent limitations": I don't think they should be called limitations. Actually, they are an important reason I love cState. cState's only job is to show people a beautiful website, and it does that job perfectly. The website is truly beautiful, free, and stable. Everything else should be handled by other services; it doesn't belong in the frontend. That's why a monitor bot is needed. It may not be convenient for cState to add a "subscription" feature like the GitHub status page has, unless you do it on a separate website or use Custom Tabs. My service sits somewhere between a "hobby project" and a "big service". It gets 2,000 UV (unique visitors) and 12,000 PV (page views) per day, so there is some urgency. With cState, Netlify and GitHub host my status page, and I can host my "own infrastructure" backend at home. Therefore there won't be anything like the AWS S3 incident, because the downtime of my Raspberry Pi has no relation to the GitHub API, Netlify, or my website. These 3 services (status page, monitor bot, and my main website) are mutually independent. I agree that any official feature should be added with extra care, which is why I don't think my Python script is up to this backend job. I'm not a professional either. Maybe we need someone more experienced to build a more carefully designed bot. |
BTW, I've just noticed that cState can be subscribed via RSS, this also increases the importance of quick website updates. |
I mean, in your case, this assumes that:
I'm just saying I can't do anything here to help, somebody else has to take initiative. If you know who would, my contacts are on my profile. |
The downtime of the monitor bot usually has no relation with the downtime of the main service. |
I'll keep this open for a bit, but ultimately it's out of the scope of what I'm able to do |
Here's my take: I believe having a status page go into investigating mode automatically is a good idea, as it allows the devops guys to get straight into resolving the issue. With cState being primarily serverless, it can run on another platform which (unless you're Cloudflare) shouldn't be affected by your outage. My personal preference for an implementation would be to extend an existing systems monitoring platform, and Prometheus would be my choice. There are already a tonne of scrapers out there that can monitor not only the service, but also the networks and servers powering it. For a simple website this isn't a big deal, but when you're maintaining network infrastructure, you want an outage in any part of it to be reported. So a platform like Prometheus solves the issue of data collection (polling one part of the site usually doesn't tell the whole story); we then need to get the data to cState. Prometheus has exporters that are fed information about outages, so a lightweight application written in, let's say, Go could publish this data via the cState content git repo. https://github.com/go-git/go-git, for example, would make this quite simple to do. This also means the cState exporter doesn't need to have anything to do with other alert methods (webhooks, for example); they can be handled by other exporters, AlertManager being the most common. Another advantage of this approach is that rules set within Prometheus can determine the severity of an outage in cState. For example, if one server in your stack is reporting higher network latency, you may want to set the site to degraded for that specific region, and then, if you have a really advanced setup, you could raise a critical alert if some piece of your network stack goes for a jolly, let's say a switch fails. 
It will of course be important that your Prometheus server, scrapers and cState exporter are kept away from the infrastructure you're monitoring. This is quite an advanced way of doing it, which may make it a little harder for the admin of a small website to configure, so there's likely still a place for a simple "hey, if this server doesn't give me a 200, flag it." But you'll need some form of delay (let's say, it fails to respond 3 times) before updating the status page, or a quick reload of NGINX turns into a "critical systems outage". So finally, I completely agree that outage reporting should be automated in cState, but it'll take a lot of planning to work out the best way to do it. edit: we'd also have to think of a way to tell whatever's hosting cState to rebuild the static content. It's all good updating the git repo, but it's no good if the site never updates. |
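The "fails to respond 3 times" delay suggested above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `FailureDebouncer` class, its `threshold` field, and the `"down"` / `"recovered"` return values are hypothetical names, not part of cState or of any existing bot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureDebouncer:
    """Flags an outage only after `threshold` consecutive failed probes.

    Hypothetical helper for illustration; not part of cState.
    """
    threshold: int = 3             # the "fails to respond 3 times" rule
    consecutive_failures: int = 0
    flagged: bool = False

    def record(self, status_code: Optional[int]) -> Optional[str]:
        """Feed one probe result (None means timeout or no response).

        Returns "down" when the outage should be published, "recovered"
        when a flagged service answers 200 again, and None otherwise.
        """
        if status_code == 200:
            self.consecutive_failures = 0
            if self.flagged:
                self.flagged = False
                return "recovered"
            return None

        self.consecutive_failures += 1
        if not self.flagged and self.consecutive_failures >= self.threshold:
            self.flagged = True
            return "down"
        return None
```

With this shape, a single failed probe during an NGINX reload is absorbed silently, and the status page is only touched on the third consecutive failure and again on the first successful probe afterwards.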
If you host with Netlify or GitLab Pages, that's done for you automatically (it's under CI/CD). You could roll your own infra. Either option is not 100% bulletproof. Well, tbh nothing is. |
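Since the host rebuilds on every push, the publishing side of a bot mostly comes down to writing an incident file into the content repo and committing it. Below is a rough Python sketch of the file-rendering step. The front matter fields used here (`title`, `date`, `resolved`, `severity`, `affected`, `section`) and the severity values are written from memory of cState's incident format and should be treated as assumptions to check against the cState documentation; `render_incident` itself is a hypothetical helper.

```python
import json
from datetime import datetime

# Assumed severity values; verify against the cState docs before relying on them.
ALLOWED_SEVERITIES = ("notice", "disrupted", "down")


def render_incident(title, severity, affected, started):
    """Return the text of an unresolved incident file (front matter + body).

    Field names are assumptions about cState's incident format.
    """
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")

    lines = [
        "---",
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"title: {json.dumps(title)}",
        f"date: {started:%Y-%m-%d %H:%M:%S}",
        "resolved: false",
        f"severity: {severity}",
        "affected:",
    ]
    lines += [f"  - {json.dumps(name)}" for name in affected]
    lines += [
        "section: issue",
        "---",
        "",
        "Automatically opened by the monitor bot. Investigating.",
        "",
    ]
    return "\n".join(lines)
```

The returned text would then be saved as a Markdown file in the site's issues directory and pushed, at which point Netlify or GitLab CI picks up the commit and rebuilds the page.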
Is your feature request related to a problem? Please describe.
cState is only a frontend; a backend monitor bot should be provided.
Describe the solution you'd like
I've written a very simple bot myself at https://github.com/thuhole/status-probe, with these features:
It would also be a good idea to add some other features, such as webhooks, email notifications, custom messages, more types of services, etc. I just think that such a bot should be an official cState feature, so users don't need to write scripts themselves.
Describe alternatives you've considered
Manually publish. However, manually publishing disruptions usually involves a delay. You cannot ask a human to stay up 24/7.
Additional context