Theoretical "events per second" on a moderately sized server?

I’ve recently re-discovered my interest in big data analytics and got an idea to make a logging plugin that would log EVERYTHING.

Well, not literally everything but almost…

e.g.

  • All block interactions
  • Hits, kills and self-caused deaths
  • Chat messages
  • Player movement
  • etc. etc. etc.

How many of these events per second would you reckon would be generated on a server with ~50 people actively doing things?
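
For scale, here’s the kind of back-of-envelope guess I’ve been making myself; every per-player rate below is pure guesswork (movement is the obvious firehose since it can fire several times a second per moving player):

```go
package main

import "fmt"

// Pure guesswork: assumed average events per player per second, by category.
// Movement dominates because it can fire on nearly every tick a player moves.
func main() {
	const players = 50
	perPlayerPerSec := map[string]float64{
		"movement":           10.0, // guess: several position updates/sec while moving
		"block_interactions": 0.5,  // guess
		"combat":             0.2,  // guess
		"chat":               0.05, // guess
	}
	total := 0.0
	for name, rate := range perPlayerPerSec {
		fmt.Printf("%-20s ~%.0f events/sec\n", name, rate*players)
		total += rate * players
	}
	fmt.Printf("%-20s ~%.0f events/sec\n", "total", total)
}
```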

I’m trying to pick the right database for the job…

Tested so far:

  • MongoDB - ~6,400 writes per second (synthetic workload); I’d expect about half that with real data
  • MySQL, non-indexed - an abysmal ~250 inserts per second (synthetic workload)

Keep in mind these numbers are from a synthetic predictable workload on unoptimized default installs of the respective DB engines.
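
For reference, the test is basically a tight loop of single inserts. Roughly this shape, sketched in Go with the official mongo-go-driver (connection string, database, and collection names are placeholders; this is an illustration, not my exact harness):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()

	// Default local install, no tuning.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		panic(err)
	}
	defer client.Disconnect(ctx)

	coll := client.Database("benchdb").Collection("events")

	// 1M sequential single-document inserts, timed end to end.
	const total = 1_000_000
	start := time.Now()
	for i := 0; i < total; i++ {
		if _, err := coll.InsertOne(ctx, bson.M{"val": fmt.Sprintf("val-%d", i)}); err != nil {
			panic(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("%d inserts in %s (%.0f writes/sec)\n", total, elapsed, float64(total)/elapsed.Seconds())
}
```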

Any idea what database would be ideal for this?

I mean, personally I’d recommend something like Elasticsearch, but maybe that’s just because I like how tunable it is…

At work we’ve gotten, I believe, up to about 10,000 full-text documents indexed per second under full load, without too much tuning… Of course, the downside is that Elasticsearch by itself, without being clustered and/or tuned, isn’t exactly amazing.
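
That kind of throughput generally comes from batching through the _bulk endpoint rather than indexing documents one at a time. A rough sketch of what that looks like (plain HTTP in Go; index name and documents are made up):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// buildBulkBody assembles an NDJSON payload for the Elasticsearch _bulk API:
// one action line followed by the document source, per document.
func buildBulkBody(docs []string) *bytes.Buffer {
	var buf bytes.Buffer
	for _, doc := range docs {
		buf.WriteString(`{"index":{"_index":"events"}}` + "\n")
		buf.WriteString(doc + "\n")
	}
	return &buf
}

func main() {
	docs := []string{
		`{"type":"chat","player":"alice","msg":"hello"}`,
		`{"type":"block_break","player":"bob","block":"stone"}`,
	}
	resp, err := http.Post("http://localhost:9200/_bulk", "application/x-ndjson", buildBulkBody(docs))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("bulk status:", resp.Status)
}
```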

Out of curiosity though, would it make more sense to store this kind of information in a time-series database like InfluxDB, as metrics or something?

I don’t think Elasticsearch is a good fit for this kind of use case… Influx could work pretty well if it’s fast enough. Gonna try that next. I’m not expecting Influx to be a good fit though; HTTP is slow.

Yeah, but a good InfluxDB client should keep the connections alive, and if it’s local…
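
Something like this is what I mean, as a minimal sketch assuming InfluxDB 1.x on localhost (database name, measurement, and tags are made up). Go’s default http.Client keeps connections alive as long as you drain and close the response body:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// One shared client reuses TCP connections (HTTP keep-alive is on by default
// in Go), so repeated writes avoid the per-request connection setup cost.
var client = &http.Client{Timeout: 5 * time.Second}

// writeLine posts one line-protocol record to the InfluxDB 1.x /write endpoint.
func writeLine(line string) error {
	resp, err := client.Post("http://localhost:8086/write?db=mc_events", "text/plain", strings.NewReader(line))
	if err != nil {
		return err
	}
	// Drain and close the body so the connection can be reused.
	io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("influx write failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// measurement,tags fields timestamp(ns) -- all values here are placeholders.
	line := fmt.Sprintf("player_event,type=chat,player=alice count=1i %d", time.Now().UnixNano())
	if err := writeLine(line); err != nil {
		panic(err)
	}
}
```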

Another thing we do to combat the scale at work is send every event to a message queue and then have multiple systems write them into separate DB instances, merging those later when there’s downtime or similar. This has gotten us some of the best results by far.
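
A very stripped-down sketch of that shape, with a buffered Go channel standing in for the real message queue and a print standing in for each shard’s DB write:

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a placeholder for whatever gets logged.
type Event struct {
	Type   string
	Player string
}

func main() {
	queue := make(chan Event, 10_000) // in-process stand-in for the message queue
	var wg sync.WaitGroup

	// Several consumers drain the queue in parallel, each writing to its own
	// DB instance; the instances get merged later during quiet hours.
	for shard := 0; shard < 3; shard++ {
		wg.Add(1)
		go func(shard int) {
			defer wg.Done()
			for ev := range queue {
				// A real consumer would do writeToShard(shard, ev) here.
				fmt.Printf("shard %d stored %s by %s\n", shard, ev.Type, ev.Player)
			}
		}(shard)
	}

	// Producers (the plugin's event handlers) just enqueue and move on.
	queue <- Event{Type: "chat", Player: "alice"}
	queue <- Event{Type: "block_break", Player: "bob"}
	close(queue)
	wg.Wait()
}
```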

Also, just out of curiosity, why would Elasticsearch not work in this case?

Regardless, I am curious to know what you end up going with and how well it works - it’s always useful to learn more about this.

From experience, Elasticsearch has always been a huge pain in the ass, but that was a while ago, so it might be different now…

I’ve tested Influx with the same setup as Mongo and MySQL and only got around 130 writes per second with keepalive.

The Mongo and MySQL tests both did 1M writes of sequential data (val-1, val-2…val-999, etc.). The Influx test crashed after around 100,000. I’ve tried it like 5 times and it always crashed after around that number of writes…

Next up for testing is Cassandra, which I’ve been told is one of the fastest… We’ll see about that.

I feel like the ideal solution would be something that holds the data in memory first and then defers the writes to convenient times in larger chunks reducing the IOPS…

Maybe some combo of a message queue and a separate “connector” that will read from the queue and write to the persistent DB at convenient times… Welp, I guess it’s time to dust off my lacking Golang experience :grin:
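
Something along these lines is what I have in mind, sketched in Go; the batch size, flush interval, and flush callback are all placeholders:

```go
package main

import (
	"fmt"
	"time"
)

// Event stands in for one logged action.
type Event struct {
	Type   string
	Player string
	At     time.Time
}

// connector buffers events in memory and flushes them in larger batches,
// either when the buffer fills up or when the interval elapses, which cuts
// IOPS compared to one write per event. flush would be a single batched
// insert (InsertMany, bulk request, etc.) against the real DB.
func connector(in <-chan Event, batchSize int, interval time.Duration, flush func([]Event)) {
	buf := make([]Event, 0, batchSize)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case ev, ok := <-in:
			if !ok {
				if len(buf) > 0 {
					flush(buf)
				}
				return
			}
			buf = append(buf, ev)
			if len(buf) >= batchSize {
				flush(buf)
				buf = make([]Event, 0, batchSize) // fresh buffer so flush can keep its slice
			}
		case <-ticker.C:
			if len(buf) > 0 {
				flush(buf)
				buf = make([]Event, 0, batchSize)
			}
		}
	}
}

func main() {
	events := make(chan Event, 1024)
	done := make(chan struct{})

	go func() {
		connector(events, 500, 2*time.Second, func(batch []Event) {
			fmt.Printf("flushed %d events in one write\n", len(batch))
		})
		close(done)
	}()

	events <- Event{Type: "chat", Player: "alice", At: time.Now()}
	events <- Event{Type: "block_break", Player: "bob", At: time.Now()}
	close(events)
	<-done
}
```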

I feel like the ideal solution would be something that holds the data in memory first and then defers the writes to convenient times in larger chunks reducing the IOPS…

That’s actually why I recommended Elasticsearch xD
I find that it does very well when tuned even a little bit… Although you’re definitely right, I do think it’s more annoying to set up than any of the other systems mentioned here.

I’m a little floored that InfluxDB had such slow writes… But at the same time, I suppose I’ve never personally stress-tested it before, so maybe at work we tuned the hell out of it behind the scenes.

But yeah -shrug-

Are you planning on running this as like an external service, or just something local to play around with?

Also good ol Go xD

Go is friggin awesome. You get the performance of C++ without the… well… C++ :grin:

I might be doing something wrong with Influx (I’ve never used it before), but the default way of talking to it is HTTP, and HTTP has so much pointless overhead for this that I’m not actually that surprised it’s slow…

No plans to “publish” this in any way so far but if I end up with something useful it will definitely find its way to GitLab…

Speaking of which, if I were to publish this, the simpler the DB backend the better… “Download this (Mongo, MySQL, Druid, whatever) exe, install it, and voilà, it works” is definitely better than “you need Elasticsearch, so umm, good luck, I guess here’s a book”…

Yeah, fair enough, that’s kind of why I was curious about what you were planning on doing with it. Cuz if you’re hosting the data collection server / database yourself for anybody to use, I think it would be worth the annoyance of setting up Elasticsearch, but if you want to be very user-friendly then maybe not xD.

Well… “User-friendly” :grin:

User-friendly as in you have root access and some very basic understanding of how to install shit on your distro, at least… Or a Windows machine and the ability to press Enter, so I guess it should be pretty user-friendly…

@d4rkfly3r I’ve decided to go with Mongo in the end. I’ve managed to optimize it down to about 0.7 ms per write (roughly 1,400 writes per second on a single connection), which should be more than enough for direct writes without buffering…