Sat. 4 July 2020: Movement of community.unix.com to a new server

On Saturday, 4 July 2020, I plan to move our new site, community.unix.com, to a different server.

The new location is currently running fine in a staging configuration and, in testing, it appears faster.

The current hosting for community.unix.com is a 16 GB, six-core VPS slice running Discourse in a Docker container on Ubuntu 18.04.

I plan to move this to:

Dedicated hosting with 64 GB RAM and a 4 core (8 thread) Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz.

My server consolidation plans have changed because our dedicated hosting provider has dropped the price of the dedicated server above to match the price of the lower-performing VPS the site is on now. So, we will have better performance for the same money.

I have already tested Discourse running in a docker container (production mode) on the dedicated server hardware behind an Apache2 reverse proxy and it works great.

Currently, our community Discourse app runs behind an nginx reverse proxy. The additional RAM and CPU power appear to easily compensate for the performance hit of changing the reverse proxy from nginx to Apache2. Running the Discourse Docker production app behind the Apache2 reverse proxy permits us to run both the legacy site www.unix.com and community.unix.com on the same dedicated hardware with higher overall performance.

In addition, the new Discourse configuration will run with PostgreSQL 12. I have tested this, and restoring a PostgreSQL 10 database dump into a new PostgreSQL 12 configuration is not a problem. In my view, this is a more reliable way to upgrade to PostgreSQL 12 than trying to upgrade from PG10 to PG12 "on the fly", as many do with mixed results and a lot of downtime. Plus, I have already done this type of upgrade a number of times in test mode, and it works well (without any problems at all).
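
Outside of Discourse's own backup tooling, the general dump-and-restore idea described above can be sketched roughly as follows. This is only a dry run that prints commands; the database name and dump path are hypothetical, and the actual migration relies on the Discourse backup and restore rather than these raw PostgreSQL commands.

```shell
# Dry run: prints the commands instead of running them.
# The database name and dump path defaults are hypothetical placeholders.
pg_dump_restore_plan() {
  local db="${1:-discourse}"
  local dump="${2:-/tmp/discourse_pg10.dump}"
  # On the old server (PostgreSQL 10): custom-format dump of one database.
  echo "pg_dump -Fc -f ${dump} ${db}"
  # On the new server (PostgreSQL 12): create an empty database, then load the dump.
  echo "createdb ${db}"
  echo "pg_restore --no-owner -d ${db} ${dump}"
}

pg_dump_restore_plan
```

Because the dump is loaded into a freshly created PostgreSQL 12 database, the old PostgreSQL 10 data directory is never modified, which is what makes this route easy to retry.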

My plan is to begin to perform this community migration:

Saturday, 4 July 2020, 5:00 AM GMT

At that time, my plan is to:

  1. Set community.unix.com to read only mode.
  2. Do a full backup of community.unix.com.
  3. Move that backup to the new server.
  4. Change the DNS to point to the new server.
  5. Rebuild community.unix.com on the new server.
  6. Perform the restore.
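
The six steps above could be written down as a dry-run checklist along these lines. It only prints the steps and changes nothing. The container name "app", the /var/discourse install path, the backup directory, and the host name "new-server" are assumptions for illustration, not details from this post; the backup file name is left as a placeholder.

```shell
# Dry run only: prints one line per step instead of changing anything.
# Assumptions (not from the post): container name "app", install path
# /var/discourse, the backup directory below, and the host "new-server".
migration_plan() {
  local new_host="${1:-new-server}"
  local backups="/var/discourse/shared/standalone/backups/default"
  echo "1. docker exec app discourse enable_readonly"
  echo "2. docker exec app discourse backup"
  echo "3. scp ${backups}/<backup-file>.tar.gz ${new_host}:${backups}/"
  echo "4. (manual) point the community.unix.com DNS record at ${new_host}"
  echo "5. on ${new_host}: cd /var/discourse && ./launcher rebuild app"
  echo "6. on ${new_host}: docker exec app discourse enable_restore && docker exec app discourse restore <backup-file>.tar.gz"
}

migration_plan
```

Keeping the DNS change as a manual step reflects that it depends entirely on the DNS provider in use.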

The entire process should take around 30 to 45 minutes (let's say 1 hour to be on the safe side).

So, my estimate is:

BEGIN MIGRATION: Saturday, 4 July 2020, 5:00 AM GMT
COMPLETE MIGRATION: Saturday, 4 July 2020, 6:00 AM GMT

If the migration fails for any unforeseen reason (I don't think it will, since I have tested this fairly extensively), I will simply change the DNS back to the current site, since the migration does not affect the current configuration.

Update

Time Now: 12:24 AM GMT

Time to Start Server Migration: 5:00 AM GMT

When: About four and a half hours from now.

Migration completed.

Yes, confirmed.

Migration completed.

The new server is faster than before because it runs on 8 CPU threads (4 cores), with more unicorn workers for the Discourse app and more RAM.

Running Discourse Version 2.6.0.beta1 with PostgreSQL 12 (fully upgraded)

Update

  1. Increased unicorn workers to 16 (2 per thread, 4 per CPU core).
  2. Increased db_shared_buffers to 21GB (1/3 of 64GB total on server).
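
The arithmetic behind these two values can be checked with a small sketch. The function names are made up for illustration, and the division rounds down to whole gigabytes.

```shell
# Number of unicorn workers: CPU threads times workers per thread.
unicorn_workers() { echo $(( $1 * $2 )); }

# Shared buffers in GB: total RAM in GB divided by N, rounded down.
shared_buffers_gb() { echo $(( $1 / $2 )); }

unicorn_workers 8 2      # 8 threads, 2 workers each -> prints 16
shared_buffers_gb 64 3   # one third of 64 GB        -> prints 21
```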

Seems to be running very fast now.

Short Outage Today: Fixed By Restarting Docker

Strange that Docker seemed to have crashed (required a restart)... have never seen this before after many months working with Docker.

Docker locked up again. Had to restart docker again.

Try This: Reconfigure Discourse App:

UNICORN_WORKERS: 4

db_shared_buffers: "4GB"

Let's see if that changes things (just a WAG, a "wild ass guess".... since that is the only thing which I changed lately).

Now trying:

Discourse App configuration:

UNICORN_WORKERS: 8 # two workers per core (one per thread)

db_shared_buffers: "4GB"

Let's see if this works without Docker "locking up".

AFAIK Discourse Docker is only a problem on docker version 17.10 - 17.12.

Docker downgrade recommended from these versions until bug fixed.

You posted on Meta asking whether there is a maximum docker version and they reckon not.

I don't personally know anyone using Discourse on Docker 18.xx.

We run Discourse on Docker 18.XX. This is the standard version of Docker installed with Ubuntu 18.04 using apt.

ubuntu# docker version
Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.1
 Git commit:        2d0083d
 Built:             Fri Aug 16 14:20:06 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.09.7
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       2d0083d
  Built:            Wed Aug 14 19:41:23 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Yes, so what we're seeing is not a known bug then.

Can you show the output of the following for the docker processes running the app and db?

cat /proc/<your PID>/limits

I suspect issues with open files limits or such.
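
A minimal sketch for pulling just the open-files limits out of such a limits file is shown below. The function name is made up, and the container name "app" in the commented usage is an assumption.

```shell
# Print the soft and hard "Max open files" limits from a limits file.
open_files_limits() {
  awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$1"
}

# Demonstration on the limits of the reading process itself (Linux only).
if [ -r /proc/self/limits ]; then
  open_files_limits /proc/self/limits
fi

# For a container, the host PID could be looked up with docker inspect
# (the container name "app" is an assumption):
#   pid=$(docker inspect --format '{{.State.Pid}}' app)
#   open_files_limits "/proc/${pid}/limits"
```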

What do you mean by lockup? Does docker ps show the containers running while access to the app or db is not working, or does it crash completely?
You only stop/start the containers in question, not the docker daemon, right?

Regards
Peasant.

When the issue happens, as I mentioned, I have to restart docker. Restarting docker is not the same as restarting a container, obviously. Here is what I mean by "restart docker":

service docker restart
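
To tell an unresponsive daemon apart from a stuck container before restarting anything, a time-limited probe along these lines could help. This is only a sketch: it assumes the GNU coreutils timeout command is available, and the 10 second limit is an arbitrary choice.

```shell
# Run a command with a time limit; succeed only if it finishes in time
# with exit status 0. Output of the probed command is discarded.
responds_within() {
  local seconds="$1"
  shift
  timeout "$seconds" "$@" >/dev/null 2>&1
}

# Hypothetical check: does the Docker daemon answer "docker ps" within 10 seconds?
if responds_within 10 docker ps; then
  echo "docker daemon responsive"
else
  echo "docker daemon not responding (or docker not usable from this shell)"
fi
```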

The issue seems related not to the number of UNICORN_WORKERS but to the value of db_shared_buffers.

The problem has not reappeared after I reduced the db_shared_buffers parameter to 4GB, so currently, all is running well with:

UNICORN_WORKERS: 8 # two workers per core (one per thread)

db_shared_buffers: "4GB"

So, I'm planning to leave these parameters alone for now and do not plan any more analysis unless the problem arises again.