Update on vB3 Migration to Discourse - Issues and Status of BBCode Transformations

Neo · March 30, 2020, 11:19pm

We "completed" the migration of this vB3 site to Discourse a number of days ago. However, deeper testing by @Scrutinizer and @MadeInGermany revealed that a lot of text was mangled in the migration. We traced these bugs to two issues:

A minor bug in the Ruby vbulletin.rb migration script which transformed " \n" in code fragments to hard breaks; and
A major bug in the recommended migration Ruby gem ruby-bbcode-to-md that mangled all code fragments with left square brackets. (strips them out, completely!)
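
As a rough illustration of how the second bug can be spotted, here is a minimal sketch of a check that compares an original code fragment with its migrated version. The method name and sample strings are hypothetical and are not taken from the migration scripts.

```ruby
# Hypothetical sanity check (not part of vbulletin.rb or ruby-bbcode-to-md).
# Returns true when the migrated text has fewer "[" characters than the original.
def lost_left_brackets?(original, migrated)
  migrated.count("[") < original.count("[")
end

original = 'if [ "$a" = "$b" ]; then echo ${arr[0]}; fi'
mangled  = 'if  "$a" = "$b" ]; then echo ${arr0]}; fi'

puts lost_left_brackets?(original, mangled)
puts lost_left_brackets?(original, original)
```

A check like this only flags candidates for review; it says nothing about where the brackets were lost.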

These are serious migration bugs which affect the integrity of all of the hard work our members have contributed over the years, and they must be corrected.

I posted a bug report over at meta-discourse on the major bug in the Ruby gem ruby-bbcode-to-md, but the maintainer of that repo shut me down, deleted my bug reports, washed his hands of a repo with his name on it, and acted very unprofessionally, even though the bug was easily confirmed. This was not encouraging and brought my spirits down a bit at the time.

@Scrutinizer then came to the rescue (thank you!), took a step back, and did a formal analysis of the Ruby preprocessing script and the various bbcode transformations including the:

Ruby preprocessing method in the vbulletin.rb migration script
Discourse builtin bbcode support
Ruby gem ruby-bbcode-to-md Discourse plugin

At the same time, I was working on:

Debugging the ruby-bbcode-to-md discourse plugin
New PHP script to reprocess the pagetext from the vB3 MySQL DB and replace the text mangled by the ruby-bbcode-to-md gem in the Discourse Postgres DB.

@Scrutinizer created a spreadsheet and did the analysis and determined that the mangler gem ruby-bbcode-to-md was not required.

In addition, @Scrutinizer suggested we install the discourse-bbcode plugin and test it.

With good preliminary results from discourse-bbcode, I started to learn how to modify a Discourse plugin and found that the discourse-bbcode plugin was not difficult to modify (straightforward JavaScript), so I set up a development environment on my desktop as follows:

Forked the discourse-bbcode on GitHub to neo-discourse-bbcode
Cloned the neo-discourse-bbcode to my desktop
Modified neo-discourse-bbcode using Visual Studio Code
Pushed changes to my newly minted forked neo-discourse-bbcode repo on GitHub
Rebuilt and tested our staging Discourse apps using the modified neo-discourse-bbcode repo in the app.yml build file.

From this test setup, I was able to add new "preliminary" bbcode tags. For example, I created two new tags which correspond (roughly) to two of our legacy tags:

ICODE tag
MOD tag

This forked repo is still a work-in-progress and I am still learning to modify and test and try to add other tags. Right now, I'm having some issues with preformatted elements, but that's a discussion for another day.

GitHub - unixneo/neo-discourse-bbcode: vBulletin BBCode plugin

Although we have made progress, @Scrutinizer also discovered that the Markdown and CODE bbcode tags used by Discourse (builtin) do not permit BBCode in the fenced code blocks.

This means that our technique of using color to highlight sections of code in blocks when helping others currently has no solution, but we are working on this. Currently, in the migration script this bbcode is stripped out during migration. However, we want to find a way to preserve this capability and feature if possible.
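
Stripping formatting bbcode from inside code blocks could look roughly like the sketch below. The tag list, the method name, and the rule that an opening tag must carry an `=value` (so that shell text such as `${a[size]}` is left alone) are my assumptions for illustration, not the actual migration code.

```ruby
# Sketch only: tag list and method name are assumptions for illustration.
# Opening tags must have an "=value" part so that array subscripts like [size]
# inside code are not mistaken for bbcode.
FORMAT_TAG_RE = %r{\[(?:COLOR|SIZE|FONT)=[^\]\n]*\]|\[/(?:COLOR|SIZE|FONT)\]}i

# Remove formatting tags found between [CODE] and [/CODE]; text outside is untouched.
def strip_bbcode_in_code(text)
  text.gsub(%r{(\[CODE\])(.*?)(\[/CODE\])}im) do
    open_tag, body, close_tag = $1, $2, $3
    open_tag + body.gsub(FORMAT_TAG_RE, "") + close_tag
  end
end

post = "Try [B]this[/B]:\n[CODE]ls [COLOR=red]-l[/COLOR] /tmp[/CODE]"
puts strip_bbcode_in_code(post)
```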

However, there are gremlins around the corner:

Fixing this will more than likely require us to disable the Discourse builtin CODE tag (so far, I have not been able to override this in a plugin); and there is just about zero chance the busy folks at meta discourse will support this or even answer my question in a helpful way if I ask how to do this (disable or override their builtin code tag). Historical discussions from the meta discourse team show a near religious passion for markdown, and any deviation from what they perceive as "right" is not well received. Plus, this is a migration issue, and they are focused on the future, not the past (understandably). I see zero chance asking about this will be well received over there.

So the current options seem to be:

Hack the Discourse code base to disable their builtin code tag and write a plugin to do this (I am not certain how to do this, really, and it will not persist when Discourse upgrades, so this does not seem a reasonable possibility at the moment).
Ask "how to do this on meta" and get beat up severely by the meta team, who will surely oppose this and tell us to "get over it" and "move on".
Strip out all bbcode in block code tags during migration (the current "solution").
Look at highlight.js and see if we can hack that to get bbcode to work (just a wild idea at the moment).
Something yet to appear in the fog of all this.

I don't consider removing color indicators from our code blocks and stripping out bbcode from these fenced code blocks a major issue; but there are some who will consider it a big deal, possibly. We will be "losing" this well-liked feature if we strip it out.

So, we are still looking into this.

However, I am not planning to ask on meta. Based on historical discussions on code tags there and my "not good" experiences with two migration bug reports over there, I am sure that question will not turn out well. Migration issues are not well received over there (they are working on building for the future, and this is understandable, honestly) and we are pretty much "on our own" on this.

That is the latest.

Neo · April 1, 2020, 11:14pm

Today, armed with a new migration script vbulletin_neo7.rb, I will start a migration from scratch on the staging server for the purposes of getting raw preprocessed posts from the postgres DB and uploading them to the community site; but first I will do this on the staging site, as follows:

Start a new migration from scratch using vbulletin_neo7.rb . This will take a few days.
Test some problematic posts and see if they migrated correctly (I tested this yesterday on a small scale, and it looked fine); if so:
Dump the raw posts from the postgres DB created above, along with the mapping table between the vb posts and the discourse posts.
Restore the staging server with the current community snapshot.
Move the raw from the new migration to the restored staging server DB.
Test.

Yesterday, I tested the migration script vbulletin_neo7.rb and it worked OK.

@Scrutinizer is also working on some other bbcode migration enhancements which we will apply when he is ready to test (he has a full time job and family demands so no hurry or worry).

I want to create a new baseline without the broken ruby-bbcode-to-md plugin, with our new vbulletin_neo7.rb script and new custom bbcode plugin, neo-discourse-bbcode, which has a solid new ICODE bbcode tag working.

Neo · April 2, 2020, 9:50pm

Update:

Change in direction (too slow to keep doing the migration over and over).

Create / write a Ruby script (done):

Retrieve the mappings from the vB posts to the Discourse posts stored in the Discourse postgres DB.
Use these postid-to-postid mappings to grab the original vB post text from each vB post in the original mysql DB.
Preprocess the vB post text
Postprocess the vB post text
Update the raw post in the Discourse DB
Test and redo.
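
The steps above can be sketched as follows. This is only an outline under stated assumptions: in-memory hashes stand in for the MySQL and Postgres databases, and the `preprocess` / `postprocess` bodies and all names are placeholders, not the real script.

```ruby
# In-memory stand-ins for the two databases; the real script would query
# MySQL (vB3) and Postgres (Discourse). All names here are hypothetical.
VB_POSTS        = { 101 => "Run [ICODE]ls[/ICODE]\r\nthen check." }
DISCOURSE_POSTS = { 7 => { raw: "mangled text" } }
POST_MAP        = { 7 => 101 } # Discourse post id => vB post id

def preprocess(text)
  text.gsub("\r\n", "\n") # placeholder step: normalize line endings
end

def postprocess(text)
  text.gsub(%r{\[ICODE\](.*?)\[/ICODE\]}im, '`\1`') # placeholder step: inline code
end

# Walk the id mapping, rebuild each raw post from the original vB text,
# and overwrite the raw text on the Discourse side.
def reprocess_all
  POST_MAP.each do |discourse_id, vb_id|
    pagetext = VB_POSTS.fetch(vb_id)
    DISCOURSE_POSTS.fetch(discourse_id)[:raw] = postprocess(preprocess(pagetext))
  end
end

reprocess_all
puts DISCOURSE_POSTS[7][:raw]
```

Because only the raw text is rewritten, the loop can be rerun as often as needed before committing to a rebake.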

The script above processes about a million posts in 45 minutes (much faster), and once we are happy with the results we can rebake the raw posts into the cooked posts. Rebaking 1M posts takes about 16+ hours, so we are avoiding this when possible.
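
To put those two numbers side by side, here is the back-of-the-envelope arithmetic, using the round figures quoted above (1M posts, 45 minutes, 16 hours) rather than measured values:

```ruby
posts = 1_000_000
reprocess_rate = posts / (45 * 60.0)   # posts per second when rewriting raw text
rebake_rate    = posts / (16 * 3600.0) # posts per second when rebaking

puts format("reprocess: about %.0f posts/s", reprocess_rate)
puts format("rebake:    about %.0f posts/s", rebake_rate)
puts format("rebaking is roughly %.0fx slower", reprocess_rate / rebake_rate)
```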

Ran this yesterday and found that all the bugs posted by @MadeInGermany before (mangled code, missing left square brackets) and the hard line break error reported by @Scrutinizer (where \n in code fragments was converted to hard line breaks) were fixed.

However, still more gremlins to slay, working on:

Fixing missing emoji in the preprocessing. In particular, the thumbs up emoji that Ravinder loves to use, :b: , converts to :+1: . DONE
Fixing a bug in attachments and other images. DONE
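
The emoji fix amounts to a lookup table applied during preprocessing. A minimal sketch, where only the `:b:` to `:+1:` pair comes from this thread and the constant and method names are made up for the example:

```ruby
# Hypothetical mapping table; only the :b: => :+1: pair is from the post above.
EMOJI_MAP = { ":b:" => ":+1:" }.freeze
EMOJI_RE  = Regexp.union(EMOJI_MAP.keys)

# Replace every legacy smilie code with its Discourse emoji name.
def convert_emoji(text)
  text.gsub(EMOJI_RE, EMOJI_MAP)
end

puts convert_emoji("Thanks, that worked :b:")
```

More legacy codes can be added to the hash without touching the method.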

However, the main reported gremlins in code fragments appear to be fixed. Now working on other missing transformations (missing emoji, images, etc).

Making progress... slowly but surely.

All work currently done on test / staging server only.

Neo · April 3, 2020, 12:05am

Also, I am finding, by trial and error, that even before preprocessing with the Ruby migration scripts, there are some transformations which are easily done on the copy of the vB3 MySQL DB dump; for example:

UPDATE post SET  pagetext= LOWER(REGEXP_REPLACE(pagetext,'\\[ATTACH\\](.*)\\[\\/ATTACH\\]', 'https://www.unix.com/attachment.php?attachmentid=\\1'));

There was some bug in the Ruby attachment preprocessing and some attachment ids were lost, so instead of wasting time trying to find the bug in the Ruby preprocessing routine, it was easier to do the regex search and replace in the staged copy of the DB dump.

All these attachment images are automatically downloaded to the new discourse forum over time as well.

Note: Not being an expert in MariaDB REGEXP_REPLACE, I started with (.*) and worked my way up, and the matches only worked with the double backslashes (escapes); they would not match with single backslashes, presumably because the SQL string literal consumes one level of backslash escaping before the pattern reaches the regex engine. None of the examples on the net worked (most showed no backslashes or one backslash only) in this REGEXP_REPLACE expression; but I could get it to work, building it from .* up, step-by-step.

Update: This change is done and confirmed working on the staging server.

Neo · April 4, 2020, 1:52am

According to a quick check, 35 posts with tables have been stripped or skipped from the migration:

MariaDB [vb3]> select count(postid) from post where pagetext like '%%';
+---------------+
| count(postid) |
+---------------+
|            35 |
+---------------+
1 row in set (5.559 sec)

MariaDB [vb3]> exit
Bye
root@discourse1-app:/shared/neo/bin# /shared/neo/bin/pg
psql (10.12 (Debian 10.12-1.pgdg100+1))
Type "help" for help.

discourse=> select count(id) from posts where raw like '%%';
 count 
-------
     0
(1 row)

We can easily write code to convert these tables to markdown; but since the posts were stripped from the migration, and this would require a totally fresh migration from the start, I am inclined, at this time, to just drop these 35 posts from the new site.

We could add them at a later time, manually, as doing this manually for 35 posts would take a few hours, but redoing the migration will take a week and be painful. It would take a few hours to write and test the script, to make sure it works anyway.

So, my inclination is not to worry about losing 35 posts with TABLE tags at this time (and to manually add them back in the future).
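
For reference, a table converter could be sketched along these lines. It assumes the simple layout `[TABLE][TR][TD]cell[/TD]...[/TR]...[/TABLE]` with the first row used as the header; our actual TABLE tag may take attributes or nest differently, so treat this as a starting point only.

```ruby
# Sketch under assumptions: plain [TABLE]/[TR]/[TD] tags without attributes,
# first row becomes the Markdown header row. Method name is hypothetical.
def bbcode_table_to_markdown(text)
  text.gsub(%r{\[TABLE\](.*?)\[/TABLE\]}im) do
    rows = $1.scan(%r{\[TR\](.*?)\[/TR\]}im).map do |(row)|
      row.scan(%r{\[TD\](.*?)\[/TD\]}im).map { |(cell)| cell.strip }
    end
    next "" if rows.empty?

    lines   = rows.map { |cells| "| #{cells.join(' | ')} |" }
    divider = "| #{Array.new(rows.first.size, '---').join(' | ')} |"
    [lines.first, divider, *lines.drop(1)].join("\n")
  end
end

sample = "[TABLE][TR][TD]cmd[/TD][TD]use[/TD][/TR][TR][TD]ls[/TD][TD]list[/TD][/TR][/TABLE]"
puts bbcode_table_to_markdown(sample)
```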

Neo · April 5, 2020, 3:16am

Update (from one hour ago):

Updated the new community sites with latest posts, new users, likes, etc. from legacy site.

Ran an early preprocessing script against the legacy DB.

@Scrutinizer is testing a more refined version of preprocessing which will do even more migration magic. When he is ready, we will run his preprocessing script against the legacy DB and see how it looks.

Thanks for your patience.

We have already fixed the two bugs that we found in the initial launch; but are working to refine more custom bbcode issues.

Neo · April 5, 2020, 10:39am

Well, as an update, this earlier MariaDB regex was flawed, my bad.

UPDATE post SET  pagetext= LOWER(REGEXP_REPLACE(pagetext,'\\[ATTACH\\](.*)\\[\\/ATTACH\\]', 'https://www.unix.com/attachment.php?attachmentid=\\1'));

Should have been

UPDATE post SET  pagetext= REGEXP_REPLACE(pagetext,'\\[ATTACH\\](.*?)\\[\\/ATTACH\\]', 'https://www.unix.com/attachment.php?attachmentid=\\1');

Wrapping the result in LOWER() lowercased not just the replaced URL but all of the text in the post (unexpectedly). That will be fixed after 12 hours of rebaking the 1M posts.

In addition, my original REGEX was too greedy and I had to add the ? to make it less greedy. Somehow, I missed that during initial testing.
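
Both mistakes are easy to reproduce outside the database. The sketch below uses Ruby's regex engine rather than MariaDB's, so it is an analogy and not the exact SQL behaviour, and the sample post text is made up:

```ruby
URL  = 'https://www.unix.com/attachment.php?attachmentid=\1'
post = "See [ATTACH]101[/ATTACH] and Also [ATTACH]202[/ATTACH]"

# Greedy (.*) runs from the first [ATTACH] to the LAST [/ATTACH] on the line.
greedy = post.gsub(%r{\[ATTACH\](.*)\[/ATTACH\]}, URL)
# Lazy (.*?) stops at the nearest [/ATTACH], so each tag is replaced on its own.
lazy   = post.gsub(%r{\[ATTACH\](.*?)\[/ATTACH\]}, URL)

puts greedy
puts lazy
puts lazy.downcase # what wrapping the whole result in a lowercase call does
```

With two attachments in one post, the greedy version swallows the text between them, and the lowercase call changes every word in the post, not just the URL.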

Thanks to @Peasant for catching that bug quickly today.

We are "mind warped" writing and rewriting all this (new to us) Ruby migration code. The extra eyes, fresh perspectives, and bug hunting all are much appreciated and valuable contributions to the final migration success (whenever that happens, LOL)

Neo · April 5, 2020, 10:43am

For those who have never seen the inside of a Discourse app, here are the canned rake tasks:

# rake --tasks
rake about                                                             # List versions of all Rails frameworks and the environment
rake add_topic_to_quotes                                               # Add the topic to quotes
rake admin:create                                                      # Creates a forum administrator
rake admin:invite                                               # invite an admin to this discourse instance
rake api_key:create_master[description]                                # generate a master api key with given description
rake app:template                                                      # Applies the template supplied by LOCATION=(/path/to/template) or URL
rake app:update                                                        # Update configs and some other initially generated files (or use just update:configs or update:bin)
rake assets:clean[keep]                                                # Remove old compiled assets
rake assets:clobber                                                    # Remove compiled assets
rake assets:environment                                                # Load asset compile environment
rake assets:precompile                                                 # Compile all the assets named in config.assets.precompile
rake assets:prestage                                                   # pre-stage assets on cdn
rake autospec                                                          # Run all specs automatically as needed
rake avatars:clean                                                     # Clean up all avatar thumbnails (use this when the thumbnail algorithm changes)
rake avatars:refresh                                                   # Refresh all avatars (download missing gravatars, refresh system)
rake bookmarks:sync_to_table[sync_limit]                               # migrates old PostAction bookmarks to the new Bookmark model & table
rake build:stamp                                                       # stamp the current build with the git hash placed in version.rb
rake build_test_topic                                                  # create pushstate/replacestate test topic
rake cache_digests:dependencies                                        # Lookup first-level dependencies for TEMPLATE (like messages/show or comments/_comment.html)
rake cache_digests:nested_dependencies                                 # Lookup nested dependencies for TEMPLATE (like messages/show or comments/_comment.html)
rake categories:list                                                   # Output a list of categories
rake db:create                                                         # Creates the database from DATABASE_URL or config/database.yml for the current RAILS_ENV (use db:create:al...
rake db:drop                                                           # Drops the database from DATABASE_URL or config/database.yml for the current RAILS_ENV (use db:drop:all to...
rake db:environment:set                                                # Set the environment value for the database
rake db:fixtures:load                                                  # Loads fixtures into the current environment's database
rake db:migrate:status                                                 # Display status of migrations
rake db:prepare                                                        # Runs setup if database does not exist, or runs migrations if it does
rake db:rebuild_indexes                                                # Rebuild indexes
rake db:schema:cache:clear                                             # Clears a db/schema_cache.yml file
rake db:schema:cache:dump                                              # Creates a db/schema_cache.yml file
rake db:schema:dump                                                    # Creates a db/schema.rb file that is portable against any DB supported by Active Record
rake db:schema:load                                                    # Loads a schema.rb file into the database
rake db:seed                                                           # Loads the seed data from db/seeds.rb
rake db:seed:replant                                                   # Truncates tables of each database for current environment and loads the seeds
rake db:seed_fu                                                        # Loads seed data for the current environment
rake db:setup                                                          # Creates the database, loads the schema, and initializes with the seed data (use db:reset to also drop the...
rake db:stats                                                          # Statistics about database
rake db:structure:load                                                 # Recreates the databases from the structure.sql file
rake db:version                                                        # Retrieves the current schema version number
rake destroy:categories                                                # Destroy a comma separated list of category ids
rake destroy:groups                                                    # Destroy all groups
rake destroy:private_messages                                          # Remove all private messages
rake destroy:stats                                                     # Destroy site stats
rake destroy:topics[category,parent_category]                          # Remove all topics in a category
rake destroy:topics_all_categories                                     # Remove all topics in all categories
rake destroy:users                                                     # Destroy all non-admin users
rake docker:test                                                       # Run all tests (JS and code in a standalone environment)
rake emails:import                                                     # use this task to import a mailbox into Disourse
rake emails:test                                                # Check if SMTP connection is successful and send test message
rake emoji:test                                                        # test the emoji generation script
rake emoji:update                                                      # update emoji images
rake enqueue_digest_emails                                             # This task is called by the Heroku scheduler add-on
rake export:categories[category_ids]                                   # Export all the categories
rake export:category_structure[include_group_users,file_name]          # Export only the structure of all categories
rake i18n:check[locale]                                                # Checks locale files for errors
rake i18n:reseed[locale]                                               # Update seeded topics and categories with latest translations
rake import:file[file_name]                                            # Import existing exported file
rake incoming_emails:truncate_long                                     # removes attachments and truncates long raw message
rake integration:create_fixtures                                       # Creates the integration fixtures
rake log:clear                                                         # Truncates all/specified *.log files in log/ to zero bytes (specify which logs with LOGS=test,development)
rake maxminddb:get                                                     # downloads MaxMind's GeoLite2-City database
rake middleware                                                        # Prints out your Rack middleware stack
rake multisite:generate:config                                         # generate multisite config file (if missing)
rake multisite:migrate                                                 # migrate all sites in tier
rake multisite:rollback                                                # rollback migrations for all sites in tier
rake plugin:install[repo]                                              # install plugin
rake plugin:install_all_gems                                           # install all plugin gems
rake plugin:install_all_official                                       # install all official plugins (use GIT_WRITE=1 to pull with write access)
rake plugin:install_gems[plugin]                                       # install plugin gems
rake plugin:migrate:down[plugin]                                       # run all migrations of a plugin
rake plugin:qunit[plugin,timeout]                                      # run plugin qunit tests
rake plugin:spec[plugin]                                               # run plugin specs
rake plugin:update[plugin]                                             # update a plugin
rake plugin:update_all                                                 # update all plugins
rake poll:migrate_old_polls                                            # Migrate old polls to new syntax
rake posts:delete_all_likes                                            # Delete all likes
rake posts:delete_word[find,type,ignore_case]                          # Delete occurrence of a word/string
rake posts:fix_letter_avatars                                          # Rebake all posts with a quote using a letter_avatar
rake posts:inline_uploads                                              # Coverts full upload URLs in `Post#raw` to short upload url
rake posts:invalidate_broken_images                                    # invalidate broken images
rake posts:missing_uploads                                             # Finds missing post upload records from cooked HTML content
rake posts:normalize_code                                              # normalize all markdown so <pre><code> is not used and instead backticks
rake posts:rebake                                                      # Update each post with latest markdown
rake posts:rebake_match[pattern,type,delay]                            # Rebake all posts matching string/regex and optionally delay the loop
rake posts:recover_uploads_from_index                                  # Attempts to recover missing uploads from an index file
rake posts:refresh_emails[topic_id]                                    # Refreshes each post that was received via email
rake posts:refresh_oneboxes                                            # Update each post with latest markdown and refresh oneboxes
rake posts:remap[find,replace,type,ignore_case]                        # Remap all posts matching specific string
rake posts:reorder_posts[topic_id]                                     # Reorders all posts based on their creation_date
rake qunit:test[timeout,qunit_path]                                    # Runs the qunit test suite
rake release_note:generate[from,to]                                    # generate a release note from the important commits
rake restart                                                           # Restart app by touching tmp/restart.txt
rake scheduler:run_all                                                 # run every task the scheduler knows about in that order, use only for debugging
rake secret                                                            # Generate a cryptographically secure secret key (this is typically used to generate a secret for cookie se...
rake site_settings:export                                              # Exports site settings
rake site_settings:import                                              # Imports site settings
rake smoke:test                                                        # run chrome headless smoke tests on current build
rake stats                                                             # Report code statistics (KLOCs, etc) from the application or engine
rake themes:install                                                    # Install themes & theme components
rake time:zones[country_or_offset]                                     # List all time zones, list by two-letter country code (`rails time:zones[US]`), or list by UTC offset (`ra...
rake tmp:clear                                                         # Clear cache, socket and screenshot files from tmp/ (narrow w/ tmp:cache:clear, tmp:sockets:clear, tmp:scr...
rake tmp:create                                                        # Creates tmp directories for cache, sockets, and pids
rake user_actions:rebuild                                              # rebuild the user_actions table
rake users:anonymize_all                                               # Anonymize all users except staff
rake users:change_post_ownership[old_username,new_username,archetype]  # Change topic/post ownership of all the topics/posts by a specific user (without creating new revision)
rake users:disable_2fa[username]                                       # Disable 2FA for user with the given username
rake users:list_recent_staff                                           # List all users which have been staff in the last month
rake users:merge[source_username,target_username]                      # Merge the source user into the target user
rake users:recalculate_post_counts                                     # Recalculate post and topic counts in user stats
rake users:rename[old_username,new_username]                           # Rename a user
rake users:update_posts[old_username,current_username]                 # Update username in quotes and mentions
rake yarn:install                                                      # Install all JavaScript dependencies as specified via Yarn
rake zeitwerk:check                                                    # Checks project structure for Zeitwerk compatibility

Neo · April 9, 2020, 8:55am

It has been a few days since my last status update, so here is a new, quick one:

@Scrutinizer and @Neo continue to work on the migration and are getting closer. The migration script provided OOTB by Discourse (the various BBCODE converters) mangled a lot of text; and @Scrutinizer has been leading the effort to get the various bbcode conversions as error-free as practical. So far, so good.

Today, after more discussions with @Scrutinizer, I modified a Discourse theme component and added four new editor / composer buttons.

This was my first Discourse theme component modification, and frankly speaking, it was really easy (orders of magnitude easier, and infinitely faster debugging, than Discourse plugin development and testing).

That modified theme component is available on GitHub as md-composer-extras-neo

https://github.com/unixneo/md-composer-extras-neo

Frankly, we don't want to get too much into editor / composer button modifications until people use it more; so these will probably be the only changes to the composer before going live (sooner rather than later). We can look for better icons on FontAwesome and get input from the people who matter the most, all of you!

@Scrutinizer informs me that he is getting closer to having his new Ruby preprocessing script ready for testing against the database, and he says he has been having a lot of fun with Ruby as well!

More later.....

Neo · April 12, 2020, 11:31pm

Another update:

We continue to make progress in the complex mess of migrating the bbcode from the old forum to the new ones:

@Scrutinizer wrote some nice code which strips all bbcode from our old code tags (like color, fonts, etc) because these will not work in markdown in the new forums.
After more testing, it also became clear that old bbcode tags like color, which look good in one theme, look terrible in other themes. So, I have decided to strip out the vast majority of these legacy bbcode tags everywhere in the migration (COLOR, SIZE, FONT, etc). This ensures that themes in the new forums are not constrained by low-value color, font and size types of bbcode.
In addition, inline code tags need an extra space before and after when they migrate to MD because often users do not put spaces and this causes markdown (MD) errors.
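
The inline code padding can be sketched like this, assuming the legacy inline tag is `[ICODE]...[/ICODE]`; the method name is hypothetical and this is not @Scrutinizer's actual code:

```ruby
# Sketch, assuming the legacy inline tag is [ICODE]...[/ICODE].
# Always emits one space on each side of the backticks.
def icode_to_markdown(text)
  text.gsub(%r{\[ICODE\](.*?)\[/ICODE\]}im) { " `#{$1.strip}` " }
end

puts icode_to_markdown("Use[ICODE]grep -c[/ICODE]to count matches.")
```

Where the user did type spaces, this produces doubled spaces in the raw text, which Markdown collapses when the post is cooked.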

On the system admin side:

I found out that Docker on macOS does not support unix sockets being shared outside the docker container. I found this out after spending two days trying to set up a new test configuration which decouples the Discourse app in the container from external ports. However, it works fine on Linux. This means that I will migrate all Discourse apps (staging and production) from a single standalone docker container to a two container solution, where the postgres db data is in one container and the web app is in another container.
In addition, the web app container will no longer expose TCP/IP ports directly but will only expose a unix socket. This means we can decouple the web app from the web server; and so, for example, we can completely rebuild the app in a new docker web container, and expose a different unix socket.
On nginx, this means we can just switch between docker instances with a simple symbolic link change outside the container. This is very powerful. I have tested it and it works flawlessly.
On apache2, reverse proxy symbolic links to unix sockets in the docker container do not work (will not connect); only direct links to the unix socket work in the proxy pass configuration to a shared docker unix socket, so switching requires an apache server restart.
nginx does not require a restart because the symlink works.
Yesterday, I got both apache2 and nginx working in reverse proxy mode to a unix socket, but ran into some minor issues with SSL, which is the next layer of testing I need to do. It worked flawlessly on http but I ran into some small issues on https (SSL).
Also, in quick testing, I found that nginx was about five percent faster than apache2, but that was on two different servers with very different configurations and traffic, so this comparison is not (yet) relevant nor valid.
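
The symlink switch on the nginx side can be sketched as below. The file names ("blue", "green", "current.sock") are invented for the example, and plain files stand in for the real unix sockets exposed by the containers.

```ruby
require "tmpdir"

# Repoint `link` at `target` by creating a temporary symlink and renaming it
# over the old one, so the link never disappears. Names are hypothetical.
def switch_symlink(target, link)
  tmp = "#{link}.tmp"
  File.delete(tmp) if File.symlink?(tmp)
  File.symlink(target, tmp)
  File.rename(tmp, link)
end

Dir.mktmpdir do |dir|
  blue  = File.join(dir, "web_blue.sock")
  green = File.join(dir, "web_green.sock")
  [blue, green].each { |path| File.write(path, "") } # stand-ins for sockets
  link = File.join(dir, "current.sock")

  switch_symlink(blue, link)  # first container goes live
  switch_symlink(green, link) # rebuilt container takes over
  puts File.readlink(link) == green
end
```

The web server config would point at the link path only, so swapping containers never touches the server configuration itself.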

So where are we now?

Still testing various preprocessing code routines against the DB, looking for anomalies. We are not hoping for 100% perfect, but we do want to keep all the code and solutions intact for sure (sans color, font size, fonts, etc).
Soon, I will work on getting SSL to work on the "two container with web server reversed proxy to an exposed docker unix socket" (TTWSRPEDUS, LOL) solution on a staging server.

That's it for now.

Neo · April 14, 2020, 4:48am

Update:

This migration is moving along. @Scrutinizer has been a great help with debugging the bbcode migration, writing Ruby methods to preprocess various bbcode situations which arise in the conversion to markdown.

For example:

There were some newlines being added to code blocks, so we added some post-processing REGEX search and replace to tidy these up. This was a purely cosmetic change but @Scrutinizer is more annoyed by these cosmetic details than me, and so he wrote some Ruby code to fix it. We are very fortunate to have @Scrutinizer working with me on this.
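
A tidy-up of that kind might look roughly like the following sketch, which trims blank lines just inside fenced code blocks. The fence style (three backticks) and the method name are assumptions; this is not @Scrutinizer's actual code.

```ruby
FENCE = "`" * 3 # three backticks, built here to keep the example readable

# Remove blank lines directly after the opening fence and before the closing one.
def tidy_code_fences(raw)
  raw.gsub(/^#{FENCE}(\w*)\n(.*?)^#{FENCE}$/m) do
    lang, body = $1, $2
    body = body.sub(/\A(?:[ \t]*\n)+/, "").sub(/(?:\n[ \t]*)+\z/, "")
    body.empty? ? "#{FENCE}#{lang}\n#{FENCE}" : "#{FENCE}#{lang}\n#{body}\n#{FENCE}"
  end
end

messy = "Before\n#{FENCE}\n\nls -l\n\n\n#{FENCE}\nAfter"
puts tidy_code_fences(messy)
```

Blank lines in the middle of a code block are left alone, since those are usually intentional.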

The same is true for some "bbcode abuse" where, over the years, some people copy-and-pasted bbcode into the forum, or others just loved bbcode so much they embedded bbcode everywhere, sometimes nesting bbcode in strange ways. We have also slain most of those dragons.

We are getting very close. We cannot promise that 100% of every possible combination of bbcode-mangles in the vB3 forum will be perfect, but it will be very good, a few orders of magnitude better than the initial release, hands down.

Currently I am rebaking all the posts again on the staging server. That is a process which takes 12 to 14 hours. For those who may not be familiar with this, here is a short summary:

The vB3 forum (indeed most, if not all, LAMP-based forums) process(es) the pagetext in the database on the fly (when the page is summoned by the client, e.g. the web browser).

However, Discourse stores the pagetext as "raw" and then it cooks the raw into HTML to be rendered. This of course makes the site faster since the code is already rendered and stored in the DB "cooked".

The downside to this, of course, is that it takes longer to "cook all this" during migration testing (reprocessing the raw for bbcode mangling); but lucky for us, after migration is done, it's done.
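As a toy illustration of the "cook once, serve many times" idea, here is a small sketch. The `Post` struct and the `cook`, `rebake!` and `render` methods are hypothetical stand-ins and not Discourse internals. The point is only that the expensive conversion happens in the rebake step, not when a page is served.

```ruby
# Toy model only: Post, cook, rebake! and render are made-up names.
Post = Struct.new(:raw, :cooked)

def cook(raw)
  # Stand-in for an expensive markup-to-HTML conversion.
  escaped = raw.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  "<p>#{escaped}</p>"
end

def rebake!(posts)
  # The slow part: every post is converted again and the result is stored.
  posts.each { |post| post.cooked = cook(post.raw) }
end

def render(post)
  # Serving a page only reads the stored HTML; nothing is converted here.
  post.cooked
end

posts = [Post.new("ls -l > out.txt"), Post.new("a & b")]
rebake!(posts)
puts render(posts.first)
```

With around a million posts, the loop in `rebake!` is what takes hours, which is why every change to the preprocessing means another long bake.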

BTW, this is the same way I serve our forum man pages. Man pages are also cooked and the cooked pages are stored in the DB to make them render faster, so this technique is nothing new.

OBTW, those man pages will stay here in the legacy vB3 forums; until we decide if and/or when to write a plugin to port these to discourse.

That's it for now.

Neo · April 15, 2020, 2:04am

Update:

Long baking (staging server, discourse1) done after close to 15 hours:

It is much improved, but @Scrutinizer, with his eagle eye for details, has plans for more refinements to make it even better, as there are still some chars "lost in migration" outside of the code tag fences.

Neo · April 15, 2020, 10:27pm

Update:

Have made some core changes and am currently running both our staging server and our production server in "two container" mode; where the database and the web app are in two separate docker containers:

My next plan is to set up nginx as a reverse proxy on the staging server to a unix domain socket in the docker container, and decouple the network (TCP/IP) from the web app in the container.

This will permit us to completely rebuild the app, add new plugins, add custom code, etc. to the web app with almost zero down time because each container will have its own unique shared (persistent) directory and we can symlink from outside the container to inside.

This means, for example, on nginx we can just change symlinks to move from one live container to the other.
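The symlink switch itself can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the helper name `switch_symlink` and the directory names are made up, and in a real setup the link would point at whatever path the web server is configured to use for the live container's shared directory.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: repoint `link` at `target` without a moment in which
# the link does not exist. All paths used here are illustrative only.
def switch_symlink(target, link)
  tmp = "#{link}.tmp-#{Process.pid}"
  FileUtils.rm_f(tmp)
  File.symlink(target, tmp)   # build the new link under a temporary name
  File.rename(tmp, link)      # rename replaces the old link in one step
end

Dir.mktmpdir do |dir|
  first  = File.join(dir, "container-a")
  second = File.join(dir, "container-b")
  FileUtils.mkdir_p([first, second])
  live = File.join(dir, "live")

  switch_symlink(first, live)    # initial deployment
  switch_symlink(second, live)   # later: move to the rebuilt container
  puts File.readlink(live)
end
```

Creating the new link under a temporary name and renaming it over the old one avoids the short gap you would get from deleting the link and then creating it again.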

OBTW, on apache (which we are not using in this setup), apache will not let us set up the proxy configuration with a symlink, so a restart of the web app is required. It's not a big deal; so if you are running apache and want to integrate these container-based web apps into your virtual host configurations, it is not a problem at all.

I have been doing these changes in a controlled, step-by-step manner, since these are "breaking changes". However, once it is all done, the site will be much more robust.

Honestly, I think the "two container" solution should be standard OOTB and not "an advanced configuration" like Discourse meta says. The setup is actually straightforward (two containers) and it is the best way to go. The reverse proxy to a unix domain socket is a bit more "advanced", so I do understand why folks call this "advanced".

Neo · April 16, 2020, 2:01am

Update:

After many days on this one issue, I have got the "two container" solution with nginx as a front end reverse proxy to a unix domain socket to work on the staging server.

There is a setting which is NOT in the discourse admin UI, which I had to set from the rails console:

cd /var/discourse
./launcher enter socket-only
rails c
SiteSetting.force_https = true

This site setting does not exist in the site_settings DB table until you run the command above.

Only then does it appear in the DB:

postgres=# \c discourse
You are now connected to database "discourse" as user "postgres".
discourse=# select * from site_settings where name like '%http%';
 id |    name     | data_type | value |         created_at         |         updated_at         
----+-------------+-----------+-------+----------------------------+----------------------------
 79 | force_https |         5 | t     | 2020-04-16 05:51:13.165124 | 2020-04-16 05:51:13.165124
(1 row)

discourse=# \q

Even then, this setting does not appear in the admin UI.

After restarting the app with:

./launcher restart socket-only

it worked..... finally !!

Neo · April 17, 2020, 4:59am

Update:

Staging server is running the "two container, nginx reverse proxy to unix socket" configuration.
Future production server is running the "two container without nginx reverse proxy" configuration.
Plan to switch future production server to the "two container, nginx reverse proxy to unix socket" configuration tomorrow or Sunday (site will go down for a short time).
@Scrutinizer continues to make heroic progress to escape special characters outside code blocks and tags which interfere with the markdown process on discourse.
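To show the general shape of this kind of escaping (this is not @Scrutinizer's actual code), here is a minimal sketch. It assumes backtick code fences and inline code spans, and the set of characters it escapes is a guess for illustration only, since the real list depends on what breaks in the posts.

```ruby
# Hypothetical sketch: backslash-escape a few markdown-sensitive characters
# everywhere except inside fenced blocks and inline code spans.
CODE_SPAN = /(^```.*?^```[ \t]*$|`[^`\n]+`)/m
SPECIALS  = /[*_<]/   # assumed set of characters, not the real list

def escape_outside_code(raw)
  # split with a capture group returns: text, code, text, code, ...
  raw.split(CODE_SPAN).each_slice(2).map do |text, code|
    text.gsub(SPECIALS) { |ch| "\\#{ch}" } + code.to_s
  end.join
end

puts escape_outside_code("use *.txt and `ls *` here\n")
```

The important design point is that code is split out first and passed through untouched, so the escaping can never damage a code fragment.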

Making progress......

Neo · April 17, 2020, 10:49pm

Update:

There is an anomaly regarding the rebuilding process (in the container) of the custom avatars in the "two container, nginx reverse proxy to unix socket" configuration.

See this post on meta:

Avatars lost after restore. How to get them back? - support - Discourse Meta

I have been working on this for a few days, and this issue only appears in the configuration of using nginx as a reverse proxy to a unix socket, but works fine without it.

So far, no resolution.

Neo · April 18, 2020, 3:33am

Update:

I have successfully moved all servers over to the "two container (data, web) with nginx reverse proxy to unix socket" configuration.
The problem was missing information in the support tutorials, which I had to figure out on my own (again).

Basically, here is the problem:

When you follow the tutorials on meta, as I did, they don't tell you that after the transformation is done (create new data, socket-only containers), you must either:

Do a complete DB restore (even though you already have a working copy of the DB in your data container) because you need all the images which are gzipped in the backup tar file, or:
Just copy the /shared/uploads directory over to your new container.

I did the latter (the copy method) and all the problems went away.

I kindly offered to update the tutorials over at meta, but instead of the guys thanking me, one of the guys started being "edgy" again. I guess this is the "new normal"... blame the users who read the instructions when the instructions leave out key steps and things break. I am glad that when I went to university our engineering professors did not blame us when the text books were wrong!

Anyway.... it's working now:

There are zero errors in the web dev console:

Neo · April 21, 2020, 6:08am

Update:

@Scrutinizer and @Neo are getting very close to requesting our team test the new forum. We are not there yet, but getting close:

@Scrutinizer has written some unique custom filtering code to mitigate most of the conflicts outside code tags between special chars and markdown.
@Neo wrote some new code to query the legacy vB mysql db and update the discourse db to delete all "soft deleted" posts and "legacy confidential posts".

TODO:

Test @Scrutinizer's preprocessing code and when finished, request the team to login and test before final migration to sync new posts in old forum with new forum.

Neo · April 23, 2020, 10:52pm

Update on Migration:

There is an old Chinese saying which goes something like this:

"You cannot see the entire mountain standing in the valley".

So, we asked our mod team to do some testing and @hicksd8 immediately found that the often used and abused B, I, and U BBCODE tags were causing a lot of problems when embedded in other BBCODE tags.

From a statistical point of view, if we ask someone to step into a system with one million posts and within minutes, they find many of the same problems, it can easily be extrapolated that these problems are not outliers.

I discussed this B, I, U BBCODE issue with a number of people, including excellent coders, migrators, testers and casual users, and after hearing a lot of opinions and personal preferences, I have made the final decision to strip out B, I, U for the following very simple reasons:

We already strip out COLOR, SIZE and FONT BBCODE because these BBCODE tags also cause problems and they also make it hard for users to switch from theme to theme because COLORs that might look good in THEME A may not look good (and generally do not look good) in THEME B . This is true for SIZE, FONT and COLOR for sure. It's better to strip them out (and we did strip them weeks ago).
COLOR, SIZE and FONT BBCODE do not convey much meaningful information anyway, so stripping them out does not cause the forums to lose information value. In fact, in most cases COLOR, SIZE and FONT BBCODE are value subtracted and distracting.
The same is true, in my view (and the view of some others, but not all) for B, I, U BBCODE. These BBCODE tags do not convey much meaningful information, so stripping them out does not cause the forums to lose information value. In fact, in most cases B, I and U are value-subtracted (in my view) and distracting because users add them based on personal feelings and emphasis. Removing this personal emphasis does not cause the forums to lose any degree of valuable information. In fact, posts look clearer without overdone B, I and U BBCODE use and abuse.
Moreover, embedded B, I, U BBCODE break things in the BBCODE to MARKDOWN transformation.

I asked @hicksd8 to circle back and to do another deep "outside the mountain" view of the migration, and his comments were:

The presentation is superb and virtually nothing to comment on.

So, it's done. I have made the decision to strip out all B, I, U BBCODE and it's a done deal.

FYI, I also stripped out all MOD tags as well; because having MOD comments about "please use CODE tags" or "please show your work" over and over, is not useful information, really. Those informational comments were meant for the poster at the time, not for the ages.
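For readers curious what this stripping can look like, here is a minimal sketch, not the code in the migration script. It makes three assumptions: the formatting tags are unwrapped (the text inside is kept), MOD blocks are dropped together with their content, and anything inside CODE tags is left alone. That last point matters, because something like `a[i]` in a code fragment looks exactly like an italic tag.

```ruby
# Hypothetical preprocessing step; tag list and behavior are assumptions.
FORMAT_TAGS = %w[b i u color size font].freeze
CODE_BLOCK  = %r{(\[code\].*?\[/code\])}im

def strip_formatting(text)
  # Drop moderator notes completely, then unwrap the formatting tags.
  out = text.gsub(%r{\[mod(?:=[^\]]*)?\].*?\[/mod\]}im, "")
  out.gsub(%r{\[/?(?:#{FORMAT_TAGS.join("|")})(?:=[^\]]*)?\]}i, "")
end

def strip_bbcode(raw)
  # split with a capture group returns: text, code, text, code, ...
  raw.split(CODE_BLOCK).each_slice(2).map do |text, code|
    strip_formatting(text) + code.to_s
  end.join
end

puts strip_bbcode("[B]Hello [COLOR=red]world[/COLOR][/B] [code]a[i]=1[/code]")
```

Other tags such as URL or IMG are not matched by this pattern, so they pass through for the later bbcode to markdown step.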

@hicksd8 did have this comment:

My only concerns are regarding links. I've been checking out some of my specialist information threads (well I would wouldn't I?) first and notice that when I refer to previous threads I've written, it links to the old forum.

Let me address this:

Regarding links back to the old site:

These links from the new site to the old site will stay as they are (for the rest of this year at least) for many reasons:

The old site is still live and will be live for the foreseeable future in read-only mode (as a reference site).
Many, many sites with age have links referring to different domains and subdomains; this is not a "show stopper" to go back and forth. In fact, it is pretty standard for any large web site which has undergone a transformation.
It is a monster of a job to rewrite all internal permalinks to match the permalink format of the new site, and I am not going to spend time doing this in 2020.
There may (or may not) be SEO value, in the short term, in having these links in the new site pointing back to the old site.
In short, I am not worried in the least in 2020 about links back to the old site, as long as they work and don't break the site or SEO.

As @hicksd8 commented, and many agree who have recently viewed the migration:

The presentation is superb and virtually nothing to comment on. - hicksd8
AWESOME. - RavinderSingh13
It's 99.99% good. Apart from my concerns about intra-links, I don't think the other stuff is worth worrying about. - hicksd8

@Scrutinizer has been a tremendous help writing Ruby snippets to help clean up BBCODE to MARKDOWN issues. His excellent work is a gift to the community and will always be embedded in the migrated posts.

Soon, I will ask for comments from MadeInGermany, RudiC and other Leaders (status in the new forums).

Until then, have a great weekend.

Hope you find these updates useful.

PS: LOL... my I and B code tags (adding personal emphasis) will be stripped out in the new forums, but I'm OK with that. It's a small sacrifice for the greater good of the overall migration integrity.

Update:

Continue Here:

Please Help Integrity Test New Discourse Forums V2

Neo · April 26, 2020, 7:16am

Update1:

Now in data integrity testing phase: Please Help Integrity Test New Discourse Forums V2

Update2:

Added two columns to the vb3 post table, discourse_topic_id and the discourse_post_id
Added one column to the vb3 thread table, discourse_topic_id
Added Ruby code to the migration script preprocessing control which inserts the discourse_topic_id and the discourse_post_id of the migrated posts back into the mysql dump for the vb3 post and thread tables.

After migration, we will update the original vb3 post and thread tables so we have the mappings to the discourse_post_id and discourse_topic_id, which we will use for a number of redirections and links in the future.

This means, for example, when someone clicks on an old thread in the legacy vb3 forum to reply, we can send them to the new thread to reply.
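A rough sketch of how such a redirect lookup could work is below. Everything in it is illustrative: the in-memory `TOPIC_MAP` stands in for the discourse_topic_id column on the vb3 thread table, and the base URL and the "/t/<id>" path shape are assumptions, not the final redirect scheme.

```ruby
# Hypothetical lookup: legacy vB3 thread id => discourse_topic_id.
# In practice this would be a query against the vb3 thread table.
TOPIC_MAP = { 1001 => 42, 1002 => 57 }.freeze

def reply_redirect(thread_id, base: "https://forum.example.com")
  topic_id = TOPIC_MAP[thread_id]
  return nil unless topic_id   # not migrated: stay on the legacy page
  "#{base}/t/#{topic_id}"      # assumed URL shape for a topic
end

puts reply_redirect(1001)
```

Returning nil for an unknown thread lets the legacy page decide what to do, for example show the old read-only thread as it is.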

Cheers.