VBulletin 3.8 to Discourse on Docker Migration Test Take Two

Update:

importing posts...
   408233 / 651614 ( 62.6%)

This is not going to be difficult.. just a lot of work.

I thought, perhaps, because the discourse db is postgres and vb was mysql, this would be difficult; but it's not. The SQL is nearly the same, only commands, but the concept is mostly the same, SQL, tables, queries.

Getting "read into" Ruby take time, but it's not that hard. It's actually interesting and I'm starting to like it.

For example, the discourse system will not let me as the overall system admin change the user "Neo" from this forum to admin. The system bot tells me that it must send me an email which never comes because email is turned off for migration.

So, I looked at the users table in discourse, found a boolean for admin, did a simple SQL query and my status is now also "admin" for user Neo. This is probably not the best way to do things, but at least I can get around when I need to in the DB.

In addition, I have been decoding where the likes are stored when a user likes a post. There is a table called user_actions, and all likes and such user actions are recorded in that table.

discourse=# \d user_actions
                                           Table "public.user_actions"
     Column      |            Type             | Collation | Nullable |                 Default                  
-----------------+-----------------------------+-----------+----------+------------------------------------------
 id              | integer                     |           | not null | nextval('user_actions_id_seq'::regclass)
 action_type     | integer                     |           | not null | 
 user_id         | integer                     |           | not null | 
 target_topic_id | integer                     |           |          | 
 target_post_id  | integer                     |           |          | 
 target_user_id  | integer                     |           |          | 
 acting_user_id  | integer                     |           |          | 
 created_at      | timestamp without time zone |           | not null | 
 updated_at      | timestamp without time zone |           | not null | 
Indexes:
    "user_actions_pkey" PRIMARY KEY, btree (id)
    "idx_unique_rows" UNIQUE, btree (action_type, user_id, target_topic_id, target_post_id, acting_user_id)
    "idx_user_actions_speed_up_user_all" btree (user_id, created_at, action_type)
    "index_user_actions_on_acting_user_id" btree (acting_user_id)
    "index_user_actions_on_action_type_and_created_at" btree (action_type, created_at)
    "index_user_actions_on_target_post_id" btree (target_post_id)
    "index_user_actions_on_target_user_id" btree (target_user_id) WHERE target_user_id IS NOT NULL
    "index_user_actions_on_user_id_and_action_type" btree (user_id, action_type)

So I decoded a like, and found:

discourse=# select * from user_actions order by created_at desc limit 10;
   id    | action_type | user_id | target_topic_id | target_post_id | target_user_id | acting_user_id |         created_at         |         updated_at         
---------+-------------+---------+-----------------+----------------+----------------+----------------+----------------------------+----------------------------
 1065308 |           2 |  138019 |          377378 |         377381 |                |              1 | 2020-03-13 13:49:39.284721 | 2020-03-13 13:49:39.380662
 1065307 |           1 |       1 |          377378 |         377381 |                |              1 | 2020-03-13 13:49:39.284721 | 2020-03-13 13:49:39.360154
 1054388 |           1 |       1 |          377400 |         706814 |                |              1 | 2020-03-13 13:32:43.396314 | 2020-03-13 13:32:43.527381
 1054389 |           2 |       2 |          377400 |         706814 |                |              1 | 2020-03-13 13:32:43.396314 | 2020-03-13 13:32:43.546778
  882882 |          11 |  138027 |          377411 |         706825 |                |              1 | 2020-03-13 09:11:37.832491 | 2020-03-13 09:11:37.832491
  844710 |          12 |      -2 |          377396 |             -1 |                |             -2 | 2020-03-13 08:12:59.394389 | 2020-03-13 08:12:59.437878
  844711 |          13 |  138027 |          377396 |             -1 |                |             -2 | 2020-03-13 08:12:59.394389 | 2020-03-13 08:12:59.448827
  844708 |          12 |      -2 |          377394 |             -1 |                |             -2 | 2020-03-13 08:12:59.053549 | 2020-03-13 08:12:59.108207
  844709 |          13 |  138026 |          377394 |             -1 |                |             -2 | 2020-03-13 08:12:59.053549 | 2020-03-13 08:12:59.153916
  844706 |          12 |      -2 |          377393 |             -1 |                |             -2 | 2020-03-13 08:12:59.01556  | 2020-03-13 08:12:59.06146
(10 rows)

that action_type 2 is a "like".

The likes from vB come from a plugin and so they are not transferred to discourse in the migration; but since I have decoded how to "like" in the discourse DB, I think I'll write.a script to transfer all the likes from mysql / vb to postgres / discourse.

The problem with this is that the vb postids are not transferred to postgres, so I will need to write my first custom migration ruby script to transfer the likes when the migration script runs.

Anyone want to do this for us, if I provide both table schemas?

Update:

  • The first half of migrating "thanks to likes' is done, and I have already created a matching postgres discourse user_actions table in vb mysql to transfer all user likes action to discourse postgres, so can migrate all user "thanks" from vb site to discourse.
  • Later, I will temporarily update discourse posts table and add vb postid column because that info is lost in migration (vbulletin postid != discourse posts id)
  • bbcode to markdown did not appear to work yet. Will need to fix this. This is the major obstacle at the moment. I am trying to avoid rewriting this.
  • I am thinking not to migrate PMs from vb to discourse, and start fresh.
Closing topics...

Postprocessing posts...
   924394 / 1212997 ( 76.2%)  

Basically, getting closer to a great migration setup.

2 Likes

Yea!

bbcode-to-md worked.... all except for some attachments in posts.

Still working on this... looking promising

Closing topics...

Postprocessing posts...
  1212997 / 1212997 (100.0%)  
Creating Permalink File...

Updating topic status

Updating bumped_at on topics

This migration test 2 is finished.

The migration has gone well, and most (not all) of the bbcode was converted to markdown .

There are two issues to fix:

  • The conversion from ICODE tags to markdown added a new line "\n" before and after the markdown tag. This was a copy-and-paste error on my part when I redesigned the migration script and it not easy to fix in the DB.
  • In order to convert the vb plugin "thanks" to likes in discourse , I need to add a new field ( vbpostid ) to the posts table in discourse ; otherwise there is no good way to transfer "thanks" to "likes"

If there was only the ICODE tags I issue, I might leave it be; however I think it is important to transfer the "thanks" to "likes" and so I think I will attempt to fix this without starting from scratch.

Currently, I'm raking all posts to see if I can fix the the bbcode-to-markdown in a few 100K posts which were not cooked properly before I got the bbcode-to-md plugin working.

That process is humming along:

# rake posts:refresh_oneboxes 
Rebaking post markdown for 'default'
   384936 / 1212998 ( 31.7%)

Then I will check the posts again for bbcode/markdown issues.

  • IF, that goes OK.. I will try to purge the topics (threads) and posts in postgres and rebuild only these two tables ( topics , posts ) from scratch, adding the extra field / column to the posts table ( vbpostid ).
  • During that migration, I will fix the issue with ICODE tag migration.

To purge all topics and posts in discourse:

rake destroy:topics_all_categories

If that works well, we will be lookin' good.

If not, I will start over from scratch.

HOWEVER, I "think" I can do this without starting over from scratch.

Have decided to go with (in progress):

rake destroy:topics_all_categories

Then add new vbpostid field to posts table in discourse and import / migrate all topics and posts and attachments again.

This will take a long time. At least two days, I am guessing.

I made a full backup of the "almost working discourse app" first, so if this fails we can revert back to the last working instance.

importing posts...
   651677 / 651677 (100.0%)  [52663 items/min]    
importing attachments...
   268000 / 651677 ( 41.1%)

Some problems important attachments... trying again....

Error:

        41: from script/import_scripts/vbulletin_neo4.rb:968:in `<main>'
	40: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
	39: from script/import_scripts/vbulletin_neo4.rb:97:in `execute'
	38: from script/import_scripts/vbulletin_neo4.rb:602:in `import_attachments'
	37: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/querying.rb:21:in `find_each'
	36: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation/batches.rb:69:in `find_each'
	35: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation/batches.rb:135:in `find_in_batches'
	34: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation/batches.rb:222:in `in_batches'
	33: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation/batches.rb:222:in `loop'
	32: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation/batches.rb:224:in `block in in_batches'
	31: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:250:in `records'
	30: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:615:in `load'
	29: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:797:in `exec_queries'
	28: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:824:in `skip_query_cache_if_necessary'
	27: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation/delegation.rb:65:in `uncached'
	26: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:398:in `scoping'
	25: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:773:in `_scoping'
	24: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:398:in `block in scoping'
	23: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation/delegation.rb:65:in `block in uncached'
	22: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/query_cache.rb:21:in `uncached'
	21: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:79:in `uncached'
	20: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:825:in `block in skip_query_cache_if_necessary'
	19: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/relation.rb:810:in `block in exec_queries'
	18: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/querying.rb:46:in `find_by_sql'
	17: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/abstract/query_cache.rb:107:in `select_all'
	16: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:70:in `select_all'
	15: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/abstract/database_statements.rb:487:in `select'
	14: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:98:in `exec_query'
	13: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:651:in `execute_and_clear'
	12: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:670:in `exec_no_cache'
	11: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:708:in `log'
	10: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
	 9: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:717:in `block in log'
	 8: from /usr/local/lib/ruby/2.6.0/monitor.rb:235:in `mon_synchronize'
	 7: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:718:in `block (2 levels) in log'
	 6: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:671:in `block in exec_no_cache'
	 5: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads'
	 4: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
	 3: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activesupport-6.0.1/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads'
	 2: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/activerecord-6.0.1/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache'
	 1: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-1.1.6/lib/patches/db/pg.rb:69:in `exec_params'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/rack-mini-profiler-1.1.6/lib/patches/db/pg.rb:69:in `exec_params': PQconsumeInput() server closed the connection unexpectedly (PG::ConnectionBad)
	This probably means the server terminated abnormally
	before or while processing the request.

After running all night, that did not seem to work:

discourse=# select count(1) from topics;
 count  
--------
 401096
(1 row)

discourse=# select count(1) from posts;
  count  
---------
 1213014
(1 row)

So, lets just clobber them with:

#su discourse -c 'psql discourse'
discourse=# delete from posts;

and

discourse=# delete from threads;

and see what happens...

Done:

discourse=# delete from posts;
DELETE 1213014
discourse=# delete from topics;
DELETE 401096
discourse=# 

LOL... that worked in a few seconds, the "old fashioned way".... no ruby raking and baking.

Now, I will reimport them.

THIS METHOD DOES NOT WORK

  • Step 1: Dump vb database again and all attachments.
  • Step 2: Transfer to discourse host.
  • Step 3:
    text discourse=# alter table posts add column vbpostid integer; ALTER TABLE discourse=#
  • Step 4: Moved transferred files to shared directory in docker (shared between docker and host)
  • Step 5: Rebuild staging mysql DB in docker / discourse.
  • Step 5: Run modified vbulletin_neo3.rb migration script.
    text export IMPORT=1 su discourse -c 'bundle exec ruby script/import_scripts/vbulletin_neo3.rb'

THIS METHOD DID NOT WORK

I think there are some unknown (to me) status counters for topics and posts, because I did not seem to reimport anything but the news topics / threads since the last import.

So, seems I have little choice if I want to refine (transfer likes, fix ICODE tags) is to start migration from scratch.

Let's see after this "does not seem to be working attempt" completes.

The problem seemed to be with postgres and the fact it is running outside the docker discourse app. But frankly, I don't have time or interest to analyze why this "brute force method" failed.

It's easy enough to start over with all lessons learned from the test migrations.

This is the end of test build 2 (and 3).

Now that I have most of the general bugs worked out; and I'm convinced this migration will work, I will setup another server and start new test build 4 from scratch.

Now waiting for DNS to sync / propagate to new server IP address.

Will start new thread for test build 4 soon.

Continue here, test build 4 on a new VPS:

VBulletin 3.8 to Discourse on Docker Migration Test Take Four

Making progress getting the vB "thanks" to convert and migration to discourse "like"...

Results from migration test run 5:

The number of discourse "likes" will not match the vb forums because discourse awards likes extra weight based on trust level etc (which is OK, at least the concept and basic idea migrates over).

Examples:

1 Like