Let's say you decided to leave LiveJournal and migrate to Dreamwidth. You do a full import of your journal to DW and everything looks fine and dandy, but there is one problem: if your LJ posts linked to each other, the import does not adjust those links, so they still point to livejournal.com. Clearly unacceptable! I wrote a quick and dirty script to fix that. Details below.
1) Get the ljdump.py script and put it in a directory.
2) In the same directory, create a config file, ljdump.config:

```xml
<xml>
  <server>https://www.dreamwidth.org</server>
  <username>bluedrag</username>
  <password>password</password>
  <journal>bluedrag</journal>
</xml>
```

(Substitute your own journal name for bluedrag, and your own password for password.)
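One pitfall worth guarding against: ljdump.config is parsed as XML, so a password containing & or < has to be written as an entity (&amp;, &lt;), otherwise the file will not parse. A minimal Python 3 sketch that writes the file with xml.sax.saxutils.escape and reads a field back the way fix_links.py does; make_config and read_field are hypothetical helper names, not part of ljdump.

```python
from xml.sax.saxutils import escape
import xml.dom.minidom


def make_config(server, username, password, journal):
    # escape() turns &, < and > into entities so the file stays well-formed XML.
    fields = [("server", server), ("username", username),
              ("password", password), ("journal", journal)]
    body = "\n".join("  <%s>%s</%s>" % (tag, escape(value), tag)
                     for tag, value in fields)
    return "<xml>\n%s\n</xml>\n" % body


def read_field(config_text, tag):
    # Same lookup style as fix_links.py: first matching element, first child's text.
    doc = xml.dom.minidom.parseString(config_text).documentElement
    return doc.getElementsByTagName(tag)[0].childNodes[0].data


if __name__ == "__main__":
    cfg = make_config("https://www.dreamwidth.org", "bluedrag",
                      "p&ss<word", "bluedrag")
    print(cfg)
    print(read_field(cfg, "password"))
```

Write the returned string to ljdump.config in the working directory; the parser expands the entities back, so the script sees the real password.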
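3) Still in the same directory, run ljdump.py. It will create a subdirectory named after your journal, containing a full backup of all entries and comments. Since server points at Dreamwidth, this is a backup of the imported DW copy, and each imported entry records the address of its LJ original. My script relies on that backup and on the config file (ljdump.config).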
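fix_links.py pairs each backed-up entry with its LJ original through the entry's import_source property. A minimal Python 3 sketch of that step, with hypothetical helper names and a hand-written entry that contains only the fields the script reads (real ljdump files hold more, and the root tag name here is arbitrary). Unlike the script, it also turns underscores in the username into hyphens, assuming that is how LJ spells such usernames in hostnames.

```python
import re
import xml.etree.ElementTree as ET


def lj_url_for(import_source):
    # "livejournal.com/<user>/<id>" -> "http://<user>.livejournal.com/<id>.html"
    user, item = re.search(r'livejournal\.com/(.*?)/(.*)', import_source).groups()
    host = user.replace('_', '-')  # assumption: underscores become hyphens in LJ hostnames
    return 'http://%s.livejournal.com/%s.html' % (host, item)


def map_entry(xml_text):
    """Return (lj_url, dw_url) for one backed-up entry, or None if it was not imported from LJ."""
    root = ET.fromstring(xml_text)
    source = root.find('props/import_source')
    if source is None or not source.text:
        return None
    return lj_url_for(source.text), root.find('url').text


SAMPLE = """<entry>
  <itemid>42</itemid>
  <url>https://bluedrag.dreamwidth.org/10794.html</url>
  <props><import_source>livejournal.com/bluedrag/12345</import_source></props>
</entry>"""

if __name__ == "__main__":
    print(map_entry(SAMPLE))
```

Entries without import_source (posts written directly on DW) are skipped, just as the script prints "LJ url not found" and moves on.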
4) Save my script to a file (fix_links.py) and run it in the same directory. It goes through your entries (but not comments) and rewrites every link to one of your LiveJournal posts (or tags) into the corresponding Dreamwidth link. It asks for confirmation before writing each entry back (if you feel especially bold, you can comment out the raw_input line together with the if/continue check right after it; commenting out only the raw_input line leaves s undefined and the script stops with a NameError).
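The rewriting itself comes down to two plain text substitutions: a regex for tag pages, whose path is the same on both sites, and a lookup table for individual posts. Here is that step pulled out of the script as a standalone Python 3 function; rewrite_links is a hypothetical name, and the map would come from the backup step.

```python
import re

# Tag pages: only the host changes, the /tag/... path stays the same.
TAG_LINK = re.compile(r'http://([\w-]+)\.livejournal\.com/tag/')


def rewrite_links(text, url_map):
    text = TAG_LINK.sub(r'http://\1.dreamwidth.org/tag/', text)
    # Individual posts: swap each known LJ URL for its DW counterpart.
    for lj, dw in url_map.items():
        text = text.replace(lj, dw)
    return text


if __name__ == "__main__":
    m = {"http://bluedrag.livejournal.com/12345.html":
         "https://bluedrag.dreamwidth.org/10794.html"}
    print(rewrite_links('<a href="http://bluedrag.livejournal.com/12345.html">x</a>', m))
```

Like the script, this only recognises the http:// form of LJ links; links written with https:// or in another URL form are left untouched.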
Disclaimers: No warranties, and only tested on my journal. But it worked, and I feel great about the result.
Hasta la vista, LJ!
The script (big chunks borrowed from ljdump):
```python
#!/usr/bin/python
# Python 2 script: rewrites links to your own LiveJournal posts and tags
# so that they point at the imported Dreamwidth copies.
import glob, os, re, urllib2, xml.dom.minidom, xmlrpclib
import xml.etree.ElementTree as ET

url = {}    # LJ post URL -> DW post URL
posts = {}  # DW post URL -> parsed ljdump entry (XML root element)

try:
    from hashlib import md5
except ImportError:
    import md5 as _md5
    md5 = _md5.new

def calcchallenge(challenge, password):
    # Challenge-response auth: md5(challenge + md5(password) as hex).
    return md5(challenge + md5(password).hexdigest()).hexdigest()

def flatresponse(response):
    # Parse a "flat" protocol reply: alternating name and value lines.
    r = {}
    while True:
        name = response.readline()
        if len(name) == 0:
            break
        name = name.rstrip('\n')
        value = response.readline().rstrip('\n')
        r[name] = value
    return r

def getljsession(server, username, password):
    r = urllib2.urlopen(server + "/interface/flat", "mode=getchallenge")
    response = flatresponse(r)
    r.close()
    r = urllib2.urlopen(server + "/interface/flat",
        "mode=sessiongenerate&user=%s&auth_method=challenge&auth_challenge=%s&auth_response=%s"
        % (username, response['challenge'], calcchallenge(response['challenge'], password)))
    response = flatresponse(r)
    r.close()
    return response['ljsession']

def dochallenge(server, params, password):
    # Fetch a fresh challenge and add the auth fields to an XML-RPC request.
    challenge = server.LJ.XMLRPC.getchallenge()
    params.update({
        'auth_method': "challenge",
        'auth_challenge': challenge['challenge'],
        'auth_response': calcchallenge(challenge['challenge'], password)
    })
    return params

def process(server_url, username, password, journal):
    # Pass 1: read every entry saved by ljdump and build the LJ -> DW URL map.
    for filename in sorted(glob.glob(journal + '/L-*')):
        try:
            tree = ET.parse(filename)
        except ET.ParseError as e:
            print '%s: %s' % (filename, e)
            continue
        root = tree.getroot()
        dw_url = root.find('url').text
        try:
            # Imported entries record their origin, e.g. "livejournal.com/<user>/<id>".
            import_source = root.find('props').find('import_source').text
        except AttributeError:
            print '%s: LJ url not found' % filename
            continue
        lj_url = re.sub(r'livejournal\.com/(.*?)/(.*)',
                        r'http://\1.livejournal.com/\2.html', import_source)
        #print "%s -> %s" % (lj_url, dw_url)
        url[lj_url] = dw_url
        posts[dw_url] = root

    ljsession = getljsession(server_url, username, password)  # not used below
    server = xmlrpclib.ServerProxy(server_url + "/interface/xmlrpc")

    # Pass 2: rewrite links in each entry body and push changed entries back.
    for dw_url, post in sorted(posts.iteritems()):
        old_text = post.find('event').text or ''  # empty entries have no text
        new_text = re.sub(r'http://([\w-]+)\.livejournal\.com/tag/',
                          r'http://\1.dreamwidth.org/tag/', old_text)
        for lj, dw in url.iteritems():
            new_text = new_text.replace(lj, dw)
        if old_text != new_text:
            print new_text
            print dw_url
            print
            itemid = post.find('itemid').text
            try:
                # xmlrpclib cannot send None, so an empty subject becomes ''.
                subject = post.find('subject').text or ''
            except AttributeError:
                subject = ''
            print itemid, subject
            s = raw_input('Proceed? (y/n) ')
            if s != 'y':
                continue
            e = server.LJ.XMLRPC.editevent(dochallenge(server, {
                'username': username,
                'ver': 1,
                'event': new_text,
                'itemid': itemid,
                'subject': subject,
                #'lineendings': 'unix',
            }, password))
            print "Edit result:", e
            print

if os.access("ljdump.config", os.F_OK):
    config = xml.dom.minidom.parse("ljdump.config")
    server = config.documentElement.getElementsByTagName("server")[0].childNodes[0].data
    username = config.documentElement.getElementsByTagName("username")[0].childNodes[0].data
    password = config.documentElement.getElementsByTagName("password")[0].childNodes[0].data
    journals = config.documentElement.getElementsByTagName("journal")
    if journals:
        for e in journals:
            process(server, username, password, e.childNodes[0].data)
    else:
        process(server, username, password, username)
```
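For anyone porting the script to Python 3: the write-back authenticates with LJ's challenge-response scheme, where the response is md5(challenge + hex md5 of the password). Below is a sketch of the parameter dict that dochallenge assembles for LJ.XMLRPC.editevent, with the challenge passed in instead of fetched (in the script it comes from server.LJ.XMLRPC.getchallenge()). editevent_params is a hypothetical name; in Python 3 the proxy itself would be xmlrpc.client.ServerProxy.

```python
from hashlib import md5


def calcchallenge(challenge, password):
    # Python 3 hashlib wants bytes, so encode before hashing.
    inner = md5(password.encode('utf-8')).hexdigest()
    return md5((challenge + inner).encode('utf-8')).hexdigest()


def editevent_params(username, password, challenge, itemid, event, subject):
    """Request fields for LJ.XMLRPC.editevent, authenticated by challenge-response."""
    return {
        'username': username,
        'ver': 1,
        'itemid': itemid,
        'event': event,
        'subject': subject or '',  # xmlrpc refuses None unless allow_none is set
        'auth_method': 'challenge',
        'auth_challenge': challenge,
        'auth_response': calcchallenge(challenge, password),
    }


if __name__ == "__main__":
    print(editevent_params("bluedrag", "password", "c0:example", "42", "text", None))
```

The password itself never goes over the wire; only the hash derived from the one-time challenge does.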
Date: 2017-07-12 21:05 (UTC)
A slightly updated version is at https://github.com/adept/ljdump (I now diff the text before and after replacement, and handle LJ usernames containing underscores).
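A diff of the text before and after replacement makes each confirmation prompt much easier to judge than printing the whole entry. One way to produce such a diff is Python's difflib; this is a sketch, not necessarily how the updated version on GitHub does it, and preview is a hypothetical name.

```python
import difflib


def preview(old_text, new_text, label="entry"):
    # Line-based unified diff; an entry stored as one long line of HTML
    # will show up as a single changed line.
    return "".join(difflib.unified_diff(
        old_text.splitlines(True), new_text.splitlines(True),
        fromfile=label + " (before)", tofile=label + " (after)"))


if __name__ == "__main__":
    print(preview("intro\nhttp://bluedrag.livejournal.com/12345.html\n",
                  "intro\nhttps://bluedrag.dreamwidth.org/10794.html\n"))
```

An empty result means the rewrite changed nothing, which is the same test the script makes with old_text != new_text.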