Let's say you decided to leave LiveJournal and migrate to Dreamwidth. You do a full import of your journal to DW and everything looks fine and dandy, but there is one problem: if your LJ posts linked to each other, the import process does not adjust those links, and they still point to livejournal.com. Clearly unacceptable! I wrote a quick and dirty script to fix that. Details below.
1) Get the ljdump.py script, put it in a directory.
2) In the same directory create a config file, ljdump.config:
<xml>
  <server>https://www.dreamwidth.org</server>
  <username>bluedrag</username>
  <password>password</password>
  <journal>bluedrag</journal>
</xml>
(Substitute your own journal name for bluedrag, and your own password for password.)
3) Still in the same directory, run ljdump.py. It will create a subdirectory with the name of your journal and a full backup of all entries and comments. My script relies on the presence of that backup and the config file (ljdump.config).
4) Save my script to a file (fix_links.py) and run it in the same directory. It will go over your entries (but not comments) and will try to change all links to your livejournal posts (and tags) to the corresponding Dreamwidth links. It will ask before writing every entry back (but if you feel especially bold you can comment out the raw_input line).
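To make step 4 concrete, here is a minimal sketch of the rewriting idea on made-up data. It reuses the two regular expressions from the script below, but the helper names (lj_url_from_import_source, rewrite_links), the sample entry, and the sample URLs are all hypothetical, and the import_source format is assumed from the script's regex rather than from any Dreamwidth documentation. Unlike the script itself, which is Python 2, this sketch is written to run on Python 3.

```python
import re


def lj_url_from_import_source(import_source):
    """Turn 'livejournal.com/<user>/<id>' into the old LJ post URL.

    The input format is an assumption based on the regex in the script.
    """
    return re.sub(r'livejournal\.com/(.*?)/(.*)',
                  r'http://\1.livejournal.com/\2.html',
                  import_source)


def rewrite_links(text, url_map):
    """Replace LJ tag links and known LJ post links with DW links."""
    # Tag pages keep the same path, only the host changes.
    text = re.sub(r'http://([\w\d_-]+)\.livejournal\.com/tag/',
                  r'http://\1.dreamwidth.org/tag/', text)
    # Post links are replaced one by one using the LJ -> DW mapping.
    for lj, dw in url_map.items():
        text = text.replace(lj, dw)
    return text


# Hypothetical mapping for a single imported post.
url_map = {
    lj_url_from_import_source('livejournal.com/bluedrag/12345'):
        'https://bluedrag.dreamwidth.org/678.html',
}

entry = ('See <a href="http://bluedrag.livejournal.com/12345.html">this</a> '
         'and http://bluedrag.livejournal.com/tag/travel')
print(rewrite_links(entry, url_map))
```

Links to posts that are not in the mapping (other people's journals, for example) are left alone, which is why the script only touches links to your own imported entries.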
Disclaimers: No warranties, and only tested on my journal. But it worked, and I feel great about the result.
Hasta la vista, LJ!
The script (big chunks borrowed from ljdump):
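#!/usr/bin/python
# Python 2 script (uses urllib2, xmlrpclib, print statements and raw_input).
import codecs, os, pickle, pprint, re, shutil, sys, urllib2, xml.dom.minidom, xmlrpclib
import glob
import xml.etree.ElementTree as ET

url = {}    # old LJ post URL -> new DW post URL
posts = {}  # DW post URL -> parsed entry (XML root element)

try:
    from hashlib import md5
except ImportError:
    # Fallback for very old Python versions without hashlib.
    import md5 as _md5
    md5 = _md5.new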
def calcchallenge(challenge, password):
    return md5(challenge+md5(password).hexdigest()).hexdigest()
def flatresponse(response):
    # The flat interface returns alternating name/value lines.
    r = {}
    while True:
        name = response.readline()
        if len(name) == 0:
            break
        if name[-1] == '\n':
            name = name[:len(name)-1]
        value = response.readline()
        if value[-1] == '\n':
            value = value[:len(value)-1]
        r[name] = value
    return r
def getljsession(server, username, password):
    r = urllib2.urlopen(server+"/interface/flat", "mode=getchallenge")
    response = flatresponse(r)
    r.close()
    r = urllib2.urlopen(server+"/interface/flat", "mode=sessiongenerate&user=%s&auth_method=challenge&auth_challenge=%s&auth_response=%s" % (username, response['challenge'], calcchallenge(response['challenge'], password)))
    response = flatresponse(r)
    r.close()
    return response['ljsession']
def dochallenge(server, params, password):
    # Add challenge-response authentication fields to an XML-RPC request.
    challenge = server.LJ.XMLRPC.getchallenge()
    params.update({
        'auth_method': "challenge",
        'auth_challenge': challenge['challenge'],
        'auth_response': calcchallenge(challenge['challenge'], password)
    })
    return params
def process(server_url, username, password, journal):
    # Pass 1: read the ljdump backup and build the LJ -> DW URL mapping.
    for filename in sorted(glob.glob(journal+'/L-*')):
        try:
            tree = ET.parse(filename)
        except ET.ParseError as e:
            print '%s: %s' % (filename, e)
            continue
        root = tree.getroot()
        dw_url = root.find('url').text
        try:
            import_source = root.find('props').find('import_source').text
        except AttributeError:
            # Entry was not imported from LJ (no import_source property).
            print '%s: LJ url not found' % filename
            continue
        lj_url = re.sub(r'livejournal\.com/(.*?)/(.*)', r'http://\1.livejournal.com/\2.html', import_source)
        #print "%s -> %s" % (lj_url, dw_url)
        url[lj_url] = dw_url
        posts[dw_url] = root

    ljsession = getljsession(server_url, username, password)
    server = xmlrpclib.ServerProxy(server_url + "/interface/xmlrpc")

    # Pass 2: rewrite links in every imported entry and post the changes.
    for dw_url, post in sorted(posts.iteritems()):
        old_text = post.find('event').text
        new_text = re.sub(r'http://([\w\d_-]+)\.livejournal\.com/tag/',
                          r'http://\1.dreamwidth.org/tag/', old_text)
        for lj, dw in url.iteritems():
            new_text = new_text.replace(lj, dw)
        if old_text != new_text:
            print new_text
            print dw_url
            print
            itemid = post.find('itemid').text
            try:
                subject = post.find('subject').text
            except AttributeError:
                subject = ''
            print itemid, subject
            # Comment out the next three lines to skip the confirmation.
            s = raw_input('Proceed? (y/n) ')
            if s != 'y':
                continue
            e = server.LJ.XMLRPC.editevent(dochallenge(server, {
                'username': username,
                'ver': 1,
                'event': new_text,
                'itemid': itemid,
                'subject': subject,
                #'lineendings': 'unix',
            }, password))
            print "Edit result:", e
            print
if os.access("ljdump.config", os.F_OK):
    config = xml.dom.minidom.parse("ljdump.config")
    server = config.documentElement.getElementsByTagName("server")[0].childNodes[0].data
    username = config.documentElement.getElementsByTagName("username")[0].childNodes[0].data
    password = config.documentElement.getElementsByTagName("password")[0].childNodes[0].data
    journals = config.documentElement.getElementsByTagName("journal")
    if journals:
        for e in journals:
            process(server, username, password, e.childNodes[0].data)
    else:
        process(server, username, password, username)
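If you want to check your ljdump.config before running anything against the server, the config handling at the end of the script can be tried in isolation. The sketch below follows the same logic (one run per journal element, falling back to the username when none is given), but the read_config helper and the inline SAMPLE_CONFIG string are my own illustration, not part of ljdump, and it is written for Python 3 rather than the Python 2 of the script.

```python
import xml.dom.minidom

# Same shape as the ljdump.config shown in step 2 (sample values only).
SAMPLE_CONFIG = """<xml>
  <server>https://www.dreamwidth.org</server>
  <username>bluedrag</username>
  <password>password</password>
  <journal>bluedrag</journal>
</xml>"""


def read_config(xml_text):
    """Return server, username, password and the list of journals to process."""
    doc = xml.dom.minidom.parseString(xml_text).documentElement

    def text_of(tag):
        # Text content of the first element with this tag name.
        return doc.getElementsByTagName(tag)[0].childNodes[0].data

    journals = [j.childNodes[0].data
                for j in doc.getElementsByTagName("journal")]
    return {
        "server": text_of("server"),
        "username": text_of("username"),
        "password": text_of("password"),
        # With no <journal> element, fall back to the user's own journal.
        "journals": journals or [text_of("username")],
    }


config = read_config(SAMPLE_CONFIG)
print(config["server"], config["journals"])
```

To check a real file, read it with open("ljdump.config").read() and pass the result to read_config.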
Date: 2017-07-12 21:05 (UTC)
A slightly updated version is at https://github.com/adept/ljdump (I am diffing the text before and after replacement, and I handle LJ usernames with underscores).
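Showing a diff instead of the whole rewritten entry makes the "Proceed?" prompt much easier to answer. Below is a rough sketch of how that could look with the standard difflib module. It is only my guess at the idea, not necessarily how the linked repository does it, and the show_diff helper and sample texts are hypothetical. It runs on Python 3.

```python
import difflib


def show_diff(old_text, new_text):
    """Return a unified diff of an entry before and after link rewriting."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="before", tofile="after", lineterm="")
    return "\n".join(diff)


# Made-up entry text with one rewritten link.
old = "Intro\nhttp://bluedrag.livejournal.com/12345.html\nOutro"
new = "Intro\nhttps://bluedrag.dreamwidth.org/678.html\nOutro"
print(show_diff(old, new))
```

An empty result means the entry did not change, so the same helper can double as the "anything to do?" check.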