As I mentioned before, I recently needed to archive a text
conversation thread off my phone. Plus, as a general
principle, if I back up my phone, what do I do with the backup?
The app 'SMS Backup & Restore' does a perfect job of
backing up my phone. It creates an XML file that I can archive
to my computer. But then what? In my case, the XML file
was over 3GB. Since 2016, every phone upgrade I have done has
automatically carried my old data over to the new device, so
I had text messages going back almost 10 years.
I like the idea of keeping all these old text messages, but not
necessarily on my phone. If I regularly back up my phone,
then 1) I don't have to worry as much about losing my phone, and 2)
I don't have to keep 10 years (+) of data directly on my phone.
But back to the main problem: once the (3GB) XML file is on my
computer (or in my case, on my home server), what do I do with it?
I found two applications that extract data from the backup XML file,
but neither came close to what I needed. I wanted to extract
a single conversation, but more generally, I wanted access to
everything in the XML file. Neither app I could find did that. So ...
I wrote one.
I chose Java for several reasons:
- I'm extremely proficient with Java.
- I've written several applications that parse XML files.
- I could foresee that using objects (Abstraction,
Encapsulation, Inheritance, and Polymorphism) would help me keep
things simple, and more importantly, maintainable going forward.
Next I had to decide how I wanted it to work.
- I could do a web app, but that was overkill for my immediate
need.
- I could do a GUI app, but again, more than what I needed - I
wanted to ARCHIVE a specific conversation, not VIEW it. So
extracting a conversation and storing it in a file or files was
the better way to go.
- I decided to extract the conversation (and, why not, each
conversation) to an HTML file, with a matching folder for all
the multimedia components of the messages.
Just to speed things along, I started with the XML example app that
comes with the Java 8 SDK.
First, I needed to understand the structure of the XML file I needed
to parse. For the text messages, it's basically a list of
messages. The elements look like this:
<smss>
  <sms></sms>
  <sms></sms>
  ...
</smss>
<mmss>
  <mms></mms>
  <mms></mms>
  ...
</mmss>
The SMS elements are simple. Each one consists entirely of a
single <sms> tag and a bunch of attributes.
The MMS elements consist of subelements:
<parts>
  <part></part>
  ...
</parts>
<addrs>
  <addr></addr>
  ...
</addrs>
Within the <part> elements are the text part of the message
and the multimedia parts. The <addr> elements provide a
list of participants in the conversations.
Using the XML example Java application, it would have been easy to
just recognize each of these elements. Context wouldn't really
matter for this application. For example, the <addr>
element only ever occurs inside the <addrs> element, which
only ever occurs inside the <mms> element. So I could have
just had one level with a big case statement for each of the tags,
and it would have been sufficient for this application. But
that's bad practice, and I just couldn't do that (sorry).
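For contrast, here is roughly what that flat, one-level approach might look like. This is only a sketch of the idea I rejected, not code from the app: the class name FlatHandler, the count() helper, and the <backup> root element in the usage note are made up for illustration, and the handler just counts tags instead of reading their attributes.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Flat SAX handler: one switch over tag names, no notion of nesting. */
class FlatHandler extends DefaultHandler {
    final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        switch (qName) {
            case "sms":
            case "mms":
            case "part":
            case "addr":
                // Real code would read the attributes here; this sketch only counts.
                counts.merge(qName, 1, Integer::sum);
                break;
            default:
                break; // wrapper elements such as <smss>, <parts>, <addrs>
        }
    }

    /** Parses an XML string and returns how often each known tag occurred. */
    static Map<String, Integer> count(String xml) {
        try {
            FlatHandler handler = new FlatHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.counts;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Calling FlatHandler.count("<backup><smss><sms/></smss></backup>") returns a map with a single entry for "sms". The handler never knows which <mms> an <addr> belongs to unless it keeps that state by hand, which is exactly the context problem.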
So, how do I create context in an XML element stream? I'm a big
fan of recursive descent parsers for jobs like this, but the element
stream only exists at the top with 'startElement()', 'endElement()'
and 'endDocument()' methods called by the SAX parser. So I
decided to create sub-parsers with their own 'startElement()' and
'endElement()' methods, and have the top pass those method calls down.
For example:
main->startElement()
  mmsParser->startElement()
    AddrParser->startElement()
main->endElement()
  mmsParser->endElement()
    AddrParser->endElement()
At each level, when the parser sees an element, it creates a
subparser for that element.
- When it sees a 'startElement()' call and a subparser exists,
it calls the subparser's 'startElement()'.
- When it sees an 'endElement()' call and a subparser exists, it
calls the subparser's 'endElement()', then destroys the subparser.
My cheap and dirty recursive descent for XML.
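The delegation scheme above can be sketched as follows. This is a simplified illustration, not the app's actual code. All the class names are hypothetical, and so are the "address" attribute on <addr> and the use of <mmss> as the document root. One deliberate difference from the description: each level counts nesting depth so it knows when its own child has closed, and a sub-parser is not sent the 'endElement()' call for its own element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** One level of the descent: owns one element and delegates its children. */
abstract class ElementParser {
    private ElementParser child; // active sub-parser, or null
    private int depth;           // descendant elements currently open

    /** Returns a sub-parser for a direct child element, or null to skip it. */
    protected ElementParser createChild(String name, Attributes attrs) { return null; }

    /** Called when a direct child closes, just before its sub-parser is dropped. */
    protected void childFinished(ElementParser finished) { }

    final void startElement(String name, Attributes attrs) {
        depth++;
        if (child != null) child.startElement(name, attrs);    // pass the call down
        else if (depth == 1) child = createChild(name, attrs); // new direct child
    }

    final void endElement(String name) {
        depth--;
        if (child == null) return;                             // inside a skipped element
        if (depth > 0) { child.endElement(name); return; }     // still inside the child
        ElementParser finished = child;
        child = null;                                          // destroy the sub-parser
        childFinished(finished);
    }
}

class AddrsParser extends ElementParser {
    final List<String> addresses = new ArrayList<>();
    @Override protected ElementParser createChild(String name, Attributes attrs) {
        if (name.equals("addr")) addresses.add(attrs.getValue("address")); // leaf element
        return null;
    }
}

class MmsParser extends ElementParser {
    List<String> addresses = new ArrayList<>();
    @Override protected ElementParser createChild(String name, Attributes attrs) {
        return name.equals("addrs") ? new AddrsParser() : null;
    }
    @Override protected void childFinished(ElementParser finished) {
        addresses = ((AddrsParser) finished).addresses;
    }
}

class MmsListParser extends ElementParser {
    final List<List<String>> participants = new ArrayList<>();
    @Override protected ElementParser createChild(String name, Attributes attrs) {
        return name.equals("mms") ? new MmsParser() : null;
    }
    @Override protected void childFinished(ElementParser finished) {
        participants.add(((MmsParser) finished).addresses);
    }
}

class DocumentParser extends ElementParser {
    final List<List<String>> participants = new ArrayList<>();
    @Override protected ElementParser createChild(String name, Attributes attrs) {
        return name.equals("mmss") ? new MmsListParser() : null;
    }
    @Override protected void childFinished(ElementParser finished) {
        participants.addAll(((MmsListParser) finished).participants);
    }
}

/** SAX entry point: the only place the parser's callbacks arrive. */
class BackupHandler extends DefaultHandler {
    final DocumentParser top = new DocumentParser();
    @Override public void startElement(String uri, String local, String qName, Attributes attrs) {
        top.startElement(qName, attrs);
    }
    @Override public void endElement(String uri, String local, String qName) {
        top.endElement(qName);
    }
    /** Returns the participant list of every <mms> in the given XML. */
    static List<List<String>> parse(String xml) {
        try {
            BackupHandler handler = new BackupHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.top.participants;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Each parser only names the elements that can legally appear directly inside its own element, so an <addr> anywhere other than inside <addrs> inside <mms> is simply ignored.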
The application has two parts to it:
- Parse the XML document
- Create the HTML files
The second part is all done in the 'endDocument()' method. By then
the parser has created a list of conversations, each with a
list of messages. Here I iterate through the conversations,
and for each one:
- add an entry to the index page.
- create a conversation page with all the messages and
multimedia.
This app reads the XML file, and writes out all the HTML and media
files.
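As a rough sketch of that second part, the loop over conversations could look something like this. The names (HtmlArchive, conversationN.html, the Map of conversation title to message texts) are assumptions for illustration, and the multimedia handling is left out to keep it short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

class HtmlArchive {
    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String pageName(int number) {
        return "conversation" + number + ".html";
    }

    /** One line of the index page, linking to a conversation page. */
    static String indexEntry(String title, int number) {
        return "<li><a href=\"" + pageName(number) + "\">" + escape(title) + "</a></li>\n";
    }

    /** A complete page for one conversation, one paragraph per message. */
    static String conversationPage(String title, List<String> messages) {
        StringBuilder html = new StringBuilder("<html><body>\n<h1>" + escape(title) + "</h1>\n");
        for (String message : messages) {
            html.append("<p>").append(escape(message)).append("</p>\n");
        }
        return html.append("</body></html>\n").toString();
    }

    /** Writes index.html plus one page per conversation into outDir. */
    static void writeAll(Map<String, List<String>> conversations, Path outDir) {
        try {
            Files.createDirectories(outDir);
            StringBuilder index = new StringBuilder("<html><body>\n<ul>\n");
            int number = 0;
            for (Map.Entry<String, List<String>> c : conversations.entrySet()) {
                number++;
                index.append(indexEntry(c.getKey(), number));
                write(outDir.resolve(pageName(number)),
                        conversationPage(c.getKey(), c.getValue()));
            }
            write(outDir.resolve("index.html"),
                    index.append("</ul>\n</body></html>\n").toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void write(Path file, String html) throws IOException {
        Files.write(file, html.getBytes(StandardCharsets.UTF_8));
    }
}
```

Escaping matters here because message text is arbitrary: a message containing "<" or "&" would otherwise break the page.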
ISSUES
Images:
The first main issue I had was storing the images. Initially I
just parsed the document and waited until 'endDocument()'
to write anything out to files. Well, I have a 3GB XML
file. Parsing it and holding everything in memory possibly took a
lot more than 3GB. That worked on my home workstation (with 128GB
of RAM), but when I tried to work on this on my laptop while
traveling, I couldn't, as I kept getting Java 'out of memory'
errors.
Solution: Since I was going to store all these images to files later
anyway, I now write each one to a file immediately, and only keep the
filename for the end.
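A minimal sketch of that idea, under one big assumption: that each multimedia part arrives as Base64 text, which the description above does not actually state. The class name MediaStore and the partN file naming are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Writes each media payload to disk as it is parsed and hands back a filename. */
class MediaStore {
    private final Path mediaDir;
    private int counter;

    MediaStore(Path mediaDir) {
        this.mediaDir = mediaDir;
    }

    /**
     * Decodes one payload (assumed to be Base64), writes it into mediaDir,
     * and returns the filename, which is all the caller needs to keep.
     */
    String save(String base64Data, String extension) {
        try {
            Files.createDirectories(mediaDir);
            String fileName = "part" + (++counter) + "." + extension;
            // The MIME decoder tolerates line breaks inside the encoded text.
            byte[] bytes = Base64.getMimeDecoder().decode(base64Data);
            Files.write(mediaDir.resolve(fileName), bytes);
            return fileName;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this, only one decoded image is in memory at a time; the conversation objects hold short filename strings instead of the image data.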
HEIC images:
I have HEIC images in my text messages, but Firefox can't display
them.
Solution: I have a script that postprocesses the output after
the XML file is parsed. It finds all the HEIC images and converts
them to JPEG images.
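My script is not shown here, but the same postprocessing step could be sketched in Java along these lines. It assumes ImageMagick's 'magick' command is installed and built with HEIC support, which is an assumption about the environment. The class name HeicConverter is made up, and updating the image references inside the HTML pages is not covered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HeicConverter {
    /** True for file names ending in .heic, in any letter case. */
    static boolean isHeic(Path file) {
        return file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".heic");
    }

    /** Same location and base name, with the extension swapped for .jpg. */
    static Path jpegTarget(Path heic) {
        String name = heic.getFileName().toString();
        String base = name.substring(0, name.lastIndexOf('.'));
        return heic.resolveSibling(base + ".jpg");
    }

    /** Finds every HEIC file below root. */
    static List<Path> findHeic(Path root) {
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .filter(HeicConverter::isHeic)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the external converter once per file (assumes 'magick' is on the PATH). */
    static void convertAll(Path root) throws IOException, InterruptedException {
        for (Path heic : findHeic(root)) {
            Process process = new ProcessBuilder(
                    "magick", heic.toString(), jpegTarget(heic).toString())
                    .inheritIO()
                    .start();
            if (process.waitFor() != 0) {
                System.err.println("conversion failed: " + heic);
            }
        }
    }
}
```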
Phone Numbers:
I explicitly wrote this to display phone numbers on the assumption
that they are US-based numbers. This will have problems if there are
international phone numbers in the XML files.
Solution:
Write a bunch of code that normalizes phone numbers (with or without
the + and with or without the country code).
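A first cut at that normalization might look like the sketch below. It is not in the app yet, and the rules are my assumptions: a leading + means the number already carries a country code, a bare 10-digit number is treated as US, and anything else (short codes, email-style senders) is left untouched. The class name PhoneNumbers is hypothetical.

```java
class PhoneNumbers {
    /**
     * Normalizes a phone number to "+" followed by digits.
     * Assumes US (+1) when no country code is present.
     */
    static String normalize(String raw) {
        String trimmed = raw.trim();
        String digits = trimmed.replaceAll("\\D", ""); // drop spaces, dashes, parentheses
        if (trimmed.startsWith("+")) {
            return "+" + digits;                       // already has a country code
        }
        if (digits.length() == 10) {
            return "+1" + digits;                      // bare US number
        }
        if (digits.length() == 11 && digits.startsWith("1")) {
            return "+" + digits;                       // US number with leading 1
        }
        return trimmed;                                // unknown format: leave as-is
    }
}
```

With this, "(555) 123-4567", "1-555-123-4567" and "+1 555 123 4567" all normalize to the same key, so messages from one contact end up in one conversation. International numbers written without a + would still be misread, so this is only a starting point.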
FUTURES
I'm tentatively planning on open sourcing this app and putting it in
a public Git repository. Let me know if you would be
interested in that.
I'm thinking that archiving text messages, and having access to
them, literally forever, might be useful to a lot of people, so I'm
considering offering a cloud service that provides cloud storage
for all your backups, and (using this app) access to all the
messages in all your backups. Let me know if you'd be
interested in that too.
There are lots of tweaks and improvements I can make, but it does
everything I want now. We'll have to see where this goes.