Regex in EditPad Pro – finding the beginning or end of a file

I quite often need to ‘wrap’ a text file in <xml></xml> tags when converting plain text into XML, so that the file has the single root element a well-formed XML document requires.

Using EditPad Pro with regex makes this straightforward.

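\A matches the very beginning of the file

\z matches the very end of the file

Unlike ^ and $, which in EditPad Pro match at the start and end of every line, these two anchors each match only once per file.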

So, with the Regex option turned on, run the following two search-and-replace operations:

\A replace with <xml>

\z replace with </xml>
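For comparison outside EditPad Pro, here is a minimal sketch of the same wrap using Python's re module (an assumption; any engine with these anchors would do). Python writes the absolute end-of-string anchor as \Z rather than \z, and the function name wrap_in_xml is hypothetical.

```python
import re


def wrap_in_xml(text: str) -> str:
    """Wrap the whole text in a single <xml> root element."""
    # \A matches only at the very start of the string, even with re.MULTILINE.
    text = re.sub(r"\A", "<xml>", text)
    # Python's \Z matches only at the very end of the string (like \z in EditPad Pro),
    # so </xml> lands after any trailing newline.
    return re.sub(r"\Z", "</xml>", text)


print(wrap_in_xml("line one\nline two\n"))
```

In plain Python, "<xml>" + text + "</xml>" does the same job; the regex form is shown only because it mirrors the EditPad Pro steps.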

Posted in HTML & XML, Regular Expressions (RegEx)

Using EditPad Pro to convert a TXT subtitle file to XML

The original file I received looked like this. It is a typical subtitle file of the kind used by Adobe Encore:

1
00:00:03,030 --> 00:00:07,989
It brings me great pleasure to welcome to Stanford Jack Dorsey,

2
00:00:07,990 --> 00:00:11,419
who is as you know, the co-founder and chairman of the

3
00:00:11,420 --> 00:00:15,909
board of Twitter and the co-founder and CEO of Square.


To convert the file into structured XML for translation in SDL Studio, I ran the following regular expression (regex) search-and-replace operations in EditPad Pro, in this order:

1) Match the timecode line and place it between <time></time> tags

2) Find ID number
^(\d+)$ replace with <seg>$1</seg>

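3) Find text line
^(\w.*)$ replace with <trans>$1</trans>
This must run after steps 1 and 2, because ID and timecode lines also start with digits (which \w matches) until they have been tagged.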

4) Place <xml> at start of file
\A replace with <xml>

5) Place </xml> at end of file
\z replace with </xml>

The resulting file then looks like this:

<xml>

<seg>1</seg>
<time>00:00:03,030 --> 00:00:07,989</time>
<trans>It brings me great pleasure to welcome to Stanford Jack Dorsey,</trans>

<seg>2</seg>
<time>00:00:07,990 --> 00:00:11,419</time>
<trans>who is as you know, the co-founder and chairman of the</trans>

<seg>3</seg>
<time>00:00:11,420 --> 00:00:15,909</time>
<trans>board of Twitter and the co-founder and CEO of Square.</trans>

</xml>
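The five steps can also be sketched outside EditPad Pro. The following is a minimal Python version, assuming the SRT-style layout above with ASCII "-->" arrows and one subtitle line per text line. The timecode pattern used for step 1 is an assumption, the function name subtitles_to_xml is hypothetical, and a line break is written after <xml> to match the layout shown.

```python
import re

# Assumed timecode layout: HH:MM:SS,mmm --> HH:MM:SS,mmm on a line of its own.
TIMECODE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})$", re.MULTILINE)
ID_LINE = re.compile(r"^(\d+)$", re.MULTILINE)      # a line holding nothing but digits
TEXT_LINE = re.compile(r"^(\w.*)$", re.MULTILINE)   # a line starting with a word character


def subtitles_to_xml(text: str) -> str:
    """Apply the five search-and-replace steps, in order, to an SRT-style string."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings first
    # Step 1: tag timecode lines.
    text = TIMECODE.sub(r"<time>\1</time>", text)
    # Step 2: tag ID numbers.
    text = ID_LINE.sub(r"<seg>\1</seg>", text)
    # Step 3: remaining lines starting with a word character are subtitle text;
    # lines tagged in steps 1 and 2 now start with "<" and are skipped.
    text = TEXT_LINE.sub(r"<trans>\1</trans>", text)
    # Steps 4 and 5: add the root element at the absolute start and end.
    text = re.sub(r"\A", "<xml>\n", text)
    return re.sub(r"\Z", "</xml>", text)


SAMPLE = (
    "1\n"
    "00:00:03,030 --> 00:00:07,989\n"
    "It brings me great pleasure to welcome to Stanford Jack Dorsey,\n"
    "\n"
    "2\n"
    "00:00:07,990 --> 00:00:11,419\n"
    "who is as you know, the co-founder and chairman of the\n"
)
print(subtitles_to_xml(SAMPLE))
```

Like the EditPad Pro steps, this sketch would skip a subtitle line that begins with punctuation such as a quotation mark, and would tag a subtitle line consisting only of digits as an ID.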

A new XML file type can now be created in SDL Studio that extracts the translatable text between the <trans> tags.
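This is not SDL Studio's own mechanism, only a rough stand-in: a Python sketch, using the standard xml.etree.ElementTree module, that pulls out the same <trans> text such a filter would present for translation. The function name translatable_segments is hypothetical.

```python
import xml.etree.ElementTree as ET


def translatable_segments(xml_text: str) -> list[str]:
    """Return the text of every <trans> element, in document order."""
    # ET.fromstring raises ET.ParseError if the document is not well-formed,
    # so this doubles as a quick check before importing the file into SDL Studio.
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter("trans")]


doc = """<xml>
<seg>1</seg>
<time>00:00:03,030 --> 00:00:07,989</time>
<trans>It brings me great pleasure to welcome to Stanford Jack Dorsey,</trans>

<seg>2</seg>
<time>00:00:07,990 --> 00:00:11,419</time>
<trans>who is as you know, the co-founder and chairman of the</trans>
</xml>"""

for line in translatable_segments(doc):
    print(line)
```

A subtitle line containing a bare & or < (for example "Q&A") would make the file fail to parse, so such characters need escaping as &amp; and &lt; before import.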

Posted in HTML & XML, Regular Expressions (RegEx), Subtitles & Captions