Need to delete all line breaks in source
Thread poster: kd42
kd42
kd42
Estonia
Local time: 16:10
English to Russian
Jun 5, 2021

Hi, a [*****] client sent me a project with a lot of line breaks, because it was converted from a PDF document by a lazy PM assistant.
I rely heavily on MT, which does not realize that a line break is just white space, so I want to convert the line breaks to spaces.
Is there any simple way to do it?
Thank you.


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 14:10
Member (2009)
Dutch to English
try Unbreaker (part of the TransTools suite) Jun 5, 2021

If you can still access the source files as .docx/.doc, you can do this with a VERY handy little tool called Unbreaker, which is part of the TransTools suite.

See: https://www.translatortools.net/products/transtools/unbreaker



 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 16:10
English to Russian
CleanUp Tasks Jun 6, 2021

You can use the CleanUp Tasks plugin.
Here is how: find "Modifying text" and follow the instructions from there to create a rule.
Search: \n
Replace: (type a single space character here)
Check the "Regex" box.
Click Save as.
Before using this rule, uncheck all other options except "Use Conversions".
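
For anyone who wants to check what such a rule does before running it on real files, here is a minimal Python sketch of the same search-and-replace. It is not the plugin's code: the function name `unbreak` is made up for illustration, and the pattern goes slightly further than the rule above by also swallowing the carriage return of Windows line endings and any spaces or tabs around the break, so that no double spaces are left behind.

```python
import re

def unbreak(text: str) -> str:
    """Replace each line break, plus any spaces or tabs around it, with one space."""
    # \r?\n matches both Unix (LF) and Windows (CRLF) line endings.
    return re.sub(r"[ \t]*\r?\n[ \t]*", " ", text)

print(unbreak("a line \r\nbroken\nhere"))
```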

Also, I guess you can edit the sdlxliff file with Notepad, but I never tried that.


 
Multiverse Solutions s.r.o. (X)
Multiverse Solutions s.r.o. (X)
Local time: 15:10
Polish to English
directly in DOC(X) Jun 6, 2021

If you have a Word file, open the Find & Replace window and use:
Find ^p
Replace with a single space
This will combine all neighbouring paragraphs into larger units.

However, if there are no empty lines between paragraphs, you will get a single paragraph that will need manual splitting into sentences / segments / paragraphs.
To speed up cleaning, you may use e.g. three spaces in the Replace field and apply Highlight.
Inserting a manual Enter (paragraph mark) where needed at a three-space gap is easy. Cleaning up excess spaces is equally easy after the whole process (two spaces in Find, one space in Replace).
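
The same idea can be tried outside Word. The Python sketch below is only an illustration, assuming plain text in which an empty line marks a real paragraph boundary; `merge_wrapped_lines` is a hypothetical helper, not part of Word or of any tool mentioned in this thread. It turns single line breaks into spaces, leaves empty lines alone, and then collapses repeated spaces, much like the final two-spaces-to-one pass.

```python
import re

def merge_wrapped_lines(text: str) -> str:
    """Join lines inside a paragraph; keep empty lines as paragraph breaks."""
    # Normalise Windows line endings so only "\n" has to be handled.
    text = text.replace("\r\n", "\n")
    # Replace a line break only when it is not next to another line break.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse any run of two or more spaces into a single space.
    return re.sub(r" {2,}", " ", text)

sample = "The pump\nmust be\nswitched off.\n\nCheck the\nvalve."
print(merge_wrapped_lines(sample))
```

If the text has no empty lines between paragraphs, this produces one long paragraph, exactly as described above for Word.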

[Edited at 2021-06-06 05:29 GMT]


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 15:10
Member (2006)
English to Afrikaans
What is your source file format? Jun 6, 2021

kd42 wrote:
I want to convert the line breaks to spaces.


Do you mean you only have an SDLXLIFF file? Are these line breaks also segment breaks, or do the line breaks occur within segments? There is nothing [simple] you can do about sentences having been broken up across segment boundaries.

I googled for this, and found this article that shows how to do it, in case you need a refresher:
https://www.trados.com/blog/how-to-merge-segments-across-hard-returns-in-sdl-trados-studio.html

Perhaps some of these steps can be automated by the AutoHotKey folks over at the Trados forums:
https://community.sdl.com/product-groups/translationproductivity/f/autohotkey


 
kd42
kd42
Estonia
Local time: 16:10
English to Russian
TOPIC STARTER
Thanks for the suggestions everyone, I'll try MS Word Jun 6, 2021

Thanks a lot for coming to help me, Michael, Stepan, Multiverse, Samuel.

No, I don’t have the source Word document or Excel workbook; I must work on the SDL data.

The plugin recommended by Stepan repeatedly crashes Studio at the very beginning, with a message which I have no intention to investigate.

It is nearly impossible to delete line breaks in Notepad because the SDL data contains the source twice, and you should edit only the second occurrence.
Therefore, I am going to copy-paste the xliff data from Notepad to a Word document, make the second instance of source bold, and then convert all bold paragraph marks to spaces. Then I will copy-paste the data back into xliff using Notepad.

If I do not come back with my grievances and curses, it means this workflow was a success.

Stay healthy and have a nice day! =)


 
kd42
kd42
Estonia
Local time: 16:10
English to Russian
TOPIC STARTER
A few more words Jun 6, 2021

A couple of updates.

1) —
The plugin recommended by Stepan crashed Studio because I had cut the body of the xliff data and pasted it into MS Word, intending to do the conversion there, before I decided to ask the colleagues.
So I restored the data and the plugin stopped crashing Studio. I created a rule to replace /n with a white space using Regex, and it did not work. Most likely I am making a mistake or missing something that is obvious to the plugin developer or an advanced Studio user, so I quit at this stage.

2) ——
I opened the SDL data in Notepad, selected the tag and everything after it, then cut and pasted it into Word. This is necessary because if you try to open the data in Word directly, it will attempt to interpret the XML somehow, and fail.
When the data was in Word, I ran a find/replace pass with “Match wildcards” active, searching for \\*\ and replacing it with just the formatting: bold font. Then I replaced all bold paragraph marks with spaces. Then I copy-pasted the data back into Notepad and saved it. The resulting xliff opens in Studio and has no line breaks.
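
A round trip like this could probably be scripted. The sketch below is a rough Python illustration, not a tested Trados workflow: it assumes the second copy of the source is stored in `<seg-source>` elements (as XLIFF 1.2 defines them) and that the unwanted breaks are literal line-break characters inside those elements. The function name `unbreak_seg_source` is hypothetical.

```python
import re

# Assumption: the second copy of the source lives in <seg-source> elements.
# Check this against the actual file before relying on the pattern.
SEG_SOURCE = re.compile(r"(<seg-source\b[^>]*>)(.*?)(</seg-source>)", re.DOTALL)

def unbreak_seg_source(xliff_text: str) -> str:
    """Replace line breaks with spaces inside <seg-source> only."""
    def fix(match: re.Match) -> str:
        opening, body, closing = match.groups()
        # Only the element body is changed; the tags are kept as they are.
        return opening + re.sub(r"\r?\n", " ", body) + closing
    return SEG_SOURCE.sub(fix, xliff_text)

sample = (
    "<source>one\ntwo</source>\n"
    '<seg-source><mrk mtype="seg" mid="1">one\ntwo</mrk></seg-source>'
)
print(unbreak_seg_source(sample))
```

The first copy of the source and everything outside `<seg-source>` is left untouched. Whether Studio accepts the result should be checked on a backup copy of the file first.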

Have a nice, working afternoon! =)


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 16:10
English to Russian
Probably wrong version, you didn't mention yours Jun 6, 2021

kd42 wrote:
The plugin recommended by Stepan repeatedly crashes Studio at the very beginning, with a message which I have no intention to investigate.
Probably you need the other version (there are two of them).
This one is for Trados 2015-2017: https://appstore.sdl.com/language/app/cleanup-tasks/550/


 
kd42
kd42
Estonia
Local time: 16:10
English to Russian
TOPIC STARTER
The crash was caused by me Jun 6, 2021

Thanks, Stepan, see update 1) above.

 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 16:10
English to Russian
Ah, I see now, ok Jun 6, 2021

I agree that some Trados features require stepping out of the comfort zone to use them properly. That's true. However, before posting my suggestion, I tried it myself and it worked. Btw you mentioned that you tried /n. It must be \n instead.
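
The difference between the two spellings is easy to see in a few lines of Python, used here only as a stand-in for the plugin's regex engine: `\n` is the escape for a line break, while `/n` is just a slash followed by the letter n. The sample text is made up.

```python
import re

text = "first line\nsecond line"

# Backslash-n: the regex escape for a line feed, so the break is replaced.
with_backslash = re.sub(r"\n", " ", text)

# Slash-n: a literal "/" followed by "n". The sample contains no such text,
# so nothing is replaced and the line break stays.
with_slash = re.sub(r"/n", " ", text)

print(repr(with_backslash))
print(repr(with_slash))
```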

[Edited at 2021-06-06 13:39 GMT]


 
kd42
kd42
Estonia
Local time: 16:10
English to Russian
TOPIC STARTER
Most likely it was my mistake Jun 6, 2021

Stepan Konev wrote:
I agree that some Trados features require stepping out of the comfort zone to use them properly. That's true. However, before posting my suggestion, I tried it myself and it worked. Btw you mentioned that you tried /n. It must be \n instead.


I never doubted your knowledge and skills. I checked the settings file and it is /n instead of \n. You solved the problem.

The developer might wish to somehow simplify regex or trigger a warning in such or similar cases, because very few translators have the background and sharp eye that you do. I sometimes receive big projects containing very many small files with unwanted line breaks. Using Word is not feasible with them. Thanks to you I now have a good solution to this issue, I owe you a bottle of Old Tallinn. =)


 
kd42
kd42
Estonia
Local time: 16:10
English to Russian
TOPIC STARTER
Need to add a space between number and units Aug 28, 2021

G'day everyone.

I got this "Cleanup Source" plugin working on my system; it performs simple tasks, and I have just finished scanning its entire manual, with no result.

My problem: the source text is full of values with units, like -- 60V, 77Hz, 12KW, where I need to separate the numeric value from the units with a space, like this -- 60 V, 77 Hz, 12 KW.

In MS Word I use the following pattern
Find: ([0-9])(Hz)
Replace with: \1 \2
(meaning "the text within the first pair of brackets" "a space" "the text within the second pair of brackets")
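
For comparison, here is the same substitution written for Python's `re` module, which also uses `\1` and `\2` in the replacement. This is a sketch only: the unit list is a made-up sample taken from the examples above, `space_before_units` is a hypothetical name, and whether the Cleanup plugin accepts the same syntax is a separate matter.

```python
import re

# Hypothetical unit list; extend it to match the units in the real source text.
UNITS = ["Hz", "KW", "V"]

# Longest units first, so a longer unit wins over a shorter one that shares its start.
_alternation = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))

# Group 1: the last digit of the value. Group 2: the unit.
# \b requires the unit to end there, so "60Volts" is left alone.
UNIT_PATTERN = re.compile(rf"(\d)({_alternation})\b")

def space_before_units(text: str) -> str:
    """Insert one space between a number and a known unit."""
    return UNIT_PATTERN.sub(r"\1 \2", text)

print(space_before_units("60V, 77Hz, 12KW"))
```

Listing the units explicitly is slower to set up than a catch-all letter pattern, but it avoids splitting strings such as H2O.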

Question: Does Cleanup Source have this syntax/capability?

Thank you.


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 16:10
English to Russian
Replace \ with $ Aug 28, 2021

Find: ([0-9])([A-z])
Replace with: $1 $2

*I'm not sure if this situation is possible (when you use special letters in units), but just in case...
If you replace 'z' in the above regex with 'ž', the regex will also capture all letters of the Estonian alphabet including Šš, Žž, Õõ, Ää, Öö, Üü.
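
One caveat worth checking before using the range [A-z]: in ASCII order six non-letter characters (left bracket, backslash, right bracket, caret, underscore and backtick) sit between Z and a, so the range matches them too, and a range ending in ž takes in still more symbols. The Python sketch below illustrates this with an invented sample string. Python writes the replacement as `\1 \2` where the rule above uses `$1 $2`, and the explicit letter class is one possible alternative, not something taken from the plugin's manual.

```python
import re

# The range as posted: A-z also covers [ \ ] ^ _ ` in ASCII order.
loose = re.compile(r"([0-9])([A-z])")

# Explicit alternative: ASCII letters plus the Estonian letters listed above.
strict = re.compile(r"([0-9])([A-Za-zŠšŽžÕõÄäÖöÜü])")

sample = "60V, 3_a, 12KW"
loose_result = loose.sub(r"\1 \2", sample)
strict_result = strict.sub(r"\1 \2", sample)

print(loose_result)
print(strict_result)
```

With the loose range the underscore in "3_a" is treated like a unit and gets a space in front of it; the explicit class leaves it alone.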

[Edited at 2021-08-28 20:40 GMT]


 

