Author Topic: Converting I think emDash to Dash (From Word)  (Read 1367 times)

zxandris

  • Full Member
  • ***
  • Posts: 101
Converting I think emDash to Dash (From Word)
« on: March 04, 2024, 04:49:35 pm »
Word replaces a dash between words with what I think is a long dash (an em dash), though I'm not sure.

I'm doing the replacement on paste and on load, but this is my code, and it doesn't work:

Code: Pascal
  1. var
  2.   txt: String;
  3.   S: string = #$e2+#$80+#$93;
  4. begin
  5.   txt := Clipboard.AsText;
  6.   txt := RemoveFancyQuotes(txt);
  7.   txt := AnsiToUTF8(txt);
  8.   txt := StringReplaceAll(txt, '—', '-');
  9.   txt := Utf8StringReplace(txt, S, '-', [rfReplaceAll,rfIgnoreCase]);
  10.  txt := UTF8toAnsi(txt);
  11.  rtfEditor.selText := txt;
  12.  try
  13.    rtfEditor.SelStart := rtfEditor.SelStart + rtfEditor.SelLength;
  14.    rtfEditor.SelLength := 0;
  15.  except
  16.    //
  17.  end;

I am assuming the dash put in by Word isn't a normal em dash at all, which is why I tried replacing the Unicode byte sequence as well, though that doesn't work either.  Does anyone have any idea how to fix this? It's quite annoying.  This follows on from a previous post I made, but I thought it deserved its own post.

Thanks in advance,

CJ

jamie

  • Hero Member
  • *****
  • Posts: 6734
Re: Converting I think emDash to Dash (From Word)
« Reply #1 on: March 04, 2024, 11:00:41 pm »
How about commenting out line 10?

There you are trying to put the text back to ANSI, and the UTF-8 doesn't convert to anything but 7-bit ASCII.


zxandris

Re: Converting I think emDash to Dash (From Word)
« Reply #2 on: March 05, 2024, 08:52:50 am »
I tried that and it didn't seem to make a difference either way, but following your advice I'm no longer using that line. This is what I currently have:

Code: Pascal
  txt := Clipboard.AsText;
  txt := RemoveFancyQuotes(txt);
  txt := RemoveFancySingle(txt);
  txt := sysUtils.StringReplace(txt, '—', '-', [rfReplaceAll,rfIgnoreCase]);
  txt := sysUtils.StringReplace(txt, chr(96), '-', [rfReplaceAll,rfIgnoreCase]);
  txt := Utf8StringReplace(txt, '—', '-', [rfReplaceAll,rfIgnoreCase]);
  txt := Utf8StringReplace(txt, S, '-', [rfReplaceAll,rfIgnoreCase]);
  rtfEditor.selText := txt;

As you can see, I'm tidying up the input as it comes off the clipboard, which makes me wonder whether that weird dash (which I'm not even sure is an em dash anymore) is being mangled before I can make any changes.  Does anyone know what the character actually is?  I converted the file I'm using to RTF and searched out the spot where the weird dash is; according to that the code is 96, but as you can see I'm trying to replace that to no avail.  It ends up looking like a quote with an extra space, and when exported to an HTML file it looks like a question mark followed by a quote.

This is all quite frustrating, and it comes from a grammar rule in Word.  I don't personally allow that change, but this is essentially customized writing software and my editor/beta apparently does, so I'm trying to come up with an automated way to correct it.  So far I've managed to sort out the curly/fancy single and double quotes, but that darn dash is a darn nightmare (replace darn with something far more fitting for my level of frustration :)).  If anyone has any ideas on how to solve this I would be ever so grateful.  It's a bit more than a niggle at this point, and as noted, when I export for publishing it looks just nasty.  I will obviously ask my editor/beta to stop using that rule, but that would only cover them, and I hope to have a wider audience.

Thanks, really, thanks for the help I've gotten so far, this forum rocks.

CJ
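
A side note on the "code is 96" observation, offered as a possibility rather than a diagnosis. RTF writes non-ASCII bytes as hexadecimal escapes, so if the file showed something like \'96 (an assumption, since the post does not quote the RTF itself), that is byte $96, decimal 150, which is the en dash in Windows-1252. chr(96), by contrast, is decimal 96, the backtick. The small console program below only illustrates that difference; the constant names are made up for the example.

```pascal
program DashCodes;
{$mode objfpc}{$H+}

const
  // UTF-8 encodings written as byte escapes, so the source file encoding does not matter.
  // These names are hypothetical; they do not come from the thread.
  EnDashUtf8 = #$E2#$80#$93; // U+2013 EN DASH, byte $96 in Windows-1252
  EmDashUtf8 = #$E2#$80#$94; // U+2014 EM DASH, byte $97 in Windows-1252

begin
  // Decimal 96 is the backtick, not a dash.
  WriteLn('Chr(96) is: ', Chr(96));
  // Hexadecimal 96 is decimal 150.
  WriteLn('$96 as a decimal number: ', $96);
  // Each dash takes three bytes once encoded as UTF-8.
  WriteLn('Bytes in the UTF-8 en dash: ', Length(EnDashUtf8));
  WriteLn('Bytes in the UTF-8 em dash: ', Length(EmDashUtf8));
end.
```

If that reading is right, the byte sequence #$e2+#$80+#$93 used for S earlier in the thread is the en dash, while the pasted '—' literal is the em dash, so the two replacements target different characters.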

Lansdowne

  • New Member
  • *
  • Posts: 35
Re: Converting I think emDash to Dash (From Word)
« Reply #3 on: March 05, 2024, 10:36:47 am »
Code: Pascal
  txt := Clipboard.AsText;

  txt := sysUtils.StringReplace(txt, '—', '-', [rfReplaceAll,rfIgnoreCase]);
  txt := sysUtils.StringReplace(txt, chr(96), '-', [rfReplaceAll,rfIgnoreCase]);
  txt := Utf8StringReplace(txt, '—', '-', [rfReplaceAll,rfIgnoreCase]);
  rtfEditor.selText := txt;
The first thing to try: where you have '—' or chr(96), copy and paste the actual dash character that you are finding in the Word document, keeping it inside the '' of course.
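
An alternative to pasting the literal character is to spell the dashes out as UTF-8 byte escapes, which takes the source file's own encoding out of the picture. This is a minimal sketch, assuming the clipboard text arrives as UTF-8 (the usual case for LCL strings, but an assumption about this particular setup). NormalizeDashes and the two constants are hypothetical names, not part of any library.

```pascal
program NormalizeDashesDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

const
  EnDashUtf8 = #$E2#$80#$93; // U+2013 EN DASH as UTF-8 bytes
  EmDashUtf8 = #$E2#$80#$94; // U+2014 EM DASH as UTF-8 bytes

// Replaces every en dash and em dash in a UTF-8 string with a plain hyphen.
// rfIgnoreCase is left out on purpose: dashes have no upper or lower case.
function NormalizeDashes(const S: string): string;
begin
  Result := StringReplace(S, EnDashUtf8, '-', [rfReplaceAll]);
  Result := StringReplace(Result, EmDashUtf8, '-', [rfReplaceAll]);
end;

begin
  // Stand-in for Clipboard.AsText so the sketch runs without the LCL.
  WriteLn(NormalizeDashes('well' + EmDashUtf8 + 'known 1' + EnDashUtf8 + '2'));
  // prints: well-known 1-2
end.
```

A plain byte-wise StringReplace is enough here because no UTF-8 sequence can occur inside another one. Running the text through an ANSI conversion before this step would likely destroy the bytes being searched for.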



paweld

  • Hero Member
  • *****
  • Posts: 1268
Re: Converting I think emDash to Dash (From Word)
« Reply #4 on: March 05, 2024, 11:11:08 am »
Can you attach a sample application that demonstrates this problem?
Best regards / Pozdrawiam
paweld

zxandris

Re: Converting I think emDash to Dash (From Word)
« Reply #5 on: March 05, 2024, 11:13:57 am »
Quote from: Lansdowne
First thing to try is where you have  '—', or chr(96), copy and paste the actual dash char that you are finding in the Word document.  Of course keeping it inside the ''.

I've tried that, but the character seems to get converted back into a normal dash on copy/paste.  Someone did manage to provide me with the long-dash character and I did a replace on that, but it didn't work, so now I have to assume it's not a normal kind of dash at all.
Code: Pascal
  txt := sysUtils.StringReplace(txt, '—', '-', [rfReplaceAll,rfIgnoreCase]);

If I'm reading it right, the line above essentially does what you're suggesting, and I've tried (oh, how I've tried) to simply copy and paste from the docx file into the editor.  I have managed to replace that darn dash when doing a straight import from RTF, which is great, but unfortunately I need to be able to do it live from the clipboard.

It's seriously frustrating, and I'm assuming at this point that the little dash isn't standard at all.  I don't actually have Word, but the document I'm using to test came from Word.  I can't import directly from DOCX, so I've been having to copy/paste.  I've managed to get most of what I need working; it's just that silly dash.

dseligo

  • Hero Member
  • *****
  • Posts: 1406
Re: Converting I think emDash to Dash (From Word)
« Reply #6 on: March 05, 2024, 12:13:38 pm »
Then find out what it is. Something like this:

Code: Pascal
var
  txt, sRes: String;
  iCnt, iStart, iEnd: Integer;
begin
  txt := Clipboard.AsText;
  sRes := '';
  // iStart and iEnd are the approximate byte positions of your 'dash' symbol.
  // In UTF-8 one character can take several bytes, so they may differ from the character count.
  iStart := 10;
  iEnd := 20;
  // Clamp the range so the loop never reads past the end of the string.
  if iEnd > Length(txt) then
    iEnd := Length(txt);
  // One line per byte: position, byte value, raw byte.
  for iCnt := iStart to iEnd do
    sRes := sRes + iCnt.ToString + ' ' + Ord(txt[iCnt]).ToString + ' ' + txt[iCnt] + LineEnding;
  ShowMessage(sRes);
end;
« Last Edit: March 05, 2024, 12:16:16 pm by dseligo »
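
The same inspection can be done without the LCL by printing each byte in hexadecimal, which makes multi-byte UTF-8 sequences easier to recognise. This is a small sketch along the lines of the code above; HexDump is a hypothetical helper, and the test string stands in for whatever Clipboard.AsText returns.

```pascal
program DumpBytes;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Returns the bytes of S as space-separated two-digit hexadecimal values.
function HexDump(const S: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
  begin
    if i > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[i]), 2);
  end;
end;

begin
  // Stand-in for the clipboard text: 'a', an en dash as UTF-8 bytes, 'b'.
  WriteLn(HexDump('a' + #$E2#$80#$93 + 'b'));
  // prints: 61 E2 80 93 62
end.
```

In a dump like this, E2 80 93 would be an en dash and E2 80 94 an em dash, while a lone 96 or 97 would point to text that is still in Windows-1252 rather than UTF-8.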

zxandris

Re: Converting I think emDash to Dash (From Word)
« Reply #7 on: March 05, 2024, 12:36:58 pm »
This here?  Does "iCnt.ToString" return an ASCII value? I ask because I've found that Ord doesn't seem to do that properly.

dseligo

Re: Converting I think emDash to Dash (From Word)
« Reply #8 on: March 06, 2024, 01:49:49 am »
'iCnt.ToString' has nothing to do with Ord; it's just a counter, so it's easier to find the position.
If 'ToString' doesn't work for you, change the line to:
Code: Pascal
  sRes := sRes + IntToStr(iCnt) + ' ' + IntToStr(Ord(txt[iCnt])) + ' ' + txt[iCnt] + LineEnding;

 
