
Arabic text is automatically displayed reversed in PowerShell, so I have to reverse it again to make it readable.
For example, the text 'مرحبا' is displayed as 'ابحرم'.

I can fix the words one by one, but I'm struggling to do it for many words at once.

function ReverseText {
    Return ( -Join ($Args.ToCharArray() | Sort {(--$script:i)}) )
}
$AJoin  = 'ابتثجحخدذرزسشصضطظعغفقكلمنهوي'
$AMatch = $AJoin.ToCharArray() -join '|'

$data = @(
"1. The verb for the male singular in Arabic is?"
"   a) 'تفعل'" 
"   b) 'تفعلان'"  
"   c) 'يفعلون'"  
"   d) 'يفعل'"
"2. 'تلميذ' is the mufrad form of the word?"
"   a) 'تلاميذ'"
"   b) 'تلميذات'"  
"   c) 'تلميذان'" 
"   d) 'تلميذين'"
)

ReverseText 'بيتان'
$Newdata = $data | Foreach {
    If ($_.Split("'") -match $AMatch) {
        'configuous to continue'
    }
}
$Newdata; pause

With the script above, I can reverse a single word, 'بيتان', using the command ReverseText 'بيتان'.

In $data, I want to reverse the letter order of every Arabic word and save the result in $Newdata. The sample shows only 2 of the 50 questions, which are read from a .txt file.
I'm not skilled enough to do this yet.

I'd appreciate any help. Thanks.

  • Reversing an Arabic string will only work in some instances; depending on the word, simple reversing of the string will fail. What you need to do is convert the whole string from logical to visual rendering. Commented May 26, 2024 at 12:14
  • @Andj Interesting. If you are willing, you can provide an example as an answer for the question. Thanks. Commented May 26, 2024 at 12:23
  • Simply reversing like that won't work because Arabic uses ZWJ and ZWNJ a lot, so strings containing those won't be correct anymore after reversing. Besides, that'll make the display incorrect in terminals with RTL support. It's better to use a GUI for RTL text. It's very easy to show a GUI window in PowerShell, or for simple cases you can just use the default Out-GridView GUI. Commented May 26, 2024 at 15:55
  • Although ZWJ and ZWNJ are probably more important in Persian. But take 'فلا' for example. This would be the codepoints 0641 0644 0627; if you reverse the string, you get the codepoints 0627 0644 0641, but the correct visual order would be 0644 0627 0641, i.e. 0644 0627 need to be treated as a single unit, where order is significant. You may also need to take into account initial versus final glyphs, and use ZWJ/ZWNJ at the character level to control which glyphs are used, or swap in presentation forms instead. You could try a .NET wrapper for the fribidi library to get a visually ordered string. Commented May 26, 2024 at 23:24

2 Answers


Do the following in order to selectively reverse runs of Arabic characters in your input strings:

$data | ForEach-Object {
  [regex]::Replace($_, 
    '\p{IsArabic}{2,}',
    { param($m) [Array]::Reverse(($chars = $m.Value.ToCharArray())); [string]::new($chars) }
  )
}

In PowerShell (Core) 7, where the -replace operator accepts a script block as the replacement operand, this can be simplified to:

$data -replace 
  '\p{IsArabic}{2,}', 
  { [Array]::Reverse(($chars = $_.Value.ToCharArray())); [string]::new($chars) }

  • In both cases, regex '\p{IsArabic}{2,}' is used to match all runs of two or more ({2,}) Arabic characters (\p{IsArabic}), and any such run's characters are reversed.

  • In PowerShell hosts that do not support Unicode's bidirectional text-rendering algorithm - such as both the legacy Windows console host (conhost.exe) and Windows Terminal - the result then displays (mostly) correctly. Note, however, that this approach (a) should fundamentally be limited to such hosts, and (b) should only be used to produce for-display output.

    • Caveat: As noted in the comments, this simple character-by-character reversal isn't always enough, as Arabic text can contain ligatures and other groups of characters that must be treated as a unit - see this answer for background information.

    • See this answer for general background information regarding bidirectional text rendering.
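
The caveat above can be partly addressed by reversing text elements (a base character together with its combining marks) instead of individual characters. The following is only a minimal sketch under stated assumptions: the function name Reverse-TextElements is hypothetical, it relies on .NET's System.Globalization.StringInfo class, and it does not handle ligatures such as lam-alef, contextual glyph shaping, or ZWJ/ZWNJ.

```powershell
# Hypothetical helper: reverses a string text element by text element,
# so that a combining mark stays attached to its base character.
function Reverse-TextElements {
  param([Parameter(Mandatory)] [string] $Text)

  $elements = [System.Collections.Generic.List[string]]::new()
  $enumerator = [System.Globalization.StringInfo]::GetTextElementEnumerator($Text)
  while ($enumerator.MoveNext()) {
    $elements.Add($enumerator.GetTextElement())
  }
  $elements.Reverse()
  -join $elements
}

# Sample from the comments: dal + damma + waw (U+062F U+064F U+0648)
$sample = [string]::new([char[]] (0x062F, 0x064F, 0x0648))
$reversed = Reverse-TextElements $sample

# Show the resulting code points: 0648 062F 064F (damma stays on dal)
($reversed.ToCharArray() | ForEach-Object { '{0:X4}' -f [int] $_ }) -join ' '
```

To apply it to the runs matched above, the body of the replacement script block could call the helper instead, e.g. { param($m) Reverse-TextElements $m.Value }.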


5 Comments

Thanks. On PS5 (whether opened on its own or in WT), I had an issue where the Arabic text was interpreted as other characters, even though the .ps1 file encoding is UTF-8-BOM. This is what it looks like: a) 'ØªÙØ¹Ù„ instead of a) 'لعفت'. I don't know exactly which Windows settings have changed; 2 days earlier this didn't happen.
2 days ago, changing the encoding to UTF-8-BOM solved the problem. Now it's happening again, even though the file is UTF-8-BOM. stackoverflow.com/q/78524051/25082295
@OprexDroid: The solution you indirectly link to - stackoverflow.com/a/54790355/45375 - should still work, and it isn't related to any Windows settings that may change. Open your .ps1 file in a modern editor such as Visual Studio Code and ensure that (a) the text displays correctly and that (b) the encoding is shown as UTF-8 with BOM. If not, first ensure correct display, then re-save, if necessary, as UTF-8 with BOM. To check in Windows PowerShell if a given file $file starts with a UTF-8 BOM use "$((Get-Content -TotalCount 3 -Encoding Byte $file))" -eq '239 187 191'
Oh yes. I forgot that the data source comes from a .txt file which by default is UTF-8. After I changed the encoding, it was OK.
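
The BOM check and the encoding fix discussed in these comments can be sketched as follows. This is an illustration only: the function name Test-Utf8Bom and the file name questions.txt are hypothetical, and the function reads the first three bytes via .NET so that it does not depend on -Encoding Byte, which is specific to Windows PowerShell.

```powershell
# Hypothetical helper: returns $true if the file starts with the
# UTF-8 BOM bytes 0xEF 0xBB 0xBF (239 187 191).
function Test-Utf8Bom {
  param([Parameter(Mandatory)] [string] $Path)

  $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
  $stream = [System.IO.File]::OpenRead($fullPath)
  try {
    $buffer = [byte[]]::new(3)
    $count = $stream.Read($buffer, 0, 3)
  }
  finally {
    $stream.Dispose()
  }
  $count -eq 3 -and $buffer[0] -eq 0xEF -and $buffer[1] -eq 0xBB -and $buffer[2] -eq 0xBF
}

# For a BOM-less UTF-8 data file, request UTF-8 explicitly when reading
# (assumed file name):
# $data = Get-Content -LiteralPath 'questions.txt' -Encoding UTF8
```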
@mklement0 Simple reversal will fail in a number of scenarios, including the presence of ligatures; a more robust approach would be to convert the string from logical to visual order.
0

Based on mklement0's answer, I tried another approach, which works on both PS5 and PS7.
Thanks to him.

$Newdata = $data | ForEach {
        [regex]::Replace($_, '\p{IsArabic}{2,}',
        {param($m); -Join ($m.value.ToCharArray() | Sort {(--$script:i)})})
    }

6 Comments

The Sort {(--$script:i)} trick is clever, and allows for a concise solution (but note that variable $i will linger; also, you don't need the ; after param($m)), but it'll be relatively slow. If that is a concern, the fastest - albeit more obscure - solution is: [Array]::Reverse(($chars = $m.Value.ToCharArray())); [string]::new($chars)
@mklement0 Thanks. I will compare them and pick whichever is faster.
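
Such a comparison could be sketched with Measure-Command, as below. The iteration count is arbitrary and the sample word is built from code points to avoid file-encoding issues; actual timings depend on the machine and PowerShell version, so none are shown.

```powershell
# Sample word 'بيتان', built from code points
$word = [string]::new([char[]] (0x0628, 0x064A, 0x062A, 0x0627, 0x0646))
$iterations = 1000  # arbitrary repeat count for a rough comparison

# Variant 1: the Sort-Object trick with a decreasing sort key
$sortTime = Measure-Command {
  foreach ($n in 1..$iterations) {
    $null = -join ($word.ToCharArray() | Sort-Object { (--$script:i) })
  }
}

# Variant 2: in-place [Array]::Reverse()
$arrayTime = Measure-Command {
  foreach ($n in 1..$iterations) {
    [Array]::Reverse(($chars = $word.ToCharArray()))
    $null = [string]::new($chars)
  }
}

'Sort-Object: {0:N1} ms; Array.Reverse: {1:N1} ms' -f $sortTime.TotalMilliseconds, $arrayTime.TotalMilliseconds
```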
@OprexDroid How are you handling glyphs in the reversal? Your first character will become the last character, so it will go from being displayed with its initial glyph to being displayed with its final glyph. Also, will there be non-spacing marks in the text, and if so, how are you processing them?
For instance, with "دُو" (U+062F U+064F U+0648), the reversed string would be "وُد" (U+0648 U+064F U+062F), so after reversal the damma is applied to Waw instead of Dal. Reversing extended grapheme clusters would give U+0648 U+062F U+064F, which would be a more correct reversal of the string.
And that is without considering ligatures like lam-alef, which also need to be processed as a unit. It would also break for Persian, where ZWJ/ZWNJ are required word-internally. The current state of the art for terminal emulators, including PowerShell, is primitive.