I am building an application that extracts data from .doc, .docx, and .pdf files written in Hebrew. How can I extract the text correctly and then apply regular expressions to it? The code must support the UTF-8 charset and both LTR and RTL text direction, and newline characters must be retained in the text.

1 Answer

You need to study RTL text handling a little more.

  1. PDF text is sometimes stored in visual mode, so you need to reverse it: not the whole string, only the Hebrew. http://php.net/manual/en/function.hebrevc.php will not help, since it does the opposite (it converts logical text to visual).
  2. Word and ODT files are saved in logical mode, so no reversal is needed.

Arabic and Hebrew are only displayed in "reverse"; they are stored in the same order as English (the first word comes first in the file).
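
As a rough illustration of "reverse only the Hebrew", here is a minimal Python sketch. It assumes the text has already been extracted from the PDF as a Unicode string in visual order; the names `HEBREW_RUN` and `visual_to_logical` are made up for this example. It is a simplification, not a full bidirectional algorithm: it does not fix word order when digits or Latin words sit between Hebrew words, it does not mirror brackets, and vowel points may end up misplaced.

```python
import re

# A "run" is one or more Hebrew words (Unicode block U+0590-U+05FF)
# separated only by spaces or tabs. Newlines never belong to a run.
HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+(?:[ \t]+[\u0590-\u05FF]+)*")


def visual_to_logical(text: str) -> str:
    """Reverse each Hebrew run; Latin text, digits and newlines stay untouched."""
    return HEBREW_RUN.sub(lambda m: m.group(0)[::-1], text)


# "shalom olam" as it might come out of a visual-mode PDF (characters reversed).
visual = "ID 123: \u05DD\u05DC\u05D5\u05E2 \u05DD\u05D5\u05DC\u05E9\nabc"
logical = visual_to_logical(visual)

# Once the text is in logical order, ordinary regular expressions work on it.
# Python 3 strings are Unicode, so no special flag is needed for Hebrew.
match = re.search(r"\u05E9\u05DC\u05D5\u05DD", logical)  # the word "shalom"
```

For text taken from Word or ODT files, skip `visual_to_logical` and apply the regular expression directly, since that text is already in logical order. When writing the result out, encode it explicitly as UTF-8.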
