# Converting Math Symbols from PDF into LaTeX

I am trying to extract math content from LaTeX-generated PDF files. Most symbols are extracted fine. However, some, such as `epsilon`, `Updownarrow`, and `simeq`, use non-Unicode codes, and others, such as `neq`, use a combination of non-Unicode codes.

• `epsilon` is written using the embedded font `SCCPFS+CMMI10` and code `017` (octal)
• `Updownarrow` using the embedded font `KAXSYH+CMSY10` and code `0x6d (m)`
• `simeq` using the embedded font `KAXSYH+CMSY10` and code `0x27 (')`
• `neq` using the embedded font `KAXSYH+CMSY10` and codes `0x36 (/)` and `0x3d (=)`

Before I begin writing a table to map from the glyph code(s) to the equivalent LaTeX, I wonder whether such a mapping table already exists in the reverse direction for use within LaTeX. After all, somewhere the original `epsilon`, `neq`, etc. must get mapped to one or more glyph codes. The combination cases will also require position information, but that should be available in the reverse direction too.
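To make the idea concrete, here is a minimal sketch of the kind of table I have in mind, filled only with the four observations listed above. The names `GLYPH_TO_LATEX`, `base_font`, and `to_latex` are hypothetical, the code `017` is assumed to be octal, and position information for combined glyphs is ignored entirely.

```python
import re

# Hypothetical lookup table built only from the four observations above.
# Keys are (base font name, tuple of glyph codes); values are LaTeX macros.
GLYPH_TO_LATEX = {
    ("CMMI10", (0o17,)): r"\epsilon",        # assumes 017 is octal
    ("CMSY10", (0x6D,)): r"\Updownarrow",
    ("CMSY10", (0x27,)): r"\simeq",
    ("CMSY10", (0x36, 0x3D)): r"\neq",       # two codes, overlaid in the PDF
}


def base_font(name):
    """Drop the six-letter subset tag, e.g. 'KAXSYH+CMSY10' -> 'CMSY10'."""
    return re.sub(r"^[A-Z]{6}\+", "", name)


def to_latex(font, codes):
    """Return the LaTeX macro for a run of glyph codes, or None if unknown."""
    return GLYPH_TO_LATEX.get((base_font(font), tuple(codes)))


print(to_latex("SCCPFS+CMMI10", [0o17]))
print(to_latex("KAXSYH+CMSY10", [0x36, 0x3D]))
```

Keying on the base font name rather than the full name keeps the table usable across PDFs, since the subset tag before the `+` differs from file to file.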

EDIT: I tried to look up this information in the font tables, but there are no entries in GSUB and GPOS. Is that where I should be looking? Is the information really inside the font?

EDIT: I tried opening the mmap file in a text editor, but it is mostly hex. Is there a tool for reading it?

``````
%!PS-Adobe-3.0 Resource-CMap
%%DocumentNeededResources: ProcSet (CIDInit)
%%IncludeResource: ProcSet (CIDInit)
%%BeginResource: CMap (TeXmath-LMR-0)
%%Title: (TeXmath-LMR-0 TeXmath LMR 0)
%%Version: 1.000
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (TeXmath)
/Ordering (LMR)
/Supplement 0
>> def
/CMapName /TeXmath-LMR-0 def
/CMapVersion 1.000 def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
96 beginbfchar
<00> <005C00620069006700630069007200630020>
<01> <005C006D0064006C00670062006C006B0063006900720063006C00650020>
<02> <005C0073007100750061007200650020>
<03> <005C0062006C00610063006B0073007100750061007200650020>
<04> <005C0076006100720074007200690061006E0067006C00650020>
<05> <005C0062006C00610063006B0074007200690061006E0067006C00650020>
<06> <005C0074007200690061006E0067006C00650064006F0077006E0020>
<07> <005C0062006C00610063006B0074007200690061006E0067006C00650064006F0077006E0020>
<08> <005C006C006F007A0065006E006700650020>
<09> <005C0062006C00610063006B006C006F007A0065006E006700650020>
<0A> <005C006D0064006C00670062006C006B006400690061006D006F006E00640020>
``````
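The hex strings in the `bfchar` entries appear to be plain UTF-16BE text, so no special tool may be needed to read them. Below is a minimal sketch that decodes them, assuming every destination string in a `beginbfchar` section is UTF-16BE; `decode_bfchar` is a hypothetical helper name, and the sample contains only lines copied from the excerpt above.

```python
import re

# One bfchar entry: <source code> <destination string as UTF-16BE hex>.
ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")
# A bfchar section; tolerate a truncated file with no closing endbfchar.
SECTION = re.compile(r"beginbfchar(.*?)(?:endbfchar|\Z)", re.S)


def decode_bfchar(cmap_text):
    """Map each source glyph code to its decoded destination string."""
    table = {}
    for body in SECTION.findall(cmap_text):
        for src, dst in ENTRY.findall(body):
            table[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return table


sample = """
1 begincodespacerange
<00> <FF>
endcodespacerange
3 beginbfchar
<00> <005C00620069006700630069007200630020>
<02> <005C0073007100750061007200650020>
<08> <005C006C006F007A0065006E006700650020>
endbfchar
"""

table = decode_bfchar(sample)
for code, text in sorted(table.items()):
    print(f"0x{code:02X} -> {text!r}")
```

Restricting the search to `bfchar` sections keeps the `<00> <FF>` codespace range from being misread as a mapping. Each decoded value is a LaTeX macro name followed by a space, which suggests the file already holds a glyph-code-to-macro table of the kind asked about above.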

