support for tables #7

Open
srevinsaju opened this issue Jan 11, 2020 · 5 comments

Comments

@srevinsaju

srevinsaju commented Jan 11, 2020

Are tables supported in this parser? For me, the output comes out unformatted. Thanks.

@peter17
Owner

peter17 commented Jan 11, 2020

Hi @srevinsaju, please see https://github.com/peter17/mediawiki-parser/blob/master/tests/test_tables.py for some examples of tables that can be parsed. Regards

@srevinsaju
Author

srevinsaju commented Jan 12, 2020

Thanks for the info, @peter17. One more question: in the parsed text, the <ref> tags come out as escaped text (&lt;ref ...&gt;). Should I add ref to the HTML tags list, or should I use some other file to convert the ref?

@peter17: This was the table I was trying to parse.

{| class="wikitable" style="font-size:90%"
|-! Research !! Impact|-| Static electricity and magnetism (1600)
Electric current (18th century) || All electric appliances, dynamo's, electric power stations, modern electronics, including electric lighting, television, electric heating, magnetic tape, loudspeaker, plus the compass and lightning rod.|-| Diffraction (1665) || Optics, hence fiber optic cable (1840s), cable TV and internet|-| Germ theory (1700) || Hygiene, leading to decreased transmission of infectious diseases; antibodies, leading to techniques for disease diagnosis and targeted anticancer therapies.|-| Vaccination (1798) || Leading to the elimination of most infectious diseases from developed countries and the worldwide eradication of smallpox.|-| Photovoltaics (1839) || Solar cells (1883), hence solar power, solar powered watches, calculators and other devices.|-| The strange orbit of Mercury (1859) and other research
leading to special (1905) and general relativity (1916) || Satellite-based technology such as GPS (1973), satnav and communications satellites.<ref name="nasa 2004">Evicting Einstein, March 26, 2004, NASA. "Both [relativity and quantum mechanics] are extremely successful. The Global Positioning System (GPS), for instance, wouldn't be possible without the theory of relativity. Computers, , and the Internet, meanwhile, are spin-offs of quantum mechanics."</ref>|-| Radio waves (1887) || Radio quickly became known for its use in broadcast radio (1906) and television (1927) entertainment. It was also much used in areas of telephony, emergency services, radar (navigation and weather forecasting), medicine, astronomy, wireless communications, and networking. Radio research also led to the use of microwaves, for heating and cooking food.|-| Radioactivity (1896) and antimatter (1932) || Cancer treatment (1896), Radiometric dating (1905), nuclear reactors (1942) and weapons (1945), PET scans (1961), and medical research (with isotopic labelling)|-|X-rays (1896)|| Medical imaging, including computer tomography|-| Crystallography and quantum mechanics (1900) || Semiconductor devices (1906), hence modern computing and telecommunications including the integration with wireless devices: the mobile phone<ref name="nasa 2004" />|-|Plastics (1907)||Starting with bakelite, many types of artificial polymers for numerous applications in industry and daily life|-|Antibiotics (1880's, 1928) || Salvarsan, Penicillin, doxycycline etc.|-|Nuclear magnetic resonance (1930's) || Nuclear magnetic resonance spectroscopy (1946), magnetic resonance imaging (1971), functional magnetic resonance imaging (1990's).|}

@peter17
Owner

peter17 commented Jan 14, 2020

@srevinsaju I think it should be added to the html tags list

When I copy your table to a MediaWiki page and preview it, it does not format as a table. Some newline characters are missing; it should be something like:

{| class="wikitable" style="font-size:90%"
|-
! Research !! Impact
|-
| Static electricity and magnetism (1600)
Electric current (18th century) || All electric appliances, dynamo's, electric power stations, modern electronics, including electric lighting, television, electric heating, magnetic tape, loudspeaker, plus the compass and lightning rod.
|-
| Diffraction (1665) || Optics, hence fiber optic cable (1840s), cable TV and internet
|-
| Germ theory (1700) || Hygiene, leading to decreased transmission of infectious diseases; antibodies, leading to techniques for disease diagnosis and targeted anticancer therapies.
|-
| Vaccination (1798) || Leading to the elimination of most infectious diseases from developed countries and the worldwide eradication of smallpox.
|-
| Photovoltaics (1839) || Solar cells (1883), hence solar power, solar powered watches, calculators and other devices.
|-
| The strange orbit of Mercury (1859) and other research
leading to special (1905) and general relativity (1916) || Satellite-based technology such as GPS (1973), satnav and communications satellites.<ref name="nasa 2004">Evicting Einstein, March 26, 2004, NASA. "Both [relativity and quantum mechanics] are extremely successful. The Global Positioning System (GPS), for instance, wouldn't be possible without the theory of relativity. Computers, , and the Internet, meanwhile, are spin-offs of quantum mechanics."</ref>
|-
| Radio waves (1887) || Radio quickly became known for its use in broadcast radio (1906) and television (1927) entertainment. It was also much used in areas of telephony, emergency services, radar (navigation and weather forecasting), medicine, astronomy, wireless communications, and networking. Radio research also led to the use of microwaves, for heating and cooking food.
|-
| Radioactivity (1896) and antimatter (1932) || Cancer treatment (1896), Radiometric dating (1905), nuclear reactors (1942) and weapons (1945), PET scans (1961), and medical research (with isotopic labelling)
|-
|X-rays (1896)|| Medical imaging, including computer tomography
|-
| Crystallography and quantum mechanics (1900) || Semiconductor devices (1906), hence modern computing and telecommunications including the integration with wireless devices: the mobile phone<ref name="nasa 2004" />
|-
|Plastics (1907)||Starting with bakelite, many types of artificial polymers for numerous applications in industry and daily life
|-
|Antibiotics (1880's, 1928) || Salvarsan, Penicillin, doxycycline etc.
|-
|Nuclear magnetic resonance (1930's) || Nuclear magnetic resonance spectroscopy (1946), magnetic resonance imaging (1971), functional magnetic resonance imaging (1990's).
|}
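
The reformatting above can be approximated in Python before the text is handed to the parser. This is only a heuristic sketch, not part of mediawiki-parser: the function name `restore_table_newlines` is hypothetical, and it assumes that row markers carry no attributes and that `|-` does not occur inside templates or cell text.

```python
import re


def restore_table_newlines(wikitext: str) -> str:
    """Heuristically put table row markers (|-) and the table end (|}) on their own lines.

    Assumption: the input is flattened table wikitext such as the one posted
    in this thread. Templates like {{foo|-}} would be broken by this sketch.
    """
    # Start a new line before every row marker, but leave "||-" (a cell
    # separator followed by a minus sign) alone.
    text = re.sub(r"\s*(?<!\|)\|-", "\n|-", wikitext)
    # If a cell (| or !) follows the row marker directly, move it to the next line.
    text = re.sub(r"^\|-[ \t]*(?=[!|])", "|-\n", text, flags=re.MULTILINE)
    # Put the closing |} on its own line as well.
    text = re.sub(r"\s*\|\}", "\n|}", text)
    return text
```

Running it on text that is already laid out correctly should leave that text unchanged, so it can be applied to every table from the dump rather than only to the broken ones.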

@srevinsaju
Author

@peter17 Thanks for the speedy reply. This wikitext was actually downloaded from the Wikipedia dumps, and it used to work well with pediapress/mwlib. But as mwlib is Python 2, I was looking for a Python 3 solution and found this. Since the end user might not know about this regression in the wikitext from the dumps, should I put a newline after every |-? I apologize for this noob question.
A question regarding <ref>: I added it to the HTML tags, but unfortunately it is not shown as the usual Wikipedia [1] or [2]; instead, nothing is shown. Are references still under development? Thanks @peter17, it is indeed a great piece of software! It saved me a lot of time.
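
One possible workaround for the missing [1] and [2] markers is to number the references in the wikitext before parsing. The sketch below is an assumption-laden illustration and does not use any mediawiki-parser API: `number_references` and `REF_RE` are hypothetical names, only quoted `name="..."` attributes are recognized, and how the parser treats the resulting bracketed markers has not been verified here.

```python
import re
from typing import Dict, List, Tuple

# Matches <ref>text</ref>, <ref name="x">text</ref> and <ref name="x" />.
REF_RE = re.compile(
    r'<ref(?:\s+name\s*=\s*"(?P<name>[^"]*)")?\s*(?:/>|>(?P<body>.*?)</ref>)',
    re.DOTALL,
)


def number_references(wikitext: str) -> Tuple[str, List[str]]:
    """Replace each <ref> with a [n] marker and collect the reference texts."""
    notes: List[str] = []             # reference texts, in order of first use
    index_by_name: Dict[str, int] = {}  # ref name -> 1-based number

    def replace(match: "re.Match[str]") -> str:
        name, body = match.group("name"), match.group("body")
        if name is not None and name in index_by_name:
            # A named reference that was already seen reuses its number.
            number = index_by_name[name]
            if body and not notes[number - 1]:
                notes[number - 1] = body
        else:
            notes.append(body or "")
            number = len(notes)
            if name is not None:
                index_by_name[name] = number
        return f"[{number}]"

    return REF_RE.sub(replace, wikitext), notes
```

The returned list can then be rendered as a numbered list after the article body, in the same way a references section would be.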

@srevinsaju
Author

@peter17, I got the following output from your code above:

'<body>\n<table>\n<tr>\n</tr>\n<tr>\n\t<th> Research !! Impact</th>\n</tr>\n<tr>\n\t<td> Static electricity and magnetism (1600)<p>Electric current (18th century) || All electric appliances, dynamo\'s, electric power stations, modern electronics, including electric lighting, television, electric heating, magnetic tape, loudspeaker, plus the compass and lightning rod.</p>\n</td>\n</tr>\n<tr>\n\t<td> Diffraction (1665) </td>\n\t<td> Optics, hence fiber optic cable (1840s), cable TV and internet</td>\n</tr>\n<tr>\n\t<td> Germ theory (1700) </td>\n\t<td> Hygiene, leading to decreased transmission of infectious diseases; antibodies, leading to techniques for disease diagnosis and targeted anticancer therapies.</td>\n</tr>\n<tr>\n\t<td> Vaccination (1798) </td>\n\t<td> Leading to the elimination of most infectious diseases from developed countries and the worldwide eradication of smallpox.</td>\n</tr>\n<tr>\n\t<td> Photovoltaics (1839) </td>\n\t<td> Solar cells (1883), hence solar power, solar powered watches, calculators and other devices.</td>\n</tr>\n<tr>\n\t<td> The strange orbit of Mercury (1859) and other research<p>leading to special (1905) and general relativity (1916) || Satellite-based technology such as GPS (1973), satnav and communications satellites.&lt;ref name="nasa 2004"&gt;Evicting Einstein, March 26, 2004, NASA. "Both [relativity and quantum mechanics] are extremely successful. The Global Positioning System (GPS), for instance, wouldn\'t be possible without the theory of relativity. Computers, , and the Internet, meanwhile, are spin-offs of quantum mechanics."&lt;/ref&gt;</p>\n</td>\n</tr>\n<tr>\n\t<td> Radio waves (1887) </td>\n\t<td> Radio quickly became known for its use in broadcast radio (1906) and television (1927) entertainment. It was also much used in areas of telephony, emergency services, radar (navigation and weather forecasting), medicine, astronomy, wireless communications, and networking. 
Radio research also led to the use of microwaves, for heating and cooking food.</td>\n</tr>\n<tr>\n\t<td> Radioactivity (1896) and antimatter (1932) </td>\n\t<td> Cancer treatment (1896), Radiometric dating (1905), nuclear reactors (1942) and weapons (1945), PET scans (1961), and medical research (with isotopic labelling)</td>\n</tr>\n<tr>\n\t<td>X-rays (1896)</td>\n\t<td> Medical imaging, including computer tomography</td>\n</tr>\n<tr>\n\t<td> Crystallography and quantum mechanics (1900) </td>\n\t<td> Semiconductor devices (1906), hence modern computing and telecommunications including the integration with wireless devices: the mobile phone&lt;ref name="nasa 2004" /&gt;</td>\n</tr>\n<tr>\n\t<td>Plastics (1907)</td>\n\t<td>Starting with bakelite, many types of artificial polymers for numerous applications in industry and daily life</td>\n</tr>\n<tr>\n\t<td>Antibiotics (1880\'s, 1928) </td>\n\t<td> Salvarsan, Penicillin, doxycycline etc.</td>\n</tr>\n<tr>\n\t<td>Nuclear magnetic resonance (1930\'s) </td>\n\t<td> Nuclear magnetic resonance spectroscopy (1946), magnetic resonance imaging (1971), functional magnetic resonance imaging (1990\'s).</td>\n</tr>\n</table>\n</body>'

This is not exactly what I wanted, although it is better than before. There are still unprocessed '||' separators in some places in the output. Thanks again
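
In the output above, the header row ends up as a single <th> containing "Research !! Impact", and the remaining '||' occur in cells whose text continues on a second line. MediaWiki table markup also allows one cell per line (each starting with `|` or `!`), so a pre-processing step can rewrite the inline separators into that form. This is a minimal sketch with a hypothetical function name; whether mediawiki-parser then renders the result as intended has not been checked, and continuation lines that do not start with `|` are deliberately left untouched.

```python
def split_inline_cells(wikitext: str) -> str:
    """Rewrite '! a !! b' and '| a || b' so that every cell is on its own line.

    Assumption: '!!' and '||' on a cell line are always cell separators
    (no such sequences inside links or templates).
    """
    result = []
    for line in wikitext.split("\n"):
        if line.startswith("!"):
            # Header cells: '! a !! b' becomes '! a' and '! b'.
            result.append(line.replace("!!", "\n!"))
        elif line.startswith("|") and not line.startswith(("|-", "|}", "|+")):
            # Data cells: '| a || b' becomes '| a' and '| b'.
            result.append(line.replace("||", "\n|"))
        else:
            # Table start, row markers, captions, table end and
            # continuation lines are passed through unchanged.
            result.append(line)
    return "\n".join(result)
```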
