John E. Ortega is a computational scientist dedicated to natural language processing research and advancement

Welcome to my website on GitHub. My GitHub username is johneortega. I consider myself a computational linguist and businessperson, with skills ranging from deep learning research on low-resource languages to building small businesses from the ground up. I am currently an applied researcher and manager with research interests in machine learning, natural language processing, machine translation, and health care. I also serve as a computer science instructor and researcher at Columbia University and New York University. I have started and exited two companies and actively advise on technical issues. I have also worked as a contract senior architect and software developer for companies such as WebMD, Clear Channel (iHeartRadio), Creative Virtual, Buongiorno, and others. I have more than 15 years of overall software, systems, sales, and marketing experience. In addition to industry knowledge, I have deep academic knowledge and have published many works, both for IP and for academic purposes.

Deep Learning · Low-Resource Languages · Linguistics and Morphology · Medical and Clinical · Machine Translation · Neural Networks · Algorithms · Peruvian Languages · Fuzzy-Match Repair · Organization and Business Management · Spanish

More About Me

I am of Peruvian and Irish descent, which makes me a mutt, I guess. As a first-generation American, I feel that I have seen some of the hardships one can face in the USA concerning race and ethnicity, although much less so for me these days. I have lived in several cities, such as New York, Lima (Peru), Madrid, Malaga (Spain), Miami, New York City, Seattle, Trento (Italy), Wolverhampton (England), and more. Nonetheless, most of my time has been spent in Spain or the USA. If you are interested in my thoughts on quality of life, have a look at my Spain vs USA comparison on Quora.

I teach a course at Columbia University on data analytics and natural language understanding; my Columbia profile is online. At this point, I don't have time to teach more classes, but I would love to someday. I defended my doctoral thesis on fuzzy-match repair, with international mention, in Alicante, Spain. You can view my UoA Thesis on Quality Estimation and Fuzzy-Match Repair online, and feel free to send me any questions you may have through one of my social channels. You can also email me at jortega at cs dot nyu dot edu. I am eager to help solve the six challenges of neural machine translation for low-resource languages; here are some of my thoughts on Quechua, a low-resource language spoken in Peru.

My career and hobbies have begun to mix; I now do NLP for fun. Since the early 2000s, I have worked on many software projects, from building toll-road software to a long list of other accomplishments. But my career really changed in 2011 when I decided, after 10 years as a software developer, to dedicate my life to computational linguistics. That is when I went back to Hofstra University for my Master's degree in Computer Science and left WebMD to work at Nuance Communications (now Microsoft) in the Clinical Language Understanding group. At Hofstra, I made the Dean's List and graduated cum laude with a perfect 4.0 GPA and a thesis that led to my first Quechua research paper on machine translation.

Research and Teaching

My research focuses on the role of linguistics, knowledge, and theory in natural language processing and machine learning, both sub-fields of artificial intelligence. More specifically, my recent work focuses on translation, prediction, deep learning, topic modeling, information retrieval, and Bayesian statistics. Within those broad fields, I have concentrated on two major areas. The first is machine translation, the process of automatically translating documents by digital means. The second is the use of computer-assisted methods to help users of digital tools become more productive. Two major categories of digital tools have been central to my research: (1) computer-assisted translation (CAT) tools in the localization sector and (2) computer-assisted coding (CAC) tools in the medical sector. Before my research on MT and computer-assisted methods, my focus was on increasing software efficiency in a broad sense, typically dealing with databases, online software, and big data. Details of my research are available in my CV and publication list. Academic resources and computer programs that I have worked on are available, in some cases, as open-source programs and, in other cases, as full-fledged systems licensed to private enterprises. I am the author of several patents and publications in the AI field and have done research both in academia and in the private sector.

On the teaching front, after spending a few years in classrooms at Columbia, NYU, and Rutgers, I believe I've been able to assess my teaching style and pivot to one that helps everyone learn. I must thank Professor Adam Meyers for guiding me through my early stages of research and teaching. I also think that the perseverance it took to go back and complete a PhD after 15 years in the private sector sets me apart. My philosophy was originally based on some of the more popular techniques, like focus groups and debates, that I picked up after reading a few books on teaching. Over time, however, both teaching and tutoring students, I learned that I achieve better results when I adapt to the needs of the class. Students MUST be included in every presentation or piece of work that is addressed to them. So, while I make sure to cover class specifics such as relation extraction in Natural Language Processing or queues in Data Structures, I also add other cues to help students focus. In today's mobile world, it's very important to make your class interactive and easy to understand. I always try to encourage students who want to help others and to discourage unethical work practices. In general, this leads to an amazing outcome: happy students with CS backgrounds who are ready to apply their knowledge in the real world. Several years ago, I decided to dedicate my life to my passion, Computational Linguistics. There is nothing I would rather do than pass on my knowledge from academia and the private sector to others so that they can succeed. I have tried to narrow my teaching focus to get the best results.

Machine Translation

I have a keen interest in helping solve the machine translation application problem: most translation systems, even Google Translate, haven't gotten it right. The question comes down to what “correct” means. For me, correctness in machine translation comes down to a simple test: does a person with no background in translation completely agree that the translation produced is exactly what they expected? That question is hard to answer because there are many nuances involved. In my opinion, we still have a long way to go, but, contrary to what some claim, we are well past merely “scratching the surface.” I believe that we are moving in the right direction with deep learning, especially with Transformers.

My academic work has mostly followed two main lines of research: (1) fuzzy-match repair and (2) low-resource languages. I am very excited to say that several people are now beginning to work on Peruvian indigenous languages; for years, the researchers working on them could be counted on one hand. Here are some videos on my work with Quechua Combinations and Ashaninka. Additionally, I feel that one of the more interesting articles I have published is one with Kyunghyun Cho on a technique called BPE-Guided for neural machine translation. Another article worth mentioning, though not focused on low-resource languages, appeared in IEEE Transactions on Pattern Analysis and Machine Intelligence, one of the most prestigious and highly indexed journals in computer science. The article is based on my thesis on quality estimation for fuzzy-match repair and can be considered a highlight of my career.

Health Care

In health care, I have worked tirelessly to solve several problems and to build software that addresses them. I have been part of the health care system in one way or another since I was a kid; feel free to ask me about my experiences in health care outside of the AI realm. My experience from the digital perspective began in 2011 when I started working with WebMD. There, I created tools for the DevOps team that helped keep available the web pages everyday users in the USA rely on to find solutions to common health issues. After that, I began working on a tougher problem rooted in NLP, applying machine learning algorithms and other techniques alongside several colleagues, including members of the original IBM team behind Watson, the question-answering system that beat Jeopardy! champions. We created and deployed several applications whose combined framework eventually became part of what is known as Ambient Clinical Care, a product owned by Microsoft. That work at Nuance led to several patents, including an interesting Acronym Disambiguation Patent. Since then, I have also worked with PointClickCare and other consulting companies to improve the patient experience through AI. Currently, I am working with a company (to be announced) to help solve insurance problems in health care. To see some of the kind of work I did, check out this Talk on Big Data for Medical that I gave in Madrid.

Software

I have built and deployed products for many years. My first job as a software developer was at Taxslayer Software, which ended up being a huge success. After that, I worked as a consultant for a few other cool companies back in the early 2000s, such as Akamai, Verio/NTT, and more, until landing my first job as a web developer, and eventually manager, at Results Technologies, a company dedicated to creating call-center software. At Results, I even got to program in an older language called PIC, along with C++, Java, PHP, and more. I have worked at many companies since then and been part of a lot of cool projects.

The past 10 years or so have been more focused on NLP, including Big Data work on Hadoop for Thumbplay (iHeartRadio). I have also given talks on Big Data for translation and Big Data for low-resource languages. While most of the software I have worked on has been for the private sector, I have made a few projects publicly available, such as Quechua Translator, Termolator, and AshMorph. Additionally, I am in the process of uploading the code for my thesis to GitHub.

Lastly, here is a quick rundown of the technologies and kinds of work I have done for clients in the past: used natural language processing and machine learning to create electronic health record (EHR) applications in acute and senior care facilities; headed the data science team; oversaw and implemented complex machine learning networks using frameworks like scikit-learn, TensorFlow, Theano, Rasa, and Apache UIMA; built an IVR trixbox solution with a text-to-speech API; managed a company's fundraising and product; Python coding for machine learning, machine translation, and natural language processing; big data with Hadoop on Amazon AWS; MapReduce; Django; enterprise development with Java, Oracle 10g, MySQL, Windows, and Linux; programmed J2EE beans with JBuilder, Eclipse, and Red Hat Enterprise Linux; created shell scripts for cron jobs and developed devices using JNDI for C scripts; Java architectural roles for a recommender system, a payment module, and a quality television controller; EJB3 with annotations and Spring; managed an international team spanning Italy, Portugal, and Spain; European call center software; Navision migration; Visual Basic 6.0; English language support; SQL administration and programming; reporting with ActiveX objects for marketing; converted a PHP website to Java using Struts 1.2, J2SE, and JSP; led a team converting legacy code to C#; managed MySQL and Microsoft SQL Server; managed an intranet (JavaScript, VBScript, and HTML); Linux system administration; SOAP for web services; Ajax deployment; .NET programming with Visual Studio 2005 and Visual FoxPro development; Unix scripting for an internet service provider; technical support with FTP and email; accounting and booking software.

Client List: Akamai, Alta AI, Bitlogic, Buongiorno, Creative Virtual, Diversified Resorts, Ferrovial, iHeartRadio, Intrum Justitia, Military Stars, Mirada, Orange, Orion International, PointClickCare, Precision Quality Software, Reedus Designs, Results Technologies, Rhodes Financial Services/Taxslayer, Sizewand, Unitio/ThinkExist.com, Verio/NTT, Vidpal, Wade and Wendy, WebMD.

Entrepreneurial

One of my favorite pastimes is creating and developing businesses. I love the hard work and grit it takes to move a business from its early stages to being cash-flow positive. This typically requires growing revenue and users, both important factors in securing investment. I have created two businesses of my own: ListPropertiesNow in 2006 and Vidpal in 2016. The 10 years between them made for an interesting contrast.

I also actively angel invest and advise companies. Having gone through two businesses of my own, I can help you develop yours. As an accredited investor, I have invested in more than 10 companies, including my own. Feel free to view my AngelList profile, which is now kind of outdated. My second company, Vidpal, is more recent and still has a few remnants lying around the internet:

  1. Vidpal’s Angel Page
  2. Startups Working Together Group Sponsored By Vidpal
  3. 9-Mile lab final presentation in Seattle
  4. Geekwire Writeup on Vidpal
  5. Vidpal’s quick commercial
  6. Comments by Mike Schein on Inc.com
  7. Vidpal and Amber Rose
  8. Vidpal and Funkmaster Flex
  9. Vidpal at StartupPalooza in Stamford

Feel free to reach out with any questions. I am looking for the next unicorn, so don't hesitate to let me know how you plan on building it!

Other Interests

I enjoy several outdoor sports, including fishing (catch and release), football (soccer), and basketball. For 10 years, I ran a basketball club in Madrid, Spain, called Madrid Basketball, which was dedicated to giving expats a place to play international basketball in the city; check out the older Madrid Basketball Facebook Page. I regularly mentor students and others to help them become successful at what they do. A few of my most recent mentees are Alexander De La Rosa, Louis Lai, Patrick Candela, Shirin Jabari, Vlad Tyshkevich, and Will Ang. I am a member of several organizations, including ACM, EAMT, and more. I enjoy working with kids and have taught English and Computer Science at the grade school level.