NLP mini projects GitHub

I expect to graduate by June Manish Shrivastava. I have previously interned at startups and research labs, working on both software development and research projects. I have worked on various systems and software projects, most of which are on my GitHub. I spend my free time reading books that interest me, exploring new places, scrolling through Reddit, or planning my next trip.

If you have an interesting opportunity in software development or data science, hit me up! Worked with Professor Prasanna Tantri's group at the Center for Analytical Finance (CAF) to build models and create tools for loan approval process automation and bank transaction analysis using statistical machine learning algorithms.

Currently working on extraction of key elements from unstructured legal documents. Created webapps to showcase different models. Led a team of 3 and built an Android application, a webapp, and a RESTful API for clients to fetch and update details of the products in their inventory. Developed a model that, given an arithmetic word problem, extracts the relevant quantities and creates the required expression tree by predicting the operators using deep reinforcement learning (DQN).

Developed a fully functional front-end of the compiler for a custom programming language, similar to C. Built a scanner, parser, abstract syntax tree, and interpreter for generating LLVM IR (intermediate representation) code from an input code file.

Created a search engine that uses Block Sort-Based Indexing to create the inverted index of the entire Wikipedia dump. Implemented a system that takes an image and a question about the image as input and predicts the answer to the question.
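For the search engine mentioned above, the general idea behind block sort-based indexing can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the project's actual code: the blocks are kept in memory rather than written to disk, tokenization is a plain whitespace split, and the names build_blocks, merge_blocks and the sample docs are hypothetical.

```python
import heapq
from itertools import groupby


def build_blocks(docs, block_size):
    """Collect (term, doc_id) pairs and cut them into sorted blocks.

    In a real indexer each sorted block would be written to disk;
    here the blocks stay in memory to keep the sketch short.
    """
    pairs = [(term, doc_id)
             for doc_id, text in docs.items()
             for term in text.lower().split()]
    return [sorted(set(pairs[i:i + block_size]))
            for i in range(0, len(pairs), block_size)]


def merge_blocks(blocks):
    """Merge the sorted blocks into one index: term -> sorted doc ids."""
    index = {}
    merged = heapq.merge(*blocks)  # k-way merge of already sorted blocks
    for term, group in groupby(merged, key=lambda pair: pair[0]):
        index[term] = sorted({doc_id for _, doc_id in group})
    return index


# Hypothetical three-document corpus
docs = {1: "the cat sat", 2: "the dog sat", 3: "a cat ran"}
index = merge_blocks(build_blocks(docs, block_size=4))
```

Looking up index["cat"] then returns the ids of the documents that contain the term. The block size controls how many pairs are sorted at once, which is the knob that matters when the corpus does not fit in memory.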

Developed a system to identify crowd patterns from WiFi requests sent by mobile devices and to triangulate client locations with WiFi routers. Implemented an application-level program for a P2P network to keep two separate directories synced, similar to Dropbox. Used sockets to communicate; maintained file indices and MD5 hashes on all peers. Implemented a phrase-based machine translation model and various neural machine translation models, including one using attention with coverage modeling, for translation between Hindi and Urdu.

Implemented a command line interpreter in C which supports background jobs, environment variables, signal catching, piping and redirection with extensive error-handling. Implemented a small SQL engine with support for basic queries, joins and aggregate functions.

Built a bot for a 4x4x4 ultimate tic-tac-toe game that decides the next move on the computer-generated board. It was among the top 8 bots in a class of [AI bot tournament].

About Me. Research interests: Natural Language Processing, Information Retrieval and Extraction, Multimodal Learning.

10 Python Machine Learning Projects on GitHub

Topic modeling involves extracting features from document terms and using mathematical techniques such as matrix factorization and SVD to generate clusters or groups of terms that are distinguishable from one another; these clusters of words form topics or concepts.

These concepts can be used to interpret the main themes of a corpus and also make semantic connections among words that co-occur together frequently in various documents. There are various frameworks and algorithms to build topic models. Document clustering or cluster analysis is an interesting area in NLP and text analytics that applies unsupervised ML concepts and techniques. The main premise of document clustering is similar to that of document categorization, where you start with a whole corpus of documents and are tasked with segregating them into various groups based on some distinctive properties, attributes, and features of the documents.

Document classification needs pre-labeled training data to build a model and then categorize documents. Document clustering uses unsupervised ML algorithms to group the documents into various clusters. We will quantify the similarity of movies based on their plot summaries available on IMDb and Wikipedia, then separate them into groups, also known as clusters.
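As a rough illustration of grouping documents without labels, the sketch below clusters short plot summaries by word overlap. It deliberately swaps in a much simpler technique than the unsupervised ML algorithms mentioned above: a greedy single-pass grouping on Jaccard similarity. The threshold, the function names and the sample plots are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets: shared tokens / all tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_tokens, member_names)
    for name, text in docs.items():
        tokens = set(text.lower().split())
        for seed, members in clusters:
            if jaccard(tokens, seed) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((tokens, [name]))
    return [members for _, members in clusters]


# Hypothetical plot summaries
plots = {
    "A": "a detective hunts a killer in the city",
    "B": "a detective chases a killer across the city",
    "C": "two friends open a bakery in paris",
}
groups = cluster(plots)
```

With these inputs the two detective plots end up in one group and the bakery plot in another. A real pipeline would typically use TF-IDF vectors and an algorithm such as k-means instead of raw token sets.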

Two documents are similar if their vectors are similar. In this post, we will explore this idea through an example. A heatmap of Amazon books similarity is displayed to find the most similar and dissimilar books. Oftentimes it is required to construct a dataset by scraping a website and extracting relevant information.
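To make the vector-similarity idea above concrete, here is a minimal sketch that turns two texts into term-count vectors and compares them with cosine similarity. The choice of raw word counts and a whitespace tokenizer is an assumption for brevity; the original post may well use a different representation such as TF-IDF.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


score = cosine_similarity("the cat sat", "the cat ran")
```

Identical texts score 1.0 and texts with no shared words score 0.0, so computing the score for every pair of documents gives the matrix that a similarity heatmap would display.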

I will be using the IMDb website to pull user reviews for the top thriller movies and construct a dataset that will later be used for NLP tasks such as shallow parsing, clustering, and sentiment analysis.

In this post, the focus is on how to create the dataset and how to do shallow parsing by breaking each user review down into noun chunks.

NLP projects covered: Topic Modeling using NMF and LDA using sklearn; Document Clustering; Document Similarity; Scrape IMDB movie reviews.

Here is a list of top Python machine learning projects on GitHub. A continuously updated list of open source learning projects is available on Pansop. Scikit-learn features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources. Pattern is a web mining module for Python. Pylearn2 is a library designed to make machine learning research easy. It's a library based on Theano. Ramp is a Python library for rapid prototyping of machine learning solutions.

It's a lightweight pandas-based machine learning framework pluggable with existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.).

Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently. Milk is a machine learning toolkit in Python. Its focus is on supervised classification with several classifiers available: SVMs, k-NN, random forests, decision trees.

It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, milk supports k-means clustering and affinity propagation. Skdata is a library of data sets for machine learning and statistics. It's a library consisting of useful tools and extensions for day-to-day data science tasks.

REP is an environment for conducting data-driven research in a consistent and reproducible way. It can train classifiers in parallel on a cluster. It supports interactive plots.

Thinknowlogy is grammar-based software, designed to utilize the Natural Laws of Intelligence in grammar, in order to create intelligence through natural language in software. This is demonstrated by programming in natural language, reasoning in natural language and drawing conclusions (more detailed than scientific solutions), making assumptions (with a self-adjusting level of uncertainty), asking questions (about gaps in the knowledge), and detecting conflicts in the knowledge.

It builds semantics autonomously (with no vocabularies or word lists), detecting some cases of semantic ambiguity. It is multi-grammar, proving that Natural Laws of Intelligence are universal.

Bitextor is an application whose objective is to generate translation memories using multilingual Web sites as a corpus source. It downloads all the HTML files in a Web site, preprocesses them to convert them to a coherent and suitable format and, finally, applies a set of heuristics based mainly on HTML tag structure and text block length to make pairs of files which are candidates to contain the same text in different languages.

From these candidates, translation memories are generated in TMX format using the library LibTagAligner, which uses the HTML tags and the length of text chunks to perform the alignment. It uses XHTML tag structure and text block length to calculate the most probable alignment between the two files. Once it has done so, TagAligner uses a set of rules defined by the user to cut every text block into phrases and then generates a TMX file that represents the translation memory obtained from the original files.

You can download TagAligner as an application or as a library to be used by other applications. RelEx is an English-language semantic dependency relationship extractor, built on the Carnegie-Mellon Link Grammar parser. It can identify subject, object, indirect object, and many other syntactic dependency relationships between words in a sentence; it generates dependency trees, resembling those of dependency grammars, and specifically, those of Dekang Lin's MiniPar and the Stanford parser.

It accomplishes this by applying a sequence of rules, based on the local context, and thus resembles constraint grammar in its implementation.

In this sense, it implements some of the ideas of Hudson's Word Grammar. However, unlike other dependency parsers, RelEx attempts a greater degree of semantic normalization. Example code and a demo application are included to help get you started. Its semantic capabilities include named entity extraction, keyword extraction, concept extraction, categorization, language detection, and text cleaning.

Wintermute is an intelligent framework of applications and libraries that uses neural networking to learn about its host.

A pseudo-language engine that permits translations and grammar rulesets of any language to be incorporated into the system, and database downloads of different sets of data, combine to provide a virtual self-thinking assistant that can be used to perform tasks like dictation to a text editor, and more complex tasks such as sorting of documents depending on the time of day, or automation of other routine tasks.

It should be noted that Wintermute itself is a meta-project. It encompasses a large array of currently existing and potential produced projects. It was written with a focus on platform-independence and easy integration into applications.

It is based on a binary search algorithm that finds the n-grams and returns their frequency counts in logarithmic time. As the corpus is stored in many files, a simple index is used to retrieve the files containing the n-grams.
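The lookup described above can be sketched with a binary search over a sorted list of n-grams. This is an assumption-laden illustration and not the library's own code: the n-grams and counts are held in two parallel in-memory lists, the sample values are made up, and the per-file index is left out.

```python
import bisect


def lookup(ngrams, counts, query):
    """Return the frequency of `query`, or 0 if it is not in the list.

    `ngrams` must be sorted; bisect_left then finds the position in
    O(log n) comparisons.
    """
    i = bisect.bisect_left(ngrams, query)
    if i < len(ngrams) and ngrams[i] == query:
        return counts[i]
    return 0


# Hypothetical sorted n-grams with their frequency counts
ngrams = ["a big dog", "a small cat", "the big dog"]
counts = [120, 45, 300]
found = lookup(ngrams, counts, "a small cat")
missing = lookup(ngrams, counts, "no such gram")
```

When the corpus is split across many files, the same search can first run over an index of each file's first n-gram to pick the file, and then inside that file.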

Machine learning, as a field, is growing at a breakneck speed.

GitHub is the whiteboard that the whole world is watching. Top-quality code is regularly posted on that infinite board of wisdom. It is obviously impossible to track everything that goes on in the world of machine learning, but GitHub has a star rating for each project. Basically, if you star a repository, you show your appreciation for the project as well as keep track of repositories that you find interesting.

Highest Rated ML Projects on Github

The star rating can therefore be a good metric for finding the most followed projects. It provides an application programming interface (API) for Python and the command line. It is useful for recognising and manipulating faces in images. The deep-learning model has an accuracy of This library can also handle real-time face recognition.

It is lightweight and allows users to learn text representations and sentence classifiers. It works on standard, generic hardware. Models can be reduced in size to even fit on mobile devices. Text classification is a core problem in many applications, like spam detection, sentiment analysis or smart replies. The goal of text classification is to assign documents (such as emails, posts, text messages, product reviews, etc.) to one or more categories.

It is a very useful resource for NLP enthusiasts. This is a collection of resources that help you understand and utilise TensorFlow. The GitHub repo contains a curated list of awesome TensorFlow experiments, libraries, and projects.

TensorFlow is an end-to-end open source platform for machine learning designed by Google. It has a comprehensive ecosystem of tools, libraries and community resources that lets researchers create the state-of-the-art in ML.

My topic of research is deep reinforcement learning, which is less focused on computer vision and more on general machine learning or even artificial intelligence. Note that I only supervise students at Imperial College London, so please do not contact me about supervision otherwise.

I expect students to be (a) highly motivated and (b) technically proficient. Projects can, and have in the past, relied on research released during the course of the project. Some parts of machine learning can be found in optional modules in bioengineering courses, but modern deep learning is currently not taught at Imperial as far as I am aware.

On the other hand I am usually available to answer questions that may arise. I use the Lua-based Torch7 library. If MATLAB is the only programming language you've used, you are unlikely to have the programming skills required to make good progress.

Student Projects. It is better to take a break, relax, and come back to a problem with a clear mind, rather than stressing over it continuously.

Help each other. Be prepared to read each other's code, writing, equations etc. A second pair of eyes is a valuable asset. Read papers and equations carefully, several times through the course of your project. In machine learning, the small details are often key to getting the big idea to work.

Machine learning algorithms are often very sensitive to hyperparameter choices, so you will probably have to try several combinations. Marking schemes are not designed for projects involving cutting-edge research.

This is an extremely competitive list and it carefully picks the best open source Python libraries, tools and programs published between January and December. Mybridge AI evaluates the quality by considering popularity, engagement and recency.

To give you an idea about the quality, the average number of GitHub stars is 3, Open source projects can be useful for programmers. You can learn by reading the source code and build something on top of the existing projects. Give yourself plenty of time to play around with Python projects you may have missed over the past year.

Credit is given to the biggest contributor.

Home-assistant v0. Courtesy of Paulus Schoutsen.
Grumpy: A Python to Go source code transcompiler and runtime. Courtesy of Dylan Trotter and others at Google.
Sanic: Async Python 3. Courtesy of Channel Cat and Eli Uriegas.
Python-fire: A library for automatically generating command line interfaces (CLIs) from absolutely any Python object. Courtesy of David Bieber and others at Google Brain.
Courtesy of Matthew Honnibal.
Courtesy of Kenneth Reitz.
MicroPython: A lean and efficient Python implementation for microcontrollers and constrained systems.
Prophet: Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth. Courtesy of Facebook.
Courtesy of Nicholas Brochu.
Dash: Interactive, reactive web apps in pure Python. Courtesy of Chris P.
InstaPy: Instagram Bot. Courtesy of TimG.
Apistar: A fast and expressive API framework for Python. Courtesy of Tom Christie.
Faiss: A library for efficient similarity search and clustering of dense vectors. Courtesy of Matthijs Douze and others at Facebook Research.

