Nick Guenther's research while affiliated with University of Waterloo and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (4)
Text mining is the process of turning free text into numerical variables and then analyzing them with statistical techniques. We introduce the command ngram, which implements the most common approach to text mining, the “bag of words”. An n-gram is a contiguous sequence of words in a text. Broadly speaking, ngram creates hundreds or thousands of va...
Support vector machines are statistical- and machine-learning techniques with the primary goal of prediction. They can be applied to continuous, binary, and categorical outcomes analogous to Gaussian, logistic, and multinomial regression. We introduce a new command for this purpose, svmachines. This package is a thin wrapper for the widely deployed...
Text mining is the art of turning free text into numerical variables and then analyzing them with statistical techniques. We introduce the Stata command ngram, which implements the most common approach to text mining, "bag of words". An n-gram is a contiguous sequence of words in a text. Broadly speaking, ngram creates hundreds or thousands of vari...
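The "bag of words" idea described in the abstracts above can be illustrated with a minimal Python sketch. This is not the Stata ngram command or its implementation; the function names (tokenize, ngrams, bag_of_ngrams), the punctuation handling, and the max_n parameter are assumptions made only for illustration. It turns each text into counts of contiguous word sequences of length 1 up to max_n, one numeric column per distinct n-gram.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation (a simplifying assumption)."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(texts, max_n=2):
    """Build a document-by-term count matrix for all n-grams with 1 <= n <= max_n."""
    counts = []
    for text in texts:
        tokens = tokenize(text)
        doc_counts = Counter()
        for n in range(1, max_n + 1):
            doc_counts.update(ngrams(tokens, n))
        counts.append(doc_counts)
    # One column per distinct n-gram seen in any text, in sorted order.
    vocab = sorted(set().union(*counts))
    matrix = [[doc_counts[term] for term in vocab] for doc_counts in counts]
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = bag_of_ngrams(["The cat sat.", "The cat ran."])
    for term, column in zip(vocab, zip(*matrix)):
        print(term, column)
```

Even on two short texts the number of columns exceeds the number of distinct words, which suggests how a real corpus can yield hundreds or thousands of variables.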
Citations
... An n-gram is a consecutive sequence of n words in a text [17]. This research used a combination of unigram and bigram tokenization. ...
... Stata for text mining exists yet there is much space for developing in a growing field (see for instance Provalis Research 2024 and William and Williams 2014 and Schonlau et al. 2017) ...
... SVMs in Scikitlearn support both dense and sparse sample vectors as input. [9] Bagging classifier: An ensemble meta-estimator called a bagging classifier model fits base classifiers one at a time to random subsets of the original dataset, and it then averages or votes on each classifier's predictions to produce a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator like a decision tree, by introducing randomisation into its construction procedure and then making the ensemble out of it. ...
... N-grams, in this context, are word phrases consisting of n-number of words in direct proximity to each other (Bharadwaj & Shao, 2019; Gurcan & Cagiltay, 2023; Schonlau et al., 2017). This study was focused exclusively on bigrams (i.e., two-word phrases). ...