Package 'CTM' reference manual

Title:	A Text Mining Toolkit for Chinese Document
Description:	The CTM package is designed for text mining and is specific to Chinese documents.
Authors:	Jim Liu, Quan Gu
Maintainer:	Jim Liu <[email protected]>
License:	GPL-3
Version:	0.2
Built:	2025-02-25 05:00:55 UTC
Source:	https://github.com/cran/CTM

Document Term Matrix

Description

Constructs a document-term matrix from Chinese text documents.

Usage

CDTM(doc, weighting, EngTermDeleted = TRUE, NumTermDeleted = TRUE,
  shortTermDeleted = TRUE)

Arguments

`doc`	The Chinese text documents. A vector of Chinese strings.
`weighting`	Weighting scheme for the matrix: one of "binary", "count", "tf", or "tfidf". See Details.
`EngTermDeleted`	Remove English terms from the text documents.
`NumTermDeleted`	Remove numbers from the text documents.
`shortTermDeleted`	Delete short terms, i.e. those with nchar < 2.

Details

This function runs Chinese word segmentation with jiebaR and builds a document-term matrix. Four weighting schemes are available: "binary" sets a value to 1 if the term occurs in the document, "count" is the number of times the term occurs in the document, "tf" is the term frequency, and "tfidf" is the term frequency-inverse document frequency.
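
As an illustration of the four weighting schemes, the sketch below computes them in base R on a small pre-segmented corpus. It is not the package's implementation: the whitespace split stands in for jiebaR segmentation, and the tf (count divided by document length) and idf (natural log of number of documents over document frequency) formulas are common textbook definitions assumed here, since this manual does not state the ones CDTM uses.

```r
# Toy documents; splitting on spaces stands in for jiebaR segmentation.
docs <- c(a1 = "hello taiwan", b1 = "world of tank",
          c1 = "taiwan weather", d1 = "local weather")
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# "count": one row per document, one column per term.
count <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = terms))),
  numeric(length(terms))
))
colnames(count) <- terms

# "binary": 1 if the term occurs in the document, else 0.
binary <- (count > 0) * 1

# "tf" (assumed definition): count divided by the document's total term count.
tf <- count / rowSums(count)

# "tfidf" (assumed definition): tf times log(N / document frequency).
idf <- log(nrow(count) / colSums(count > 0))
tfidf <- sweep(tf, 2, idf, "*")
```

With these assumed definitions, a term that appears in every document would receive a tfidf weight of zero, because its idf is log(1).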

Author(s)

Jim Liu, Quan Gu

Examples

library(CTM)
a1 <- "hello taiwan"
b1 <- "world of tank"
c1 <- "taiwan weather"
d1 <- "local weather"
text1 <- t(data.frame(a1,b1,c1,d1))
dtm1 <- CDTM(doc = text1, weighting = "tfidf", EngTermDeleted = FALSE, shortTermDeleted = FALSE)

Term Document Matrix

Description

Constructs a term-document matrix from Chinese text documents.

Usage

CTDM(doc, weighting, EngTermDeleted = TRUE, NumTermDeleted = TRUE,
  shortTermDeleted = TRUE)

Arguments

`doc`	The Chinese text documents. A vector of Chinese strings.
`weighting`	Weighting scheme for the matrix: one of "binary", "count", "tf", or "tfidf". See Details.
`EngTermDeleted`	Remove English terms from the text documents.
`NumTermDeleted`	Remove numbers from the text documents.
`shortTermDeleted`	Delete short terms, i.e. those with nchar < 2.

Details

This function runs Chinese word segmentation with jiebaR and builds a term-document matrix. Four weighting schemes are available: "binary" sets a value to 1 if the term occurs in the document, "count" is the number of times the term occurs in the document, "tf" is the term frequency, and "tfidf" is the term frequency-inverse document frequency.
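
The difference from a document-term matrix is orientation only: terms index the rows and documents index the columns. The base-R sketch below shows this on a hand-built count matrix. The matrix and its dimnames are made up for illustration, and the class of the object that CTDM actually returns is not stated in this manual.

```r
# Hypothetical document-term counts: documents as rows, terms as columns.
dtm <- matrix(c(1, 1, 0,
                0, 1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"),
                              c("hello", "taiwan", "weather")))

# Transposing gives the term-document orientation: terms as rows.
tdm <- t(dtm)
dim(tdm)
rownames(tdm)
```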

Author(s)

Jim Liu, Quan Gu

Examples

library(CTM)
a1 <- "hello taiwan"
b1 <- "world of tank"
c1 <- "taiwan weather"
d1 <- "local weather"
text1 <- t(data.frame(a1,b1,c1,d1))
tdm1 <- CTDM(doc = text1, weighting = "tfidf", EngTermDeleted = FALSE, shortTermDeleted = FALSE)

Term Count

Description

Computes term counts from text documents.

Usage

termCount(doc, EngTermDeleted = TRUE, NumTermDeleted = TRUE,
  shortTermDeleted = TRUE)

Arguments

`doc`	The Chinese text documents.
`EngTermDeleted`	Remove English terms from the text documents.
`NumTermDeleted`	Remove numbers from the text documents.
`shortTermDeleted`	Delete short terms, i.e. those with nchar < 2.

Details

This function runs Chinese word segmentation with jiebaR and computes term counts across all of the text documents.
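
To make the counting and filtering steps concrete, here is a minimal base-R sketch. The helper `count_terms` and its arguments are hypothetical and only mirror the spirit of the three deletion flags. It assumes a flag drops any token containing the relevant characters, which may differ from what termCount does, and a whitespace split again stands in for jiebaR segmentation.

```r
# Toy documents; splitting on spaces stands in for jiebaR segmentation.
docs <- c("hello taiwan", "world of tank", "taiwan weather", "local weather 2024")
tokens <- unlist(strsplit(docs, " ", fixed = TRUE))

# Hypothetical helper (not part of CTM): filter tokens, then count them.
count_terms <- function(tokens, drop_english = TRUE,
                        drop_numbers = TRUE, drop_short = TRUE) {
  if (drop_english) tokens <- tokens[!grepl("[A-Za-z]", tokens)]
  if (drop_numbers) tokens <- tokens[!grepl("[0-9]", tokens)]
  if (drop_short)   tokens <- tokens[nchar(tokens) >= 2]
  # Pool all documents and count each remaining term, most frequent first.
  sort(table(tokens), decreasing = TRUE)
}

counts <- count_terms(tokens, drop_english = FALSE, drop_short = FALSE)
```

Here the counts are pooled over all documents, so the result has one entry per distinct term rather than one row per document.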

Author(s)

Jim Liu

Examples

library(CTM)
a1 <- "hello taiwan"
b1 <- "world of tank"
c1 <- "taiwan weather"
d1 <- "local weather"
text1 <- t(data.frame(a1,b1,c1,d1))
count1 <- termCount(doc = text1, EngTermDeleted=FALSE, shortTermDeleted = FALSE)
