This article collects typical usage examples of the Python function maxCommon.iterTsvRows. If you have been wondering what iterTsvRows does in practice and how to call it, the hand-picked code samples below should help.
Twenty code examples of iterTsvRows are shown below, sorted by popularity by default. You can upvote the examples you like or find useful; your feedback helps the system recommend better Python examples.
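Before the individual examples, here is a minimal usage sketch reflecting the calling pattern seen throughout the samples: iterTsvRows appears to read a tab-separated file with a header line and yield one namedtuple per data row, with an optional encoding keyword argument. The file name journals.tab and its columns eIssn and publisher are invented for illustration only.

# Minimal sketch of the typical calling pattern (assumptions: rows are
# namedtuples whose attributes match the TSV header line; the file name
# "journals.tab" and its columns "eIssn"/"publisher" are hypothetical).
import maxCommon

def listPublishers(fname):
    " collect all non-empty publisher names from a tab-separated journal table "
    publishers = set()
    for row in maxCommon.iterTsvRows(fname, encoding="latin1"):
        # access columns by name, as in the examples below
        if row.publisher != "":
            publishers.add(row.publisher)
    return publishers

print(listPublishers("journals.tab"))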
Example 1: parseTabPublisherFile
def parseTabPublisherFile(fname):
    " parse a file with columns eIssn, publisher (optional) and urls into a list of records "
    logging.info("Parsing %s" % fname)
    journals = list(maxCommon.iterTsvRows(fname, encoding="latin1"))
    # modify publisher field
    datasetName = splitext(basename(fname))[0]
    headers = list(journals[0]._fields)
    addPubField = False
    if "publisher" not in headers:
        headers.insert(0, "publisher")
        addPubField = True
    JRec = collections.namedtuple("Journal", headers)
    newJournals = []
    for j in journals:
        if j.eIssn.lower()=="print only" or j.eIssn.lower()=="unknown":
            logging.debug("Skipping journal %s, no eIssn" % j.title)
            continue
        if addPubField:
            newJ = [datasetName]
            newJ.extend(j)
            newJRec = JRec(*newJ)
        else:
            newJRec = j
        newJournals.append(newJRec)
    return newJournals
Author: Moxikai, Project: pubMunch, Lines: 25, Source file: pubResolvePublishers.py
Example 2: updatePmids
def updatePmids(medlineDir, crawlDir, updateIds, minYear=None):
    """ go over subdirs of crawlDir, for each: read the ISSNs, and add new
    PMIDs we have in medlineDir to subdir/pmids.txt
    We never remove a PMID from pmids.txt.
    """
    logging.info("Now updating crawler directories with the new PMIDs")
    eIssnToPIssn = getEIssnToPIssn(pubConf.publisherIssnTable)
    issnToPmid, issnToJournal = getIssnPmidDict(medlineDir, updateIds, minYear)
    for subdir in getSubdirs(crawlDir):
        pmidFname = join(crawlDir, subdir, "pmids.txt")
        issnFname = join(crawlDir, subdir, "issns.tab")
        if not isfile(issnFname) or not isfile(pmidFname):
            continue
        logging.debug("reading subdir %s: %s and %s" % (subdir, pmidFname, issnFname))
        issns = [row.issn.strip() for row in maxCommon.iterTsvRows(issnFname)]
        logging.debug("ISSNs: %s" % ",".join(issns))
        # read old pmids
        oldPmids = set([int(line.rstrip()) for line in open(pmidFname)])
        newPmids = set()
        # add new pmids, for each issn
        for issn in issns:
            if issn not in issnToPmid:
                if issn in eIssnToPIssn:
                    logging.debug("Looks like eISSN, mapped to printISSN %s" % issn)
                    issn = eIssnToPIssn[issn]
                else:
                    logging.debug("No Pmids for ISSN %s and not eIssn for it" % issn)
            issnPmids = issnToPmid.get(issn, None)
            if issnPmids==None:
                logging.debug("No Pmids for ISSN %s" % issn)
                continue
            logging.debug("Issn %s, %d PMIDs" % (issn, len(issnPmids)))
            newPmids.update(issnPmids)
        # get some counts and output to user
        oldCount = len(oldPmids)
        updateCount = len(newPmids)
        oldPmids.update(newPmids) # faster to add new to old set than old to new set
        pmids = oldPmids
        newCount = len(pmids)
        addCount = newCount - oldCount
        logging.info("crawl dir %s: old PMID count %d, update has %d, new total %d, added %d" % \
            (subdir, oldCount, updateCount, newCount, addCount))
        # write new pmids
        pmids = [str(x) for x in pmids]
        # randomize order, to distribute errors
        random.shuffle(pmids)
        # write all pmids to a tmp file
        pmidTmpFname = pmidFname+".new"
        pmidFh = open(pmidTmpFname, "w")
        pmidFh.write("\n".join(pmids))
        pmidFh.close()
        # keep a copy of the original pmid file
        shutil.copy(pmidFname, pmidFname+".bak")
        # rename the tmp file to the original file
        # to make sure that an intact pmid file always exists
        os.rename(pmidTmpFname, pmidFname)
Author: neoyukito, Project: pubMunch, Lines: 60, Source file: pubUpdatePmids.py
Example 3: iterArticleDataDir
def iterArticleDataDir(textDir, type="articles", filterFname=None, updateIds=None):
    """ yields all articleData from all files in textDir
    Can filter to yield only a set of filenames or files for a
    given list of updateIds.
    """
    fcount = 0
    if type=="articles":
        baseMask = "*.articles.gz"
    elif type=="files":
        baseMask = "*.files.gz"
    elif type=="annots":
        baseMask = "*.tab.gz"
    else:
        logging.error("Article type %s not valid" % type)
        sys.exit(1)
    if isfile(textDir):
        fileNames = [textDir]
        logging.debug("Found 1 file, %s" % textDir)
    else:
        fileMask = os.path.join(textDir, baseMask)
        fileNames = glob.glob(fileMask)
        logging.debug("Looking for all fulltext files in %s, found %d files" % \
            (fileMask, len(fileNames)))
        if updateIds!=None and len(updateIds)!=0:
            logging.debug("Restricting fulltext files to updateIds %s" % str(updateIds))
            filteredFiles = []
            for updateId in updateIds:
                for fname in fileNames:
                    if basename(fname).startswith(str(updateId)+"_"):
                        filteredFiles.append(fname)
                logging.debug("Update Id %s, %d files" % (str(updateId), len(filteredFiles)))
            fileNames = list(filteredFiles)
    logging.debug("Found %d files in input dir %s" % (len(fileNames), textDir))
    pm = maxCommon.ProgressMeter(len(fileNames), stepCount=100)
    for textFname in fileNames:
        if filterFname!=None and not filterFname in textFname:
            logging.warn("Skipping %s, because file filter is set" % textFname)
            continue
        reader = PubReaderFile(textFname)
        logging.debug("Reading %s, %d files left" % (textFname, len(fileNames)-fcount))
        fcount+=1
        if type=="articles":
            for articleData in reader.articleRows:
                if "publisher" not in articleData._fields: # XX temporary bugfix as I have some old files
                    articleData = list(articleData)
                    articleData.insert(2, "")
                    articleData[3] = ""
                yield articleData
        elif type=="files":
            for fileData in reader.fileRows:
                yield fileData
        elif type=="annots":
            for row in maxCommon.iterTsvRows(textFname):
                yield row
        else:
            assert(False) # illegal type parameter
        pm.taskCompleted()
Author: neoyukito, Project: pubMunch, Lines: 60, Source file: pubStore.py
Example 4: parseHighwire
def parseHighwire():
    """ create two dicts
    printIssn -> url to pmidlookup-cgi of highwire
    and
    publisherName -> top-level hostnames
    >>> temps, domains = parseHighwire()
    >>> temps['0270-6474']
    u'http://www.jneurosci.org/cgi/pmidlookup?view=long&pmid=%(pmid)s'
    >>> domains["Society for Neuroscience"]
    set([u'jneurosci.org'])
    >>> domains["American Society for Biochemistry and Molecular Biology"]
    set([u'jbc.org', u'mcponline.org', u'jlr.org'])
    >>> temps["1535-9476"]
    u'http://www.mcponline.org/cgi/pmidlookup?view=long&pmid=%(pmid)s'
    """
    templates = {}
    domains = {}
    pubFname = pubConf.publisherIssnTable
    logging.info("Parsing %s to find highwire ISSNs/webservers" % pubFname)
    for row in maxCommon.iterTsvRows(pubFname):
        if not row.pubName.startswith("HIGHWIRE"):
            continue
        pubName = row.pubName.replace("HIGHWIRE ","")
        issns = [i.strip() for i in row.journalIssns.split("|")]
        servers = row.webservers.split("|")
        for issn, server in zip(issns, servers):
            template = "http://www."+server+"/cgi/pmidlookup?view=long&pmid=%(pmid)s"
            templates[issn] = template
            domains.setdefault(pubName, set()).add(server)
            #logging.debug("HIGHWIRE CONFIG %s, %s, %s" % (pubName, template, domains[pubName]))
    return templates, domains
Author: neoyukito, Project: pubMunch, Lines: 31, Source file: pubCrawlConf.py
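As a brief aside, the templates dict returned by parseHighwire maps a print ISSN to a URL template containing a %(pmid)s placeholder (see the doctest above), so a concrete lookup URL can be built with plain string interpolation. The PMID below is invented for illustration.

# Hypothetical use of the returned template dict; the PMID value is made up.
templates, domains = parseHighwire()
url = templates['0270-6474'] % {"pmid": "12345678"}
# gives: http://www.jneurosci.org/cgi/pmidlookup?view=long&pmid=12345678
print(url)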
Example 5: __init__
def __init__(self, fname):
    " fname can end in .articles.gz, reader will still read both articles and files "
    logging.debug("Reading data from file with prefix %s (.articles.gz, .files.gz)" % fname)
    baseDir = dirname(fname)
    base = basename(fname).split('.')[0]
    articleFn = join(baseDir, base+".articles.gz")
    fileFn = join(baseDir, base+".files.gz")
    logging.debug("Reading %s and %s" % (articleFn, fileFn))
    self.articleRows = None
    if isfile(articleFn) and getsize(articleFn)!=0:
        self.articleRows = maxCommon.iterTsvRows(articleFn, encoding="utf8")
    self.fileRows = None
    if isfile(fileFn) and getsize(fileFn)!=0:
        self.fileRows = maxCommon.iterTsvRows(fileFn, encoding="utf8")
Author: neoyukito, Project: pubMunch, Lines: 16, Source file: pubStore.py
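For context, Example 3 above shows how such a reader object is consumed: its articleRows and fileRows attributes are themselves iterators over namedtuple rows (or None if the corresponding file is missing or empty). A minimal sketch, assuming a chunk file named 0_00000.articles.gz and an articleId column in the articles table:

# Minimal consumption sketch; the file name and the articleId column are assumptions.
reader = PubReaderFile("0_00000.articles.gz")
if reader.articleRows is not None:
    for articleData in reader.articleRows:
        print(articleData.articleId)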
Example 6: __init__
def __init__(self, taxId):
    " open db files, compile patterns, parse input as far as possible "
    mutDataDir = pubConf.varDataDir
    geneDataDir = pubConf.geneDataDir
    if mutDataDir==None:
        return
    self.mutDataDir = mutDataDir
    self.entrez2sym, self.entrez2refprots = parseEntrez(join(geneDataDir, "entrez.tab"))
    # refseq sequences
    fname = join(mutDataDir, "seqs")
    logging.info("opening %s" % fname)
    seqs = pubKeyVal.SqliteKvDb(fname)
    self.seqs = seqs
    # refprot to refseqId
    # refseq to CDS Start
    fname = join(mutDataDir, "refseqInfo.tab")
    logging.debug("Reading %s" % fname)
    self.refProtToRefSeq = {}
    self.refSeqCds = {}
    for row in maxCommon.iterTsvRows(fname):
        self.refProtToRefSeq[row.refProt] = row.refSeq
        self.refSeqCds[row.refSeq] = int(row.cdsStart)-1 # NCBI is 1-based
    # refseq to genome
    self.pslCache = {}
    self.refGenePsls = openIndexedPsls(mutDataDir, "refGenePsls.9606")
    # dbsnp db
    fname = join(self.mutDataDir, "dbSnp.sqlite")
    self.snpDb = sqlite3.connect(fname)
    logging.info("Reading of data finished")
Author: neoyukito, Project: pubMunch, Lines: 35, Source file: varFinder.py
Example 7: readArticleChunkAssignment
def readArticleChunkAssignment(inDir, updateIds):
    "read the assignment of articleId -> chunkId from text directory"
    if updateIds == None:
        inFiles = glob.glob(os.path.join(inDir, "*_index.tab"))
    else:
        inFiles = []
        for updateId in updateIds:
            updateId = str(updateId)
            indexFname = "%s_index.tab" % updateId
            if isfile(indexFname):
                inFiles.append(os.path.join(inDir, indexFname))
    if len(inFiles) == 0:
        logging.warn("No article chunk assignment")
        return None
    logging.debug("Input files for article -> chunk assignment: %s" % inFiles)
    articleChunks = {}
    for inFile in inFiles:
        logging.info("Parsing %s" % inFile)
        for row in maxCommon.iterTsvRows(inFile):
            chunkId = int(row.chunkId.split("_")[1])
            articleChunks[int(row.articleId)] = int(chunkId)
    return articleChunks
Author: joepickrell, Project: pubMunch, Lines: 26, Source file: pubGeneric.py
Example 8: updatePmids
def updatePmids(medlineDir, crawlDir, updateIds, minYear=None):
    """ go over subdirs of crawlDir, for each: read the ISSNs, and add new
    PMIDs we have in medlineDir to subdir/pmids.txt
    We never remove a PMID from pmids.txt.
    """
    logging.info("Now updating crawler directories with the new PMIDs")
    eIssnToPIssn = getEIssnToPIssn(pubConf.publisherIssnTable)
    subDirs = getSubdirs(crawlDir)
    con, cur = pubStore.openArticleDb("medline", mustOpen=True, useRamdisk=True)
    for subdir in subDirs:
        if subdir.endswith(".tmp"):
            continue
        subPath = join(crawlDir, subdir)
        logging.info("Processing subdirectory %s" % subPath)
        if isfile(pubCrawlLib.getLockFname(subPath)):
            logging.warn("Found lockfile, looks like a crawl is going on in %s, skipping" % subPath)
            continue
        pmidFname = join(crawlDir, subdir, "pmids.txt")
        issnFname = join(crawlDir, subdir, "issns.tab")
        if not isfile(issnFname) or not isfile(pmidFname):
            logging.info("Skipping %s, ISSN or docId file not found" % subPath)
            continue
        logging.debug("reading subdir %s: %s and %s" % (subdir, pmidFname, issnFname))
        issns = [row.issn.strip() for row in maxCommon.iterTsvRows(issnFname)]
        logging.debug("ISSNs: %s" % ",".join(issns))
        # read old pmids
        oldPmids = set([int(line.rstrip()) for line in open(pmidFname)])
        #newPmids = set()
        # add new pmids, for each issn
        newPmids = getPmidsForIssns(con, cur, issns, minYear)
        logging.debug("%d PMIDs" % (len(newPmids)))
        oldCount = len(oldPmids)
        updateCount = len(newPmids)
        oldPmids.update(newPmids) # faster to add new to old set than old to new set
        pmids = oldPmids
        newCount = len(pmids)
        addCount = newCount - oldCount
        logging.info("crawl dir %s: old PMID count %d, update has %d, new total %d, added %d" % \
            (subdir, oldCount, updateCount, newCount, addCount))
        # write new pmids
        pmids = [str(x) for x in pmids]
        # randomize order, to distribute errors
        random.shuffle(pmids)
        # write all pmids to a tmp file
        pmidTmpFname = pmidFname+".new"
        pmidFh = open(pmidTmpFname, "w")
        pmidFh.write("\n".join(pmids))
        pmidFh.close()
        # keep a copy of the original pmid file
        shutil.copy(pmidFname, pmidFname+".bak")
        # atomic rename the tmp file to the original file
        # to make sure that an intact pmid file always exists
        os.rename(pmidTmpFname, pmidFname)
Author: Moxikai, Project: pubMunch, Lines: 59, Source file: pubUpdatePmids.py
Example 9: getAllBatchIds
def getAllBatchIds(outDir):
    """ parse batches.tab and return all available batchIds
    """
    batchIds = []
    for row in maxCommon.iterTsvRows(join(outDir, "batches.tab")):
        batchIds.append(row.batchId)
    logging.debug("Found batchIds %s in directory %s" % (batchIds, outDir))
    return batchIds
Author: joepickrell, Project: pubMunch, Lines: 8, Source file: pubStore.py
Example 10: loadTsvSqlite
def loadTsvSqlite(dbFname, tableName, tsvFnames, headers=None, intFields=[], \
        primKey=None, idxFields=[], dropTable=True):
    " load tabsep file into sqlLite db table "
    # if first parameter is string, make it to a list
    if len(tsvFnames)==0:
        logging.debug("No filenames to load")
        return
    if isinstance(tsvFnames, basestring):
        tsvFnames = [tsvFnames]
    if os.path.isfile(dbFname):
        lockDb = False
        finalDbFname = None
    else:
        lockDb = True
        finalDbFname = dbFname
        dbFname = pubGeneric.getFastUniqueTempFname()
        logging.info("writing first to db on ramdisk %s" % dbFname)
    con, cur = openSqlite(dbFname, lockDb=lockDb)
    # drop old table
    if dropTable:
        logging.debug("dropping old sqlite table")
        cur.execute('DROP TABLE IF EXISTS %s;'% tableName)
        con.commit()
    # create table
    createSql, idxSqls = makeTableCreateStatement(tableName, headers, \
        intFields=intFields, idxFields=idxFields, primKey=primKey)
    logging.log(5, "creating table with %s" % createSql)
    cur.execute(createSql)
    con.commit()
    logging.info("Loading data into table")
    tp = maxCommon.ProgressMeter(len(tsvFnames))
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (tableName, ", ".join(headers), ", ".join(["?"]*len(headers)))
    for tsvName in tsvFnames:
        logging.debug("Importing %s" % tsvName)
        if os.path.getsize(tsvName)==0:
            logging.debug("Skipping %s, zero size" % tsvName)
            continue
        rows = list(maxCommon.iterTsvRows(tsvName))
        logging.log(5, "Running Sql %s against %d rows" % (sql, len(rows)))
        cur.executemany(sql, rows)
        con.commit()
        tp.taskCompleted()
    logging.info("Adding indexes to table")
    for idxSql in idxSqls:
        cur.execute(idxSql)
        con.commit()
    con.close()
    if finalDbFname!=None:
        logging.info("moving over ramdisk db to %s" % dbFname)
        shutil.move(dbFname, finalDbFname)
Author: neoyukito, Project: pubMunch, Lines: 56, Source file: maxTables.py
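A hypothetical call to loadTsvSqlite, just to illustrate the parameters; the database name, table name, input file, and column names below are all invented.

# Hypothetical invocation; every name here is made up for illustration.
loadTsvSqlite("journals.db", "journals", ["journals.tab"],
    headers=["eIssn", "publisher", "urls"], intFields=[],
    primKey="eIssn", idxFields=["publisher"])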
Example 11: parseDoneIds
def parseDoneIds(fname):
    " parse all already converted identifiers from inDir "
    doneIds = set()
    if os.path.getsize(fname) == 0:
        return doneIds
    for row in maxCommon.iterTsvRows(fname):
        doneIds.add(row.doi)
    logging.info("Found %d identifiers of already parsed files" % len(doneIds))
    return doneIds
Author: maximilianh, Project: pubMunch, Lines: 10, Source file: pubConvSpringer.py
Example 12: convertOneChunk
def convertOneChunk(inIndexFile, outFile):
    """
    get files from inIndexFile, parse Xml,
    write everything to outfile in ascii format
    """
    store = pubStore.PubWriterFile(outFile)
    i = 0
    inRows = list(maxCommon.iterTsvRows(inIndexFile))
    doi2pmid = None
    logging.info("Converting %d files" % len(inRows))
    convCount = 0
    for row in inRows:
        # read line
        i+=1
        articleId, baseDir = row.articleId, row.baseDir
        zipFilename, filename = row.zipFilename, row.filename
        articleId=int(articleId)
        # open file from zipfile
        fullZipPath = join(baseDir, zipFilename)
        zipFile = zipfile.ZipFile(fullZipPath)
        logging.debug("Parsing %s, file %s, %d files left" % (fullZipPath, filename, len(inRows)-i))
        if doi2pmid==None:
            doi2pmid = parseDoi2Pmid(baseDir)
        xmlString = zipFile.open(filename).read()
        xmlTree = pubXml.etreeFromXml(xmlString)
        # parse xml
        articleData = pubStore.createEmptyArticleDict(publisher="elsevier")
        articleData = parseElsevier(xmlTree, articleData)
        if articleData==None:
            logging.warn("Parser got no data for %s" % filename)
            continue
        articleData["origFile"]="consyn://"+zipFilename+"/"+filename
        if articleData["doi"] in doi2pmid:
            articleData["pmid"] = doi2pmid[articleData["doi"]]
        pii = splitext(basename(filename))[0]
        articleData["externalId"]="PII"+pii
        articleData["fulltextUrl"]="http://www.sciencedirect.com/science/svapps/pii/"+pii
        # convert to ascii
        asciiString, mimeType = treeToAscii_Elsevier(xmlTree)
        if asciiString==None:
            logging.warn("No ASCII for %s / %s" % (zipFilename, filename))
            continue
        store.writeArticle(articleId, articleData)
        # write to output
        fileData = createFileData(articleData, mimeType, asciiString)
        store.writeFile(articleId, (1000*(articleId))+1, fileData, externalId=articleData["externalId"])
        convCount += 1
    logging.info("Converted %d files" % convCount)
    store.close()
Author: joepickrell, Project: pubMunch, Lines: 55, Source file: pubConvElsevier.py
Example 13: convertOneChunk
def convertOneChunk(gzDir, idFname, inIndexFile, outFile):
    # for each row in index:
    store = pubStore.PubWriterFile(outFile)
    donePiis = pubGeneric.parseDoneIds(idFname)
    # log to file
    outBase = join(dirname(outFile), basename(outFile).split(".")[0])
    logFname = outBase+".log"
    pubGeneric.setupLogging(__file__, None, logFileName=logFname)
    idFname = outBase+"_ids.tab"
    logging.debug("Writing ids to %s" % idFname)
    idFh = open(idFname, "w")
    idFh.write("#articleId\texternalId\n")
    lastTsvFname = None
    tsvFile = None
    pmidFinder = pubCompare.PmidFinder()
    for row in maxCommon.iterTsvRows(inIndexFile, encoding=None):
        # open file and seek, if necessary
        if tsvFile==None or lastTsvFname!=row.tsvFile:
            logging.debug("Seeking to %s in tsvfile %s" % (row.offset, row.tsvFile))
            tsvFile = gzip.open(join(gzDir, row.tsvFile))
            tsvFile.seek(int(row.offset))
            lastTsvFname = row.tsvFile
        line = tsvFile.readline()
        if row.url.startswith("!"):
            logging.info("Ignoring %s, marked as duplicated" % row.url)
            continue
        #fields are: ["articleId", "tsvFile", "url", "offset"]
        fields = line.split("\t")
        url = fields[0]
        logging.debug("Replacing weird bing chars")
        content = fields[-1]
        assert(url==row.url)
        assert(len(content)!=0)
        url = url.decode("utf8")
        logging.debug("Converting to text")
        content = convertMicrosoft(content)
        artDict, fileDict = convertHtmlToDicts(url, content)
        if artDict==None:
            artDict, fileDict = minimalHtmlToDicts(url, content)
        if artDict==None:
            continue
        artDict["pmid"] = pmidFinder.lookupPmid(artDict)
        # write file
        articleId = int(row.articleId)
        fileId = articleId*1000
        store.writeFile(articleId, fileId, fileDict)
        store.writeArticle(articleId, artDict)
    store.close()
Author: maximilianh, Project: pubMunch, Lines: 54, Source file: pubConvBing.py
Example 14: parseHighwire
def parseHighwire():
    """ create two dicts
    printIssn -> url to pmidlookup-cgi of highwire
    and
    publisherName -> top-level hostnames
    >>> temps, domains = parseHighwire()
    >>> temps['0270-6474']
    u'http://www.jneurosci.org/cgi/pmidlookup?view=long&pmid=%(pmid)s'
    >>> domains["Society for Neuroscience"]
    set([u'jneurosci'])
    """
    # highwire's publisher names are not resolved ("SAGE", "SAGE Pub", etc)
    # so: first get dict printIssn -> resolved publisherName from publishers.tab
    pubFname = join(pubConf.publisherDir, "publishers.tab")
    pIssnToPub = {}
    for row in maxCommon.iterTsvRows(pubFname):
        if not row.pubName.startswith("HIGHWIRE"):
            continue
        for issn in row.journalIssns.split("|"):
            issn = issn.rstrip(" ")
            pIssnToPub[issn] = row.pubName.replace("HIGHWIRE ","").strip()
    # go over highwire table and make dict pubName -> issn -> templates
    # and dict pubName -> domains
    fname = join(pubConf.journalListDir, "highwire.tab")
    templates = {}
    domains = {}
    for row in maxCommon.iterTsvRows(fname, encoding="latin1"):
        if row.eIssn.strip()=="Unknown":
            continue
        pubName = pIssnToPub[row.pIssn.strip()].strip()
        templates.setdefault(pubName, {})
        templates[row.pIssn.strip()] = row.urls.strip()+"/cgi/pmidlookup?view=long&pmid=%(pmid)s"
        host = urlparse.urlparse(row.urls).hostname
        domain = ".".join(host.split('.')[-2:]).strip()
        domains.setdefault(pubName, set()).add(domain)
    return templates, domains
Author: floe, Project: pubMunch, Lines: 39, Source file: pubCrawlConf.py
Example 15: startup
def startup(paramDict):
    global geneIds
    fname = join(dirname(__file__), "data", "wormFinder", "wormIds.tab.gz")
    geneCount = 0
    for row in maxCommon.iterTsvRows(fname):
        if row.locus!="":
            geneIds[row.locus] = row.geneId
        if row.seqId!="":
            geneIds[row.seqId] = row.geneId
        geneCount +=1
        #if row.geneId!="":
            #geneIds[row.geneId] = row.geneId
    logging.info("Loaded %d words mapped to %d genes" % (len(geneIds), geneCount))
Author: Moxikai, Project: pubMunch, Lines: 13, Source file: wormFinder.py
Example 16: getEIssnToPIssn
def getEIssnToPIssn(journalFname):
    """ return a dict that maps from eIssn to pIssn """
    logging.info("Parsing %s to get eIssn -> pIssn mapping" % journalFname)
    ret = {}
    for row in maxCommon.iterTsvRows(journalFname):
        eStr = row.journalEIssns
        pStr = row.journalIssns
        if eStr=="" or pStr=="":
            continue
        eIssns = eStr.split("|")
        pIssns = pStr.split("|")
        assert(len(eIssns)==len(pIssns))
        for eIs, pIs in zip(eIssns, pIssns):
            if eIs!="" and pIs!="":
                ret[eIs] = pIs
    return ret
Author: Moxikai, Project: pubMunch, Lines: 16, Source file: pubUpdatePmids.py
Example 17: getAllUpdateIds
def getAllUpdateIds(datasets):
    " collect all available text dataset updateIds for all datasets "
    textUpdateIds = {}
    for dataset in datasets:
        textDir = pubConf.resolveTextDir(dataset)
        updateFname = join(textDir, "updates.tab")
        logging.debug("Reading %s" % updateFname)
        updateIds = []
        for row in maxCommon.iterTsvRows(updateFname):
            updateIds.append(row.updateId)
        textUpdateIds[dataset] = updateIds
    return textUpdateIds
    # note: the early return above makes the remaining lines unreachable
    # also save to file, so we don't have to do this again
    outFname = join(batchDir, "updateIds.json")
    json.dumps(textUpdateIds, open(outFname, "w"), sort_keys=True, indent=4, )
    return textUpdateIds
Author: Moxikai, Project: pubMunch, Lines: 17, Source file: pubMapProp.py
Example 18: parseEntrez
def parseEntrez(fname):
    """ parse a tab-sep table with headers and return one dict with entrez to refprots
    and another dict with entrez to symbol
    """
    entrez2Sym = dict()
    entrez2RefseqProts = dict()
    for row in maxCommon.iterTsvRows(fname):
        entrez2Sym[int(row.entrezId)] = row.sym
        #refseqs = row.refseqIds.split(",")
        if row.refseqProtIds=="":
            refProts = None
        else:
            refProts = row.refseqProtIds.split(",")
            #assert(len(refProts)==len(refseqs))
        entrez2RefseqProts[int(row.entrezId)] = refProts
    return entrez2Sym, entrez2RefseqProts
Author: neoyukito, Project: pubMunch, Lines: 18, Source file: varFinder.py
Example 19: runProcessRow
def runProcessRow(inName, alg, paramDict, outName):
    " run the rows from inName through alg and write to outName "
    tmpFnames = []
    outFh, tmpFnames = newTempOutFile(tmpFnames, outName, alg, None)
    for row in maxCommon.iterTsvRows(inName):
        newRow = alg.processRow(row)
        if newRow!=None and len(newRow)!=0:  # original compared len() to [], which is always true
            writeRow(newRow, outFh)
    if "allResults" in dir(alg):
        logging.debug("running allResults() function")
        rows = alg.allResults()
        if rows!=None:
            for row in rows:
                writeRow(row, outFh)
    outFh.close()
    moveTempToFinal(tmpFnames[0], outName)
Author: maximilianh, Project: pubMunch, Lines: 18, Source file: pubAlg.py
Example 20: concatDois
def concatDois(inDir, outDir, outFname):
    " concat all dois of id files in inDir to outFname "
    outPath = join(outDir, outFname)
    inMask = join(inDir, "*ids.tab")
    idFnames = glob.glob(inMask)
    logging.debug("Concatting DOIs from %s to %s" % (inMask, outPath))
    dois = []
    for inFname in idFnames:
        for row in maxCommon.iterTsvRows(inFname):
            dois.append(row.doi)
    ofh = open(outPath, "w")
    ofh.write("#doi\n")
    for doi in dois:
        ofh.write("%s\n" % doi)
    ofh.close()
    return outPath
Author: maximilianh, Project: pubMunch, Lines: 18, Source file: pubConvSpringer.py
Note: the maxCommon.iterTsvRows examples above were compiled from open-source projects hosted on platforms such as GitHub and MSDocs. The code snippets remain the copyright of their original authors; consult the corresponding project's license before redistributing or reusing them, and do not republish this compilation without permission.