Spellcheck

Tool list

  • python

    • pyspellchecker
    • textblob

Tool comparison

  • ssh://git@gitlab.matgene.net:9022/mdq/infrared-json-to-csv.git
  • ssh://git@gitlab.matgene.net:9022/mdq/infrared-relation-extraction.git
  • ssh://git@gitlab.matgene.net:9022/mdq/pdf-no-chinese-filter.git
  • ssh://git@gitlab.matgene.net:9022/mdq/chemical_ner_service.git
  • ssh://git@gitlab.matgene.net:9022/mdq/chemical-smiles-inchi.git
  • ssh://git@gitlab.matgene.net:9022/mdq/xls-table-colorizer.git
  • http://192.168.1.72/mdq/layout-parser
  • http://192.168.1.72/mdq/grobid-parse-client
  • http://192.168.1.72/mdq/jupyterhub-server-controller
  • http://192.168.1.72/mdq/semi-conductor-servcie
  • http://192.168.1.72/mdq/pdf-ocr-txt-tool
  • http://192.168.1.72/mdq/chemical-phase-diagrams
  • http://192.168.1.72/mdq/pdf_cropper
  • http://192.168.1.72/mdq/filter-with-metadata-notebook
  • http://192.168.1.72/mdq/Chemical-Analysis
  • http://192.168.1.72/mdq/docs-management
  • http://192.168.1.72/mdq/condensation-record
  • http://192.168.1.72/mdq/infrared-quantity
  • http://192.168.1.72/mdq/gpt-extractor
  • http://192.168.1.72/mdq/paper-master
  • http://192.168.1.72/mdq/mdq-demo
  • http://192.168.1.72/mdq/ruiyang
  • http://192.168.1.72/mdq/ziwu-raser
  • http://192.168.1.72/mdq/ziwu-raser/ziwu-raser-web
  • http://192.168.1.72/mdq/ziwu-raser/ziwu-service
  • http://192.168.1.72/mdq/ziwu-raser/file-transfer
  • http://192.168.1.72/mdq/ruiyang/ruiyang-parse-service
  • http://192.168.1.72/mdq/ruiyang/ruiyang-digital-service
  • http://192.168.1.72/mdq/ruiyang/ruiyangdigitalservice-web
  • http://192.168.1.72/mdq/mdq-demo/chem-extractor-service
  • http://192.168.1.72/mdq/paper-master/clustering
  • http://192.168.1.72/mdq/paper-master/metadata-service
  • http://192.168.1.72/mdq/paper-master/paper-master-web

paddleocr

Parameter settings

Reference:

from paddleocr import PaddleOCR


OCR_TOOL = PaddleOCR(
    use_angle_cls=True,
    lang="en",
    ocr_version="PP-OCRv3",
    det_db_score_mode="slow",
    # rec_algorithm='CRNN',
    e2e_pgnet_mode="slow",
)

Usage workflow and parameter notes

  1. Main parameters and workflow:

    1. Detection
    2. Text direction classification
    3. Language setting (lang)
    4. Recognition
  2. The typical workflow PaddleOCR follows to recognize characters:
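
In the PaddleOCR 2.x releases, the detection, direction classification, and recognition stages listed above correspond to the `det`, `cls`, and `rec` switches of the `ocr()` method. The following is a minimal sketch of post-processing its output, assuming the 2.6+ result layout of one list per image where each entry is `[box, (text, score)]`. The helper name `flatten_ocr_result`, the `min_score` threshold, and the sample data are hypothetical and only serve as illustration; other versions may return a different structure.

```python
from typing import Any, Dict, List


def flatten_ocr_result(result: List[Any], min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Flatten a PaddleOCR-style result into a list of {text, score, box} dicts.

    Assumes the layout [[[box, (text, score)], ...], ...] with one inner list
    per image; an image with no detections may be None or an empty list.
    """
    lines = []
    for page in result:
        if not page:  # None or empty list: nothing detected on this image
            continue
        for box, (text, score) in page:
            if score >= min_score:  # drop low-confidence recognitions
                lines.append({"text": text, "score": score, "box": box})
    return lines


# Made-up sample in the assumed layout. With the real library this would come
# from something like: OCR_TOOL.ocr("page.png", cls=True)
sample = [[
    [[[0, 0], [50, 0], [50, 10], [0, 10]], ("Hello", 0.98)],
    [[[0, 20], [50, 20], [50, 30], [0, 30]], ("w0rld", 0.31)],
]]

kept = flatten_ocr_result(sample)
print([line["text"] for line in kept])
```

Filtering on the recognition score is one simple way to discard noisy lines before any downstream step such as spellchecking; the right threshold depends on the documents being scanned.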

Train

On splitting data across multiple machines

langchain

Resources

Search engine tools

  1. duckduckgo

    • Free
  2. serpapi

    • 100 requests/month
  3. tavily

    • 1000 requests/month

List

Uses of different agents

Calling a model on Azure

AzureChatModel

import os

from langchain.chat_models import AzureChatOpenAI
from langchain.schema import HumanMessage

if __name__ == "__main__":
    os.environ["OPENAI_API_TYPE"] = "azure"
    os.environ["OPENAI_API_VERSION"] = "2023-03-15-preview"
    os.environ["OPENAI_API_BASE"] = "https://wei202305.openai.azure.com/"
    os.environ["OPENAI_API_KEY"] = "<your azure key>"

    model = AzureChatOpenAI(deployment_name="gpt-35-turbo-01")

    ret = model([HumanMessage(content="Translate this sentence from English to French. I love programming.")])
    #  AIMessage(content="J'adore programmer.", additional_kwargs={}, example=False)

Required parameters

  1. deployment_name='gpt-35-turbo-01'
  2. azure_endpoint="https://wei202305.openai.azure.com/"
  3. api_version="2023-03-15-preview"
  4. api_key="…"
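
One way to avoid hard-coding these four values is to read them from the environment and fail early when any is missing. The sketch below assumes hypothetical environment variable names and a hypothetical helper `azure_chat_kwargs`; neither comes from the notes above, and only the keyword argument names match the required parameters listed.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names mapped to the keyword arguments
# listed above. The variable names are an assumption made for this sketch.
ENV_TO_KWARG = {
    "AZURE_OPENAI_DEPLOYMENT": "azure_deployment",
    "AZURE_OPENAI_ENDPOINT": "azure_endpoint",
    "AZURE_OPENAI_API_VERSION": "api_version",
    "AZURE_OPENAI_API_KEY": "api_key",
}


def azure_chat_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the four required settings, raising if any is missing or empty."""
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}


# Demo with an explicit mapping instead of the real environment.
demo_env = {
    "AZURE_OPENAI_DEPLOYMENT": "gpt-35-turbo-01",
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid",
    "AZURE_OPENAI_API_VERSION": "2023-03-15-preview",
    "AZURE_OPENAI_API_KEY": "<your azure key>",
}
print(sorted(azure_chat_kwargs(demo_env)))
```

With the real environment populated, the resulting dict could be passed on as `AzureChatOpenAI(**azure_chat_kwargs())`.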
Standard call example:
from langchain.schema import HumanMessage
from langchain.chat_models import AzureChatOpenAI

AzureChatOpenAI(
    azure_deployment='gpt-35-turbo-01',
    api_key='0b4dfdf1786a1f84e0f9aba1a1ce2eeadfdfd1',
    azure_endpoint='https://hello202323434.openai.azure.com',
    api_version='2023-03-15-preview').invoke([HumanMessage(content='your name?')])

Out[24]: AIMessage(content="I am an AI language model created by OpenAI called GPT-3. I don't have a personal name, but you can refer to me as ChatGPT.")

AzureChatOpenAI(
    azure_deployment='gpt-35-turbo-01',
    api_key='0b4b1786adfdfdf4e0f1ce2eea171',
    azure_endpoint='https://wedi202dfdf.openai.azure.com',
    api_version='2023-03-15-preview')([HumanMessage(content='your name?')])

Out[25]: AIMessage(content="I am an AI language model created by OpenAI and I don't have a personal name. You can call me OpenAI or ChatGPT. How can I assist you today?")
Example that skips HumanMessage and calls invoke(str) directly:
AzureChatOpenAI(
    azure_deployment='gpt-35-turbo-01',
    api_key='0b4b1786adfdfdf4e0f1ce2eea171',
    azure_endpoint='https://wedi202dfdf.openai.azure.com',
    api_version='2023-03-15-preview').invoke('your name?')

AzureOpenAIEmbeddings

In [3]: from langchain.embeddings import AzureOpenAIEmbeddings

In [4]: AzureOpenAIEmbeddings(azure_deployment='text-embedding-ada-002', api_key='0b4b1786a1f84e0f9ab34er3dfdfce2eea171', azure_endpoint='https://hello334343.openai.azure.com/').embed_query('hello')
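
`embed_query` returns the embedding as a list of floats. A common next step is to compare two such vectors with cosine similarity; the sketch below uses short made-up vectors in place of real embeddings, and the function name `cosine_similarity` is chosen only for this illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up stand-ins for vectors returned by embed_query().
v_same_a = [1.0, 0.0, 1.0]
v_same_b = [1.0, 0.0, 1.0]
v_other = [0.0, 1.0, 0.0]

print(cosine_similarity(v_same_a, v_same_b))  # close to 1: same direction
print(cosine_similarity(v_same_a, v_other))   # 0.0: orthogonal vectors
```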

Calling a Chain

Call order: